Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria.

Harari O, Park SY, Huang H, Groisman EA, Zwir I - PLoS Comput. Biol. (2010)

Bottom Line: By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.


Affiliation: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.

ABSTRACT
Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. The detection of the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg(2+) homeostasis in several bacterial species. PhoP is particularly interesting; it is highly conserved in different gamma/enterobacteria, regulating not only ancestral genes but also governing the expression of dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (termed submotifs) using a machine learning method inspired by the "Divide & Conquer" strategy. By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as of the regulatory protein that recognizes them, suggests that the major cause of divergence between related species is not the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.

Figure 1. Characterization of clustering methods. Generic data (red dots), clustering partitions (circles and ovals), and their membership scopes, as defined by their most characteristic distance metrics and algorithms, are represented. A) Subtractive clustering [30] applied to sparsely distributed datasets. B) Crisp clustering [28] (e.g., K-means) applied to fuzzy datasets. C) Probabilistic [28] (e.g., Expectation-Maximization) or fuzzy clustering (e.g., C-means) [31] applied to datasets with several outliers. D) Hierarchical clustering [33] applied to datasets displaying many patterns with small extent. E) Feature selection clustering [14], [34] applied to datasets harboring patterns involving different sets of features.

Mentions: To circumvent the limitations of consensus methods [1], we decomposed binding site (BS) motifs into sub-patterns [13], [14] by applying the classical Divide & Conquer (D&C) strategy [15], [16]. We then compared different ways of decomposing the BS motifs of a transcription factor (TF) into families of motifs (i.e., “submotifs”) from a computational clustering perspective (Figure 1). In so doing, we extracted the maximal amount of useful genomic information by effectively handling the biological and experimental variability inherent in the data, and then combined the submotifs into an accurate multi-classifier predictor [13], [17]. Although the submotifs are computationally useful [13], [14], it was not clear whether these families of motifs were just a computational artifact or whether they could provide insights into the regulatory process carried out by a regulator and its targets.
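
The decomposition idea can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple crisp, k-medoids-style partition by Hamming distance, one log-odds position weight matrix (PWM) per partition with a uniform background, and a "best submotif wins" rule as a stand-in for the multi-classifier predictor. All function names are hypothetical, and the sequences are synthetic toy strings, not real PhoP binding sites.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def partition_sites(sites, k, max_iter=20):
    """Crisp k-medoids-style partition (an assumption, not the paper's algorithm)."""
    # Farthest-first initialisation keeps the toy example deterministic.
    medoids = [sites[0]]
    while len(medoids) < k:
        medoids.append(max(sites, key=lambda s: min(hamming(s, m) for m in medoids)))
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for s in sites:
            nearest = min(range(len(medoids)), key=lambda i: hamming(s, medoids[i]))
            clusters[nearest].append(s)
        # New medoid: the member with the smallest total distance to its cluster.
        updated = [
            min(c, key=lambda cand: sum(hamming(cand, o) for o in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if updated == medoids:
            break
        medoids = updated
    return [c for c in clusters if c]

def build_pwm(cluster, pseudocount=1.0):
    """Log-odds PWM against a uniform background (0.25 per base)."""
    total = len(cluster) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(len(cluster[0])):
        counts = Counter(s[pos] for s in cluster)
        pwm.append({b: math.log2(((counts[b] + pseudocount) / total) / 0.25)
                    for b in ALPHABET})
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds scores."""
    return sum(column[base] for column, base in zip(pwm, seq))

def best_submotif_score(pwms, seq):
    """Combine submotif models by taking the best-scoring one."""
    return max(score(p, seq) for p in pwms)

# Synthetic toy data: two artificial families of 8-mers.
family_a = ["AAAACCCC", "AAAACCCA", "AAATCCCC", "CAAACCCC"]
family_b = ["GGGGTTTT", "GGGGTTTA", "GGGCTTTT", "TGGGTTTT"]
sites = family_a + family_b

clusters = partition_sites(sites, k=2)          # "divide"
submotif_pwms = [build_pwm(c) for c in clusters]
pooled_pwm = build_pwm(sites)                   # single consensus-like model

candidate = "AAAACCCC"
print("pooled score:  ", round(score(pooled_pwm, candidate), 2))
print("submotif score:", round(best_submotif_score(submotif_pwms, candidate), 2))
```

In this toy setting the pooled model dilutes each family's signal, whereas the per-partition models score a family member more sharply. Whether such partitions also carry biological meaning is the question the paragraph above raises.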

