Defining the plasticity of transcription factor binding sites by deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria.

Harari O, Park SY, Huang H, Groisman EA, Zwir I - PLoS Comput. Biol. (2010)

Bottom Line: By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.


Affiliation: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.

ABSTRACT
Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. The detection of the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg(2+) homeostasis in several bacterial species. PhoP is particularly interesting: it is highly conserved in different gamma/enterobacteria, regulating not only ancestral genes but also dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (termed submotifs) using a machine learning method inspired by the "Divide & Conquer" strategy. By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as of the regulatory protein that recognizes them, suggests that the divergence between related species is not mainly due to the binding sites, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.
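
One way to see why splitting a single consensus motif into submotifs can help is to compare information content before and after a split. The sketch below is only an illustration under stated assumptions, not the authors' machine learning method: the function name information_content is hypothetical, the sites are made-up toy strings rather than real PhoP binding sites, the background is assumed uniform (2 bits maximum per position), and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(sites):
    """Total information content, in bits, of equal-length aligned DNA sites.

    Assumes a uniform background, so each position contributes
    2 - H(position) bits, where H is the Shannon entropy of the
    observed base frequencies. No small-sample correction is applied.
    """
    n = len(sites)
    total = 0.0
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


# Hypothetical toy sites (NOT real PhoP binding sites), forming two groups
# that differ at the first and fifth positions.
group_a = ["TGTTTA", "TGTTTA", "TGTTTA"]
group_b = ["GGTTCA", "GGTTCA", "GGTTCA"]
pooled = group_a + group_b

ic_pooled = information_content(pooled)
ic_a = information_content(group_a)
ic_b = information_content(group_b)

print(f"pooled motif: {ic_pooled:.1f} bits")  # prints "pooled motif: 10.0 bits"
print(f"submotif A:   {ic_a:.1f} bits")       # prints "submotif A:   12.0 bits"
print(f"submotif B:   {ic_b:.1f} bits")       # prints "submotif B:   12.0 bits"
```

In this toy case the pooled motif loses one bit at each of the two positions where the groups disagree, while each subgroup on its own is fully informative. Real binding sites are noisier, and with few sequences per group an uncorrected estimate like this one overstates the information content.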

Figure 3 (pcbi-1000862-g003): Families of PhoP BS submotifs in E. coli K-12 and S. typhimurium. The tree represents the hierarchical organization of PhoP submotifs, which are represented by their logos (three nucleotides between the direct repeat tandems are omitted). The root corresponds to the consensus (single) motif (left panel), while general and specific submotifs are ordered from left to right. Sequences conforming to each specific submotif (gray boxes) and their genomic source are listed in the right panel. The information content of each submotif is displayed below its logo (the higher the value, the more informative the submotif).

Mentions: We studied the PhoP BSs found in E. coli K-12 and S. typhimurium that have been reported in the literature [26], [42], as well as in our previous work [41]. In total, we collected 69 DNA sequences corresponding to PhoP BSs: 31 BSs from 25 E. coli genes and 38 BSs from 28 Salmonella genes. Some promoters have more than one BS, and 14 genes are orthologous between the two species [43]. BSs in the promoters of orthologous genes are treated as independent examples, and every sequence instance is considered equally important. For example, the PhoP BS sequences in the promoters of the orthologous E. coli and Salmonella phoP genes are similar to each other [42], [44], and both belong to the same submotif (Figure 3). In contrast, the PhoP BS sequences in the promoters of the E. coli and Salmonella slyB genes are grouped into different submotifs (Figure 3), despite the orthology of the genes [44]. Furthermore, PhoP binds to the promoter of the Salmonella ugd gene but not to the promoter of the corresponding E. coli ugd gene, even though these genes are 88% identical [45], [46].

