Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria.

Harari O, Park SY, Huang H, Groisman EA, Zwir I - PLoS Comput. Biol. (2010)

Bottom Line: By partitioning a motif into sub-patterns, computational advantages for classification were produced, resulting in the discovery of new members of a regulon, and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.


Affiliation: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain.

ABSTRACT
Transcriptional regulators recognize specific DNA sequences. Because these sequences are embedded in the background of genomic DNA, it is hard to identify the key cis-regulatory elements that determine disparate patterns of gene expression. Detecting the intra- and inter-species differences among these sequences is crucial for understanding the molecular basis of both differential gene expression and evolution. Here, we address this problem by investigating the target promoters controlled by the DNA-binding PhoP protein, which governs virulence and Mg²⁺ homeostasis in several bacterial species. PhoP is particularly interesting: it is highly conserved in different gamma/enterobacteria, regulating not only ancestral genes but also dozens of horizontally acquired genes that differ from species to species. Our approach consists of decomposing the DNA binding site sequences for a given regulator into families of motifs (termed submotifs) using a machine learning method inspired by the "Divide & Conquer" strategy. Partitioning a motif into sub-patterns produced computational advantages for classification, resulting in the discovery of new members of a regulon and alleviating the problem of distinguishing functional sites in chromatin immunoprecipitation and DNA microarray genome-wide analysis. Moreover, we found that certain partitions were useful in revealing biological properties of binding site sequences, including modular gains and losses of PhoP binding sites through evolutionary turnover events, as well as conservation in distant species. The high conservation of PhoP submotifs within gamma/enterobacteria, as well as of the regulatory protein that recognizes them, suggests that the binding sites are not the major cause of divergence between related species, as was previously suggested for other regulators. Instead, the divergence may be attributed to the fast evolution of orthologous target genes and/or the promoter architectures resulting from the interaction of those binding sites with the RNA polymerase.
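
As a rough illustration of why scoring against several submotifs can differ from scoring against one pooled motif, the sketch below builds one log-odds position weight matrix per group of sites and assigns a candidate to its best-scoring group. This is not the authors' machine learning method: the partition is supplied by hand, the sequences are made-up toy data, and the names `build_pwm`, `score`, `best_submotif`, and the group labels "A" and "B" are hypothetical.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix against a uniform background."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds for a sequence of the same length."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical toy sites, already partitioned into two groups by hand.
submotif_sites = {
    "A": ["TGTTTA", "TGTTTA", "GGTTTA"],
    "B": ["TATTGA", "TATTGA", "TATTTA"],
}
submotif_pwms = {name: build_pwm(sites) for name, sites in submotif_sites.items()}

def best_submotif(seq):
    """Return (group name, score) of the submotif that scores seq highest."""
    scores = {name: score(pwm, seq) for name, pwm in submotif_pwms.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

for candidate in ("TGTTTA", "TATTGA"):
    print(candidate, best_submotif(candidate))
```

Each candidate is judged by the submotif it resembles most, so a site that is typical of one family is not penalized for differing from the others.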

Figure 9 (pcbi-1000862-g009): IF-THEN rules encompassing submotifs and distances between PhoP binding sites (BSs) and RNA polymerase (RNAP) BSs.
A) Cells correspond to IF-THEN rules. The antecedent of each rule is composed of a submotif (vertical left panel) and a distance between PhoP BSs and RNAP BSs (horizontal panel). The submotifs are encoded as fuzzy sets based on their score distributions. The distances are approximated by their distributions (close, medium, and far distances from left to right panels), and are also encoded as fuzzy sets. Both antecedents are combined by a fuzzy AND operator (i.e., product). The consequent of a rule classifies PhoP BSs in the unit interval as a function of the antecedents (1: high, 0: low). Rules are activated concurrently, each when it exceeds its rule-specific threshold. The isobars show the degree of membership of the training set to the rules (white: none; red: high; blue: low).
B) Synthesis of the PhoP BSs recognized by the most representative IF-THEN rules that distinguish the E. coli K-12 (green) from the S. typhimurium (red) genome. The percentage of BSs recognized by each rule in each genome is represented by the height of the bars. Submotifs were compressed into their most general families (i.e., S01, S05, and S08 & S09) for simplicity. Three subsets of rules (i.e., S01 and close; S08 & S09 and close; and S08 & S09 and medium) are similarly distributed in both genomes (grey boxes). Specific rules in E. coli (i.e., S05 and close, green boxes) and in Salmonella (S01 and medium; S05 and far; and S08 & S09 and far, red boxes) were also identified.
C) Same as in B) but applying IF-THEN rules in Y. pestis. No rules were identified for the S05 family of submotifs or for far distances.
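
The rule mechanics described in panel A (two fuzzy antecedents combined by the product, and rules that fire concurrently once they pass a rule-specific threshold) can be sketched as follows. This is a minimal illustration, not the authors' classifier: the `Rule` class, the threshold of 0.30, the consequent values, and the membership degrees are all assumed for the example.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    submotif: str      # submotif family, e.g. "S01"
    distance: str      # "close", "medium" or "far"
    threshold: float   # rule-specific activation threshold (assumed value)
    consequent: float  # 1.0 = high, 0.0 = low, as in the caption

def activated_rules(rules, submotif_degrees, distance_degrees):
    """Return every (rule, activation) pair whose activation exceeds its threshold.

    The two antecedent degrees are combined with the product (fuzzy AND).
    """
    fired = []
    for rule in rules:
        activation = (submotif_degrees.get(rule.submotif, 0.0)
                      * distance_degrees.get(rule.distance, 0.0))
        if activation > rule.threshold:
            fired.append((rule, activation))
    return fired

# Hypothetical rule base and membership degrees for one candidate site.
rules = [
    Rule("S01", "close", 0.30, 1.0),
    Rule("S01", "medium", 0.30, 1.0),
    Rule("S05", "close", 0.30, 1.0),
    Rule("S01", "far", 0.30, 0.0),
]
submotif_degrees = {"S01": 0.9, "S05": 0.2}
distance_degrees = {"close": 0.8, "medium": 0.4, "far": 0.1}

fired = activated_rules(rules, submotif_degrees, distance_degrees)
for rule, activation in fired:
    print(f"{rule.submotif} and {rule.distance}: "
          f"activation={activation:.2f}, consequent={rule.consequent}")
```

Because the product is used for AND, a site must belong strongly to both the submotif and the distance set for a rule to fire; a weak degree on either side pulls the activation below the threshold.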

Mentions: We examined the distances between the PhoP BSs and the RNAP BSs within a region spanning from 90 bp upstream (−90) to 10 bp downstream (+10) of the transcription start site (TSS). We distinguished three sets of distances: close, medium, and far from the TSS (Figure 9). The close set peaks at −34 bp from the TSS, the medium set at −44 bp, and the far set at −68 bp. We then incorporated the distances between the PhoP and the RNAP BSs into the single motif classifier by using fuzzy AND/OR IF-THEN rules. These rules improved SCC by 21% (i.e., 0.63 vs. 0.83) for recognizing PhoP BSs compared to the single motif model (Figure S7). Again, the use of submotifs instead of a single motif resulted in a multi-classifier employing 24 of the 36 possible rules (Figure 9A; the optimization of the number of rules was analogous to that performed with the submotifs), which improved SCC by 45% (i.e., 0.63 vs. 0.91). This improvement was also seen when analyzing other regulators such as CRP (see Text S5, Tables S9 and S10, and Figure S8).
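
One way to picture the three distance sets is as fuzzy membership functions centred on the reported peaks. The sketch below assumes Gaussian-shaped sets with a common width of 8 bp; the shape, the width, and the function names are assumptions made for illustration, since the text reports only the peak positions and the scanned region.

```python
import math

# Peak positions (bp relative to the TSS) reported in the text.
PEAKS = {"close": -34, "medium": -44, "far": -68}
# Assumed spread of each fuzzy set; the text does not report this value.
WIDTH_BP = 8.0
# Region examined in the text: 90 bp upstream to 10 bp downstream of the TSS.
REGION = (-90, 10)

def distance_memberships(position_bp):
    """Degree of membership (0..1) of a position in each distance set."""
    if not REGION[0] <= position_bp <= REGION[1]:
        raise ValueError(f"position {position_bp} is outside the scanned region {REGION}")
    return {
        label: math.exp(-0.5 * ((position_bp - peak) / WIDTH_BP) ** 2)
        for label, peak in PEAKS.items()
    }

def best_label(position_bp):
    """Label of the distance set with the highest membership degree."""
    degrees = distance_memberships(position_bp)
    return max(degrees, key=degrees.get)

for position in (-34, -40, -70):
    degrees = distance_memberships(position)
    summary = ", ".join(f"{label}={value:.2f}" for label, value in degrees.items())
    print(f"{position} bp -> {best_label(position)} ({summary})")
```

A position between two peaks belongs partly to both sets, which is what allows several rules to be active for the same site rather than forcing a hard close/medium/far cut-off.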

