Finding subtypes of transcription factor motif pairs with distinct regulatory roles.

Bais AS, Kaminski N, Benos PV - Nucleic Acids Res. (2011)

Bottom Line: DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. We present DiSCo (Discovery of Subtypes and Cofactors), a novel approach for identifying variants of dyad motifs (and their respective target sequence sets) that are instrumental for differential downstream regulation. Using both simulated and experimental datasets, we demonstrate how current motif discovery can be successfully leveraged to address this question.


Affiliation: Department of Computational and Systems Biology, Dorothy P. and Richard P. Simmons Center for Interstitial Lung Disease, Division of Pulmonary, Allergy and Critical Care Medicine and Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA.

ABSTRACT
DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. Experimentally identified TF binding sites (TFBSs) are usually similar enough to be summarized by a 'consensus' motif, representative of the TF DNA binding specificity. Studies have shown that groups of nucleotide TFBS variants (subtypes) can contribute to distinct modes of downstream regulation by the TF via differential recruitment of cofactors. A TF(A) may bind to TFBS subtypes a(1) or a(2) depending on whether it associates with cofactors TF(B) or TF(C), respectively. While some approaches can discover motif pairs (dyads), none address the problem of identifying 'variants' of dyads. TFs are key components of multiple regulatory pathways targeting different sets of genes perhaps with different binding preferences. Identifying the discriminating TF-DNA associations that lead to the differential downstream regulation is thus essential. We present DiSCo (Discovery of Subtypes and Cofactors), a novel approach for identifying variants of dyad motifs (and their respective target sequence sets) that are instrumental for differential downstream regulation. Using both simulated and experimental datasets, we demonstrate how current motif discovery can be successfully leveraged to address this question.


Figure 6: Motifs found on CRP-N and CRP-S sequences of H. influenzae with one motif of complete CRP length. Shown are the dyads discovered in the majority-polled run after SDD (top row) and DiSCo (two bottom rows) when the search is performed with the complete CRP motif as the main motif. SDD yields a pair of motifs: one groups together the complete CRP-N and CRP-S motifs of H. influenzae (Figure 3A and B), and the other is an AT-rich motif. In contrast, DiSCo successfully identifies two clusters, C1 and C2, where C1 is enriched with the H. influenzae CRP-N-like motif (Figure 3A) and C2 with the H. influenzae CRP-S-like motif (Figure 3B). Additionally, the second motif discovered in C2 closely resembles the first half of the E. coli σ70 motif, also found previously by Cameron et al. (43). Average misclassification error = 0.24.
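The caption reports an average misclassification error for the two clusters, but this excerpt does not define how it is computed. As a minimal sketch only, assuming the error is the fraction of sequences placed in the wrong cluster after matching cluster names to the known CRP-N/CRP-S labels in the most favourable way, it could look like the following. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def misclassification_error(true_labels, cluster_ids):
    """Fraction of sequences in the 'wrong' cluster.

    Cluster names (e.g. C1, C2) are arbitrary, so every one-to-one mapping
    of clusters onto known classes is tried and the smallest error is kept.
    Assumption: this is one common definition, not necessarily the paper's.
    """
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("need two non-empty label lists of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than known classes")
    best = len(true_labels)
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        wrong = sum(mapping[c] != t for t, c in zip(true_labels, cluster_ids))
        best = min(best, wrong)
    return best / len(true_labels)


# Toy data: three CRP-N and two CRP-S sequences, one CRP-N placed in C2.
known = ["N", "N", "N", "S", "S"]
found = ["C1", "C1", "C2", "C2", "C2"]
print(misclassification_error(known, found))  # one of five is wrong: 0.2
```

An "average" error such as the 0.24 in the caption would then be the mean of this quantity over repeated runs, again under the assumption stated above.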

Mentions: To study whether additional sequence motifs are needed for regulation of CRP-S sequences, we use DiSCo to analyze the target sequences of both CRP variants, searching for dyads in which one motif is 22 bp long. The aim is 2-fold: first, to test whether DiSCo can identify the two CRP variants along with their corresponding sequence subsets in each species; and second, to investigate additional sequence signals that might co-occur with each CRP binding site variant and might help determine the specific mode of regulation employed by CRP. Since Sxy itself lacks a DNA binding domain and no Sxy binding sites are known, we run DiSCo multiple times with varying parameter values, such as the motif width for the second component of the dyad and the gap value. This is a typical procedure for many biological problems when no additional information is known about the potential cofactors. For one of the components, we search for a 22-bp-long motif. In both species, irrespective of the width of the second component, DiSCo successfully identified the CRP motifs and their target sequence subsets (clusters). On searching for motif pairs of widths W = 8 and w = 22 within a maximum gap of 10 bp on both strands, we found both CRP-N- and CRP-S-like sites co-occurring with AT-rich motifs (data not shown). Previously, Cameron et al. (43) had observed A + T runs upstream of the CRP-S sites in H. influenzae that were required for promoter activation. From our analysis, such motifs seem to be present close to CRP-N sites also. Reversing the order of motif widths (W = 22 and w = 8 bp) again yields AT-rich motifs (data not shown). It thus seemed that there are only slight differences between the associated motifs of each CRP variant. However, for their motif analysis, Cameron et al. (43) aligned sequences that were 200 bp upstream of genes and found E. coli σ70-like sites downstream of the CRP-S sites.
Following that study, we also restricted our search space to 200-bp regions. For this search, we used the same motif length parameters, W = 22 and w = 8 bp, and a maximum gap of 20 bp. We ran the search twice: once on both strands and once on the forward strand only. In both cases, we identified two clusters, each enriched with one type of CRP motif. In the latter case, though, for the cluster enriched with the CRP-S-like motif, the second component matches the first half of the E. coli σ70 motif (Figure 6), while the cluster enriched with the CRP-N-like motif had a second motif that is AT-rich but dissimilar to the TTG stretch of E. coli σ70. Hence, while the CRP-S sites seem to have at least part of an E. coli σ70-like motif downstream, the CRP-N sites do not. In general, by using DiSCo to analyze this biological dataset thoroughly, we were able to identify the two CRP motif subtypes separately, along with the possible cofactor motif that distinguishes them. This shows the direct applicability and usefulness of DiSCo in addressing biological problems.
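To make the search constraints in this passage concrete (two components, a maximum gap between them, and a choice between both strands and the forward strand only), here is a small illustrative sketch. It is not DiSCo's algorithm, which discovers motifs de novo: exact words stand in for motif models, only the order "first component, then second" is checked, and the function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, word):
    """Start positions of every (possibly overlapping) exact match."""
    hits, i = [], seq.find(word)
    while i != -1:
        hits.append(i)
        i = seq.find(word, i + 1)
    return hits


def dyad_hits(seq, first, second, max_gap, both_strands=True):
    """List (strand, start_first, start_second, gap) for each place where
    `first` is followed by `second` with 0 <= gap <= max_gap.

    Positions on the "-" strand are given in reverse-complement coordinates.
    """
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    hits = []
    for name, s in strands:
        second_starts = find_all(s, second)
        for i in find_all(s, first):
            end_of_first = i + len(first)
            for j in second_starts:
                gap = j - end_of_first
                if 0 <= gap <= max_gap:
                    hits.append((name, i, j, gap))
    return hits


# Toy sequence: a 6-bp word followed, 3 bp later, by a T-rich word.
seq = "AATTGACAGGGTTTTCC"
print(dyad_hits(seq, "TTGACA", "TTTT", max_gap=5))   # [('+', 2, 11, 3)]
print(dyad_hits(seq, "TTGACA", "TTTT", max_gap=2))   # []
```

Setting `both_strands=False` corresponds to the forward-strand-only search described above, and repeating the call over several widths and `max_gap` values mirrors the idea of running the search under varying parameter settings.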

