Finding subtypes of transcription factor motif pairs with distinct regulatory roles.

Bais AS, Kaminski N, Benos PV - Nucleic Acids Res. (2011)

Bottom Line: DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. We present DiSCo (Discovery of Subtypes and Cofactors), a novel approach for identifying variants of dyad motifs (and their respective target sequence sets) that are instrumental for differential downstream regulation. Using both simulated and experimental datasets, we demonstrate how current motif discovery can be successfully leveraged to address this question.


Affiliation: Department of Computational and Systems Biology, Dorothy P. and Richard P. Simmons Center for Interstitial Lung Disease, Division of Pulmonary, Allergy and Critical Care Medicine and Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA.

ABSTRACT
DNA sequences bound by a transcription factor (TF) are presumed to contain sequence elements that reflect its DNA binding preferences and its downstream-regulatory effects. Experimentally identified TF binding sites (TFBSs) are usually similar enough to be summarized by a 'consensus' motif, representative of the TF DNA binding specificity. Studies have shown that groups of nucleotide TFBS variants (subtypes) can contribute to distinct modes of downstream regulation by the TF via differential recruitment of cofactors. A TF(A) may bind to TFBS subtypes a(1) or a(2) depending on whether it associates with cofactors TF(B) or TF(C), respectively. While some approaches can discover motif pairs (dyads), none address the problem of identifying 'variants' of dyads. TFs are key components of multiple regulatory pathways targeting different sets of genes perhaps with different binding preferences. Identifying the discriminating TF-DNA associations that lead to the differential downstream regulation is thus essential. We present DiSCo (Discovery of Subtypes and Cofactors), a novel approach for identifying variants of dyad motifs (and their respective target sequence sets) that are instrumental for differential downstream regulation. Using both simulated and experimental datasets, we demonstrate how current motif discovery can be successfully leveraged to address this question.

Figure 7: Analysis on the AP-1 and CREB dataset. (A) Logos of AP-1 (V$AP1_Q2_01) and CREB (V$CREB_Q4_01) matrices from TRANSFAC are shown. (B) Logos formulated from the best scoring sites of the dyads discovered in the majority polled run after SDD (top row) and after clustering (DiSCo; two bottom rows) on the complete set of AP-1 and CREB sequences are shown. SDD yields a dyad composed of a CREB-like motif (STAMP best match to ATF4, E-value ∼10e-09 and to CREB with E-value ∼9e-08) and a motif that matches DEAF1 (E-value ∼2e-04). The clusters resulting from DiSCo are enriched with AP-1 and CREB motifs, respectively. The cluster C1 [second row in (B)] is enriched with a dyad whose components match CREB and CAC-binding motif (E-values of ∼2e-11 and ∼10e-04, respectively). On comparing with JASPAR, the second motif best matches KLF-4 (E-value = 4.4e-03). The components of the dyad discovered in cluster C2 [third row in (B)] match AP-1 and Adf-1 (E-values of ∼6e-11 and 4e-03, respectively). Average misclassification error rate = 0.22.

Mentions: AP-1 and CREB are two TFs with similar but not identical binding sites, which differ mainly at the 3′ end of the core regions (Figure 7A). We tested whether DiSCo can identify the two types of binding sites, automatically partition their target sequence sets and, in the process, identify any surrounding sequence motifs associated with them. To this end, we pooled and analyzed the complete set of sequences containing both kinds of TFBSs. We searched both strands of all sequences for motif pairs of width 7 bp each with a maximum gap of 5 bp, using sites of both motifs to calculate the clustering measure in DiSCo (i.e. wrt = 3). The motifs found in the majority polled runs of SDD and DiSCo are presented in Figure 7B (P-values are reported in Supplementary Table S1). SDD identified a dyad in which one motif is similar to both AP-1 and CREB. In other words, the two kinds of TFBSs are pooled together (Figure 7B, top row). DiSCo, in contrast, segregates the two sequence sets into two clusters and identifies the individual TFBS motifs (Figure 7B, two bottom rows). The cluster of sequences enriched with the CREB-like motif contains another co-occurring motif with high-ranking matches to the CAC-binding motif and Pax-4 in TRANSFAC. When compared with JASPAR (47) (version 2010), this motif matches Krueppel-like factor 4 (KLF-4). The protein KLF-4 has been shown to be involved in the regulation of the mouse B2R promoter through the formation of a higher-order complex with CREB and p53 in conjunction with the co-activator p300/CBP (CREB binding protein) (48). It is likely that CREB and KLF-4 are together involved in the regulation of other genes as well, possibly through a similar mechanism. However, we could not find any supporting evidence for the second motif identified in the sequence set enriched with the AP-1-like motif. In summary, for this set of TFBSs, surrounding sequence lengths and search parameters, the TFBSs of CREB tend to co-occur with those of KLF-4.
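The search setup described above (both strands, two motifs of width 7 bp, gap of at most 5 bp) can be illustrated with a minimal sketch. This is not the SDD or DiSCo implementation: it assumes simple consensus-string matching with a mismatch tolerance instead of a probabilistic motif model, and the function names, the two 7-mers and the test sequence are hypothetical values chosen only for illustration.

```python
from itertools import product

# Translation table for complementing A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus, max_mismatch=0):
    """Start positions where seq matches consensus within max_mismatch."""
    width = len(consensus)
    starts = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        mismatches = sum(1 for s, c in zip(window, consensus) if s != c)
        if mismatches <= max_mismatch:
            starts.append(i)
    return starts


def find_dyads(seq, motif_a, motif_b, max_gap=5, max_mismatch=0):
    """Scan both strands for non-overlapping site pairs within max_gap.

    Returns (strand, start_a, start_b, gap) tuples. Positions on the
    "-" strand are relative to the reverse-complemented sequence.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        starts_a = find_sites(s, motif_a, max_mismatch)
        starts_b = find_sites(s, motif_b, max_mismatch)
        for a, b in product(starts_a, starts_b):
            # Gap = bases between the end of the upstream site and the
            # start of the downstream site; either motif may come first.
            if a < b:
                gap = b - (a + len(motif_a))
            else:
                gap = a - (b + len(motif_b))
            if 0 <= gap <= max_gap:
                hits.append((strand, a, b, gap))
    return hits


# Hypothetical 7-mers and a toy sequence, for illustration only.
CREB_LIKE = "TGACGTC"
SECOND_MOTIF = "CACCCAC"
toy_seq = "AAAA" + CREB_LIKE + "GG" + SECOND_MOTIF + "TTTT"
print(find_dyads(toy_seq, CREB_LIKE, SECOND_MOTIF, max_gap=5))
```

Exact consensus matching is used here only to keep the sketch short; a position weight matrix score with a threshold would be the more usual choice for real TFBS data.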

