Identification of co-occurring transcription factor binding sites from DNA sequence using clustered position weight matrices.

Oh YM, Kim JK, Choi S, Yoo JY - Nucleic Acids Res. (2011)

Bottom Line: The Z-scores of the area under a receiver operating characteristic curve (AUC) values of 368 TFs were calculated and used to statistically identify co-occurring regulatory motifs in the TF-bound ChIP loci. Motifs that co-occur with the empirical binding of E2F, JUN or MYC were evaluated under basal or stimulated conditions. The results show that our method can be useful for systematically identifying the co-occurring motifs of a TF under given conditions.


Affiliation: Department of Life Sciences, Pohang University of Science and Technology, Pohang, Republic of Korea.

ABSTRACT
Accurate prediction of transcription factor binding sites (TFBSs) is a prerequisite for identifying cis-regulatory modules that underlie transcriptional regulatory circuits encoded in the genome. Here, we present a computational framework for detecting TFBSs when multiple position weight matrices (PWMs) for a transcription factor (TF) are available. Grouping multiple PWMs of a TF based on their sequence similarity improves the specificity of TFBS prediction, which was evaluated using multiple genome-wide ChIP-Seq data sets from 26 TFs. The Z-scores of the area under a receiver operating characteristic curve (AUC) values of 368 TFs were calculated and used to statistically identify co-occurring regulatory motifs in the TF-bound ChIP loci. Motifs that co-occur with the empirical binding of E2F, JUN or MYC were evaluated under basal or stimulated conditions. The results show that our method can be useful for systematically identifying the co-occurring motifs of a TF under given conditions.
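
The AUC and Z-score step in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that this page does not confirm: the AUC is computed from motif scores in bound versus background sequences, and each TF's AUC is standardized against the AUCs of all scanned TFs. The names `auc`, `z_scores` and `toy_scores`, and all numbers, are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def auc(pos_scores, neg_scores):
    """AUC as the fraction of (bound, background) pairs in which the
    bound sequence scores higher; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def z_scores(auc_by_tf):
    """Standardize each TF's AUC against the AUCs of all TFs
    (assumed reference distribution; population standard deviation)."""
    values = list(auc_by_tf.values())
    mu, sigma = mean(values), pstdev(values)
    return {tf: (a - mu) / sigma for tf, a in auc_by_tf.items()}


# Hypothetical motif scores per TF:
# (scores in ChIP-bound loci, scores in background loci)
toy_scores = {
    "TF_A": ([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]),
    "TF_B": ([0.6, 0.4, 0.5], [0.5, 0.6, 0.4]),
    "TF_C": ([0.2, 0.5, 0.3], [0.4, 0.3, 0.6]),
}

aucs = {tf: auc(pos, neg) for tf, (pos, neg) in toy_scores.items()}
z = z_scores(aucs)

for tf in sorted(z, key=z.get, reverse=True):
    print(f"{tf}: AUC={aucs[tf]:.3f}  Z={z[tf]:+.2f}")
```

In this toy setting, a motif with a high Z-score separates bound from background loci better than most other motifs, which is the sense in which it would be flagged as a co-occurrence candidate.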

Figure 1. Properties of the TF-PWM network. Histograms of the number of TFs connected to a PWM (NTF-PWM) (A) and the number of PWMs for each TF (NPWM-TF) (B) in the TF-PWM network derived from TRANSFAC (11). (C) Average NTF-PWM and NPWM-TF for each connected component (CC) in the TF-PWM network. Subgraphs of the representative CCs (denoted as CC-#) are visualized with Cytoscape (29). (D) Total numbers of PWMs and TFs that form each CC. (E) Degree of physical interaction among TFs belonging to each CC. TF–TF interaction (%) was calculated by dividing the number of annotated interactions by the total number of possible interactions for each CC. (F) PWM dissimilarity, defined as the mean pairwise PWM–PWM dissimilarity for each CC.
© Copyright Policy - creative-commons
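
The two per-component quantities in panels (E) and (F) can be sketched as follows. This is an illustration under stated assumptions: "possible interactions" is taken to mean all unordered pairs of distinct TFs in a CC, and the PWM dissimilarity measure, which this page does not define, is passed in as a function. The function names and toy values are hypothetical.

```python
from itertools import combinations


def interaction_percent(tfs, annotated_pairs):
    """Percentage of unordered TF pairs within one CC that have an
    annotated interaction (assumes self-interactions are excluded)."""
    possible = [frozenset(pair) for pair in combinations(sorted(tfs), 2)]
    if not possible:
        return 0.0
    annotated = {frozenset(pair) for pair in annotated_pairs}
    hits = sum(1 for pair in possible if pair in annotated)
    return 100.0 * hits / len(possible)


def mean_pairwise_dissimilarity(pwms, dissimilarity):
    """Mean of dissimilarity(a, b) over all unordered PWM pairs in a CC."""
    pairs = list(combinations(pwms, 2))
    if not pairs:
        return 0.0
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical CC with four TFs; the pair involving TF_X lies outside
# the CC and is therefore ignored.
cc_tfs = {"TF_1", "TF_2", "TF_3", "TF_4"}
annotated = [("TF_1", "TF_2"), ("TF_3", "TF_2"), ("TF_1", "TF_X")]
pct = interaction_percent(cc_tfs, annotated)

# Hypothetical precomputed dissimilarities between three PWMs.
toy_table = {
    frozenset({"PWM_1", "PWM_2"}): 0.2,
    frozenset({"PWM_1", "PWM_3"}): 0.4,
    frozenset({"PWM_2", "PWM_3"}): 0.6,
}
avg_dis = mean_pairwise_dissimilarity(
    ["PWM_1", "PWM_2", "PWM_3"],
    lambda a, b: toy_table[frozenset({a, b})],
)

print(f"TF-TF interaction: {pct:.1f}%  mean PWM dissimilarity: {avg_dis:.2f}")
```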

Mentions: To apply a motif-based scanning program to every TF with multiple position weight matrices (PWMs), we first collected the available PWMs and their associations with cognate TFs. A total of 368 TFs and 565 vertebrate PWMs from the TRANSFAC database were considered; each PWM was linked to its associated TFs when the interaction was supported (see ‘Materials and Methods’ section). The resulting TF-PWM network is a bipartite graph whose nodes (368 TFs and 474 PWMs) are linked to each other (Supplementary Table S1A). The average number of TFs connected to a single PWM was 1.53, and a total of 61 (10.8%) PWMs had only one TF linkage (PWM:TF = 1:1). V$EBOX_Q6_01 had the most connections, linking to 30 different TFs (Figure 1A). Conversely, multiple PWMs were often connected to a single TF; the average number of linked PWMs per TF was 2.35 (Figure 1B).
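
The network statistics described above can be reproduced on a toy example with a short Python sketch. It assumes the TF-PWM links are available as (TF, PWM) pairs; the edge list, the node names and the helper functions `build_network` and `connected_components` are hypothetical, and no TRANSFAC data is used.

```python
from collections import defaultdict


def build_network(edges):
    """Build both adjacency maps of a bipartite TF-PWM graph."""
    tf_to_pwms, pwm_to_tfs = defaultdict(set), defaultdict(set)
    for tf, pwm in edges:
        tf_to_pwms[tf].add(pwm)
        pwm_to_tfs[pwm].add(tf)
    return dict(tf_to_pwms), dict(pwm_to_tfs)


def connected_components(tf_to_pwms, pwm_to_tfs):
    """Depth-first search over the bipartite graph; each component is
    returned as {"TF": set_of_tfs, "PWM": set_of_pwms}."""
    seen, components = set(), []
    for start in tf_to_pwms:
        if ("TF", start) in seen:
            continue
        seen.add(("TF", start))
        stack = [("TF", start)]
        comp = {"TF": set(), "PWM": set()}
        while stack:
            kind, name = stack.pop()
            comp[kind].add(name)
            if kind == "TF":
                neighbors = [("PWM", p) for p in tf_to_pwms[name]]
            else:
                neighbors = [("TF", t) for t in pwm_to_tfs[name]]
            for nb in neighbors:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


# Hypothetical links; every PWM has at least one TF.
toy_edges = [
    ("TF_A", "PWM_1"), ("TF_B", "PWM_1"), ("TF_A", "PWM_2"),
    ("TF_C", "PWM_3"), ("TF_D", "PWM_4"), ("TF_D", "PWM_5"),
]
tf_to_pwms, pwm_to_tfs = build_network(toy_edges)

# Average degree on each side of the bipartite graph.
avg_tfs_per_pwm = sum(len(t) for t in pwm_to_tfs.values()) / len(pwm_to_tfs)
avg_pwms_per_tf = sum(len(p) for p in tf_to_pwms.values()) / len(tf_to_pwms)
# PWMs linked to exactly one TF.
single_tf_pwms = sum(1 for t in pwm_to_tfs.values() if len(t) == 1)
components = connected_components(tf_to_pwms, pwm_to_tfs)

print(avg_tfs_per_pwm, avg_pwms_per_tf, single_tf_pwms, len(components))
```

Because every link joins one TF to one PWM, the number of TFs times the average number of PWMs per TF equals the number of PWMs times the average number of TFs per PWM. This identity is a quick consistency check on any such bipartite network.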

