Predicting combinatorial binding of transcription factors to regulatory elements in the human genome by association rule mining.

Morgan XC, Ni S, Miranker DP, Iyer VR - BMC Bioinformatics (2007)

Bottom Line: Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature. Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.


Affiliation: Institute for Cellular and Molecular Biology and Center for Systems and Synthetic Biology, The University of Texas at Austin, Austin, Texas 78712-0159, USA. morganx@mail.utexas.edu

ABSTRACT

Background: Cis-acting transcriptional regulatory elements in mammalian genomes typically contain specific combinations of binding sites for various transcription factors. Although some cis-regulatory elements have been well studied, the combinations of transcription factors that regulate normal expression levels for the vast majority of the 20,000 genes in the human genome are unknown. We hypothesized that it should be possible to discover transcription factor combinations that regulate gene expression in concert by identifying over-represented combinations of sequence motifs that occur together in the genome. In order to detect combinations of transcription factor binding motifs, we developed a data mining approach based on the use of association rules, which are typically used in market basket analysis. We scored each segment of the genome for the presence or absence of each of 83 transcription factor binding motifs, then used association rule mining algorithms to mine this dataset, thus identifying frequently occurring pairs of distinct motifs within a segment.
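
The pair-mining step described above can be illustrated with a minimal sketch. It assumes each genome segment has already been reduced to the set of motifs scored as present; the function name `mine_pairs`, the dictionary keys, and the toy segments are hypothetical and are not taken from the authors' software. Support is the fraction of segments containing both motifs, and confidence is the fraction of segments containing one motif that also contain the other.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(segments):
    """Compute support and confidence for every co-occurring motif pair.

    segments: list of sets, each holding the motif names present in one
    genome segment (hypothetical input format).
    """
    n = len(segments)
    single = Counter()  # number of segments containing each motif
    pair = Counter()    # number of segments containing both motifs of a pair
    for motifs in segments:
        single.update(motifs)
        # sorted() gives each unordered pair one canonical key
        pair.update(combinations(sorted(motifs), 2))

    rules = {}
    for (a, b), count in pair.items():
        rules[(a, b)] = {
            "support": count / n,              # P(a and b)
            "conf_a_to_b": count / single[a],  # P(b | a)
            "conf_b_to_a": count / single[b],  # P(a | b)
        }
    return rules


# Toy data: four segments, three motifs (illustrative only)
segments = [
    {"Sp1", "p53"},
    {"Sp1", "p53", "Egr1"},
    {"Sp1"},
    {"Egr1"},
]
rules = mine_pairs(segments)
print(rules[("Sp1", "p53")])
```

Dedicated association rule miners such as Apriori avoid enumerating every pair by pruning on a minimum support threshold; with only pairs of 83 motifs, direct counting as above is enough to show the definitions.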

Results: Support for most pairs of transcription factor binding motifs was highly correlated across different chromosomes although pair significance varied. Known true positive motif pairs showed higher association rule support, confidence, and significance than background. Our subsets of high-confidence, high-significance mined pairs of transcription factors showed enrichment for co-citation in PubMed abstracts relative to all pairs, and the predicted associations were often readily verifiable in the literature.

Conclusion: Functional elements in the genome where transcription factors bind to regulate expression in a combinatorial manner are more likely to be predicted by identifying statistically and biologically significant combinations of transcription factor binding motifs than by simply scanning the genome for the occurrence of binding sites for a single transcription factor.



Figure 4: Distributions of support, confidence, and P-value for true positives and all pairs. Distribution histograms of support, confidence, and P-value for 131 true positives versus all pairs show higher support and confidence and lower P-values for true positives in the entire human genome, human promoter regions, and mouse chromosome 1.

Mentions: We next manually examined the literature for evidence of biological associations and joint regulation of target genes by the "genomewide" and "mouse" subsets of PWM pairs that were identified by data mining. We found that many of these TF pairs were readily verifiable in the literature as true co-regulators of human and mouse genes (Table 1). For example, the subsets "mouse" and "genomewide" both included the pair "Ap-2, Egr1." Genes known to be regulated by these two transcription factors include tumor necrosis factor α [40,41], human phenylethanolamine N-methyltransferase [42], and rat chromogranin B [43]. The subsets "mouse" and "genomewide" also contain the pair "Sp1, p53"; each has been shown to regulate ICAM-1 [44,45]. Comparing the distributions for all pairs with those for 131 true positives collected from the literature revealed that true positive pairs exhibited higher support and confidence and lower P-values than did all pairs (Figure 4), regardless of whether the entire human genome, human promoters, or mouse chromosome 1 was mined. As an exhaustive manual analysis of the literature for all TF pairs was not feasible, we used high-throughput co-citation analysis to further assess the biological relevance of the high-support, high-confidence TF pairs.
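
This excerpt does not say how the pair P-values were computed. As an illustration only, one common way to score the significance of a motif pair is a one-sided hypergeometric test on co-occurrence counts; the sketch below assumes that test, and the function name `cooccurrence_p_value` and its arguments are hypothetical rather than the authors' method.

```python
from math import comb


def cooccurrence_p_value(n_segments, n_a, n_b, n_both):
    """One-sided hypergeometric P-value for motif co-occurrence.

    Probability of seeing at least n_both segments with both motifs if the
    n_b segments containing motif B were drawn at random from n_segments,
    of which n_a contain motif A. (Assumed test, not taken from the paper.)
    """
    upper = min(n_a, n_b)
    total = comb(n_segments, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_segments - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total


# Toy counts: 10 segments, each motif in 3 of them, all 3 shared
p = cooccurrence_p_value(n_segments=10, n_a=3, n_b=3, n_both=3)
print(p)
```

A low P-value under this assumed test means the two motifs share segments more often than their individual frequencies would predict, which is the sense in which true positive pairs are described as more significant than all pairs.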

