Prediction of tissue-specific cis-regulatory modules using Bayesian networks and regression trees.

Chen X, Blanchette M - BMC Bioinformatics (2007)

Bottom Line: The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.


Affiliation: McGill Centre for Bioinformatics, 3775 University Street, room 332, Montreal, Quebec, Canada, H3A 2B4. xchen@cs.washington.edu

ABSTRACT

Background: In vertebrates, a large part of transcriptional regulation is carried out by cis-regulatory modules (CRMs). These modules are believed to regulate much of the tissue specificity of gene expression.

Result: We develop a Bayesian network approach for identifying cis-regulatory modules likely to regulate tissue-specific expression. The network integrates predicted transcription factor binding site information, transcription factor expression data, and target gene expression data. At its core is a regression tree modeling the effect of combinations of transcription factors bound to a module. A new unsupervised EM-like algorithm is developed to learn the parameters of the network, including the regression tree structure.

Conclusion: Our approach is shown to accurately identify known human liver and erythroid-specific modules. When applied to the prediction of tissue-specific modules in 10 different tissues, the network predicts a number of important transcription factor combinations whose concerted binding is associated with specific expression.
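
The regression tree at the core of the network can be pictured with a small sketch. The Python below is a toy illustration under stated assumptions, not the authors' implementation: each internal node tests whether one transcription factor is bound to the module, and each leaf holds a hypothetical score for tissue-specific activity. The class `TreeNode`, the function `predict`, the generic TF indices, and the leaf values are all invented for illustration; learning the tree with the EM-like algorithm is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TreeNode:
    """One node of a hypothetical binary regression tree.

    Internal nodes test whether transcription factor `tf_index` is bound
    to the module; leaves (tf_index is None) store an illustrative score.
    """
    tf_index: Optional[int] = None          # None marks a leaf
    if_bound: Optional["TreeNode"] = None    # branch taken when the TF is bound
    if_unbound: Optional["TreeNode"] = None  # branch taken when it is not
    leaf_value: float = 0.0


def predict(node: TreeNode, bound: Sequence[bool]) -> float:
    """Walk from the root to a leaf using the module's TF-binding pattern."""
    while node.tf_index is not None:
        node = node.if_bound if bound[node.tf_index] else node.if_unbound
    return node.leaf_value


# Hypothetical tree: the score is high only when TF 0 and TF 1 are both bound.
tree = TreeNode(
    tf_index=0,
    if_bound=TreeNode(
        tf_index=1,
        if_bound=TreeNode(leaf_value=0.9),
        if_unbound=TreeNode(leaf_value=0.3),
    ),
    if_unbound=TreeNode(leaf_value=0.05),
)

print(predict(tree, [True, True, False]))   # 0.9
print(predict(tree, [True, False, False]))  # 0.3
print(predict(tree, [False, True, True]))   # 0.05
```

Because each root-to-leaf path conjoins several binding tests, a tree of this shape can express a combinatorial effect (here, TF 0 and TF 1 bound together) that a model scoring each factor independently would not capture directly.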



Figure 3: Precision vs. recall curves for the 1X (left) and 2X (right) data sets, where precision = TP/(TP + FP) and recall = TP/(TP + FN) (TP, FP, FN: true positives, false positives, false negatives). The blue curve (diamond markers) shows the results of our approach, the brown curve (× markers) those of the Supervised-NaiveBayes approach (see Appendix 4), and the green curve (circle markers) those of the NaiveBayesInNet classifier (see Appendix 5). The pink triangle shows the result of the expressionOnly classifier. Error bars denote one standard deviation of the precision over 100 random choices of negative examples. The standard deviation of precision increases at low recall because few predictions are made at these thresholds.
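
The two definitions in the caption can be made concrete with a short Python sketch. It assumes that each example has a posterior score and a known label, and that an example is called positive when its score exceeds a threshold t; `precision_recall` and the toy scores are hypothetical.

```python
import math
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     t: float) -> Tuple[float, float]:
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN), calling every
    example with score > t positive. Undefined ratios return NaN."""
    tp = fp = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score > t
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    return precision, recall


# Toy data: three positive and two negative examples.
scores = [0.95, 0.8, 0.6, 0.4, 0.2]
labels = [True, False, True, True, False]

print(precision_recall(scores, labels, 0.5))  # precision 2/3, recall 2/3
print(precision_recall(scores, labels, 0.9))  # precision 1.0, recall 1/3
```

At t = 0.9 only one example is called positive, so the precision rests on a single prediction; this is the small-sample effect the caption gives for the larger error bars at low recall.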

Mentions: One hundred runs of our EM-like algorithm were performed on the 1X and 2X data sets, each with a different sample of negative modules. Each run consisted of 100 EM-like iterations (approximately 10 minutes of running time), which was sufficient for convergence, although different runs converged to slightly different likelihoods and regression trees (see Additional File 1). Because we do not know which of the putativeLiver and putativeErythroid CRMs are truly tissue-specific modules, we evaluate the performance of our algorithm only on the positive and negative modules. For each run, the network with the highest likelihood across the 100 EM-like iterations is used to compute Pr[tR_m | A, E, F] for all examples, and a module-tissue pair is predicted positive if this probability exceeds a threshold t. The resulting precision-recall curves, averaged over all 100 runs, are shown in Figure 3 for both the 1X and 2X data sets.
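
The evaluation protocol in this paragraph (keep the highest-likelihood network within each run, threshold its posteriors, then average over runs) can be sketched as follows. This is a Python sketch under assumptions, not the authors' code: `best_iteration`, `averaged_pr_curve`, and the toy inputs are hypothetical; precision and recall are averaged at a shared list of thresholds; a run that makes no prediction at a threshold is left out of that threshold's average; and the error bar is taken as the population standard deviation of precision across runs.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# One run = (posterior per example from its best network, true label per example)
Run = Tuple[Sequence[float], Sequence[bool]]


def best_iteration(log_likelihoods: Sequence[float]) -> int:
    """Index of the EM-like iteration whose network had the highest likelihood."""
    return max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])


def _precision_recall(scores: Sequence[float], labels: Sequence[bool],
                      t: float) -> Tuple[float, float]:
    """Precision and recall when every score > t is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s > t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= t and y)
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    return precision, recall


def averaged_pr_curve(runs: List[Run],
                      thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """For each threshold, average precision and recall over runs and report
    the standard deviation of precision (the error bars)."""
    curve = []
    for t in thresholds:
        pairs = [_precision_recall(s, y, t) for s, y in runs]
        precisions = [p for p, _ in pairs if not math.isnan(p)]
        recalls = [r for _, r in pairs if not math.isnan(r)]
        if not precisions or not recalls:
            continue  # no run made any prediction at this threshold
        curve.append({
            "t": t,
            "recall": mean(recalls),
            "precision": mean(precisions),
            "precision_sd": pstdev(precisions),
        })
    return curve


# Toy input: same two positives in both runs, different negatives per run.
runs = [
    ([0.9, 0.7, 0.2, 0.6], [True, True, False, False]),
    ([0.9, 0.4, 0.8, 0.1], [True, True, False, False]),
]
curve = averaged_pr_curve(runs, [0.5, 0.95])
# One point at t = 0.5 (precision 7/12, recall 0.75, precision_sd 1/12);
# t = 0.95 is skipped because neither run predicts anything there.
print(curve)
print(best_iteration([-120.5, -98.2, -99.0]))  # 1
```

The toy runs share the same positive modules but use different negatives, mirroring the resampling of negative modules between runs.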

