Identification of yeast transcriptional regulation networks using multivariate random forests.

Xiao Y, Segal MR - PLoS Comput. Biol. (2009)

Bottom Line: In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.


Affiliation: Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California, USA. Yuanyuan.Xiao@ucsf.edu

ABSTRACT
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large-scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high-order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
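To make the idea of a tree split on a multivariate response concrete, the following is a minimal sketch of choosing one split when the response is a whole time-course profile rather than a single number. It assumes binary motif presence as the predictor and a plain sum of squared errors across all time points as the node impurity; the abstract does not state the split criterion, so this may differ from the one the authors use. All function, gene, and motif names here are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = List[float]


def sse(profiles: List[Profile]) -> float:
    """Sum of squared deviations from the mean profile, over all time points."""
    if not profiles:
        return 0.0
    n_time = len(profiles[0])
    means = [sum(p[t] for p in profiles) / len(profiles) for t in range(n_time)]
    return sum((p[t] - means[t]) ** 2 for p in profiles for t in range(n_time))


def best_motif_split(
    expression: Dict[str, Profile],
    motifs: Dict[str, Dict[str, int]],
) -> Tuple[str, float]:
    """Return the motif whose presence/absence split most reduces the SSE.

    Assumption: impurity is the unweighted multivariate SSE (see lead-in).
    """
    genes = list(expression)
    parent = sse([expression[g] for g in genes])
    best_name, best_gain = "", 0.0
    for name, present in motifs.items():
        left = [expression[g] for g in genes if present.get(g, 0)]
        right = [expression[g] for g in genes if not present.get(g, 0)]
        if not left or not right:
            continue  # a split must send genes to both children
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Toy data: hypothetical genes and motifs, three time points per gene.
expression = {
    "g1": [1.0, 0.0, -1.0],
    "g2": [1.0, 0.0, -1.0],
    "g3": [-1.0, 0.0, 1.0],
    "g4": [-1.0, 0.0, 1.0],
}
motifs = {
    "motif_A": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
    "motif_B": {"g1": 1, "g2": 0, "g3": 1, "g4": 0},
}
split_motif, split_gain = best_motif_split(expression, motifs)
```

In this toy case `motif_A` separates the two expression patterns perfectly, so it is selected. A forest would repeat this kind of split search recursively on bootstrap samples of genes, with a random subset of motifs considered at each node.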

Figure 7 (pcbi-1000414-g007): Signature motifs in identified cell cycle RCs using motifs as predictors.

Mentions: We then compared our method to unsupervised clustering, which had early successes in analyzing motif-expression relationships. We applied PAM to the cell cycle expression data and prescribed 13 clusters, matching the number of clusters used with MRF. The sizes of the resulting clusters ranged from 94 to 256, appreciably more uniform than those derived from MRF, which ranged from 54 to 599. The largest MRF cluster consists essentially of non-varying genes. Cross-tabulating these two gene categorization schemes reveals that the members of this large cluster are evenly distributed across all unsupervised clusters, potentially diluting meaningful cluster-specific information. Indeed, enrichment analysis conducted within each unsupervised cluster yields only four clusters with significant feature motifs. Moreover, the signals within each cluster are much more attenuated. The RAP1 cluster has 145 genes, but only 30.3% of them contain the actual RAP1 motif. The MCB cluster has 96 members with 28.6% MCB motif occurrence. The MCM1' cluster contains 181 genes with a 56.9% prevalence of the MCM1' motif. Lastly, the MSE cluster comprises 228 members, only 9.1% of which possess the MSE motif. The stark contrast in motif enrichment strength compared to MRF (see Figure 7) is due to a lack of simultaneous evaluation of both components of regulation: motif and expression. Such limitations are inherent in unsupervised approaches and have been widely noted in the context of microarray classification/regression problems. Increasing the number of clusters does not lead to discovery of more meaningful regulatory modules (results not shown).
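The comparison above rests on three simple computations: cross-tabulating two cluster assignments, measuring motif prevalence within a cluster, and testing that prevalence for enrichment. The sketch below illustrates them on toy data. The text does not say which enrichment test was used, so the hypergeometric tail probability here is an assumption, and all gene names and cluster labels are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Set


def cross_tabulate(a: Dict[str, int], b: Dict[str, int]) -> Counter:
    """Count genes for every (label in a, label in b) pair over shared genes."""
    return Counter((a[g], b[g]) for g in a if g in b)


def motif_prevalence(members: Set[str], motif_genes: Set[str]) -> float:
    """Fraction of cluster members whose promoter carries the motif."""
    return len(members & motif_genes) / len(members)


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the motif."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Toy assignments: one labeling stands in for MRF, the other for PAM.
mrf = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 1, "g6": 1}
pam = {"g1": 0, "g2": 1, "g3": 0, "g4": 1, "g5": 0, "g6": 1}
table = cross_tabulate(mrf, pam)

# Genes carrying a hypothetical motif, and the members of PAM cluster 0.
motif_genes = {"g1", "g2", "g3"}
cluster0 = {g for g, label in pam.items() if label == 0}

prevalence = motif_prevalence(cluster0, motif_genes)
p_value = hypergeom_tail(
    k=len(cluster0 & motif_genes), n=len(cluster0), K=len(motif_genes), N=len(pam)
)
```

Here the larger first-labeling cluster (label 1) is split evenly across the two second-labeling clusters, a small-scale version of the dilution effect described above: a large, uninformative group spread across every cluster lowers motif prevalence in each of them.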

