Identification of yeast transcriptional regulation networks using multivariate random forests.

Xiao Y, Segal MR - PLoS Comput. Biol. (2009)

Bottom Line: In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.


Affiliation: Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California, USA. Yuanyuan.Xiao@ucsf.edu

ABSTRACT
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large-scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high-order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
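To make the idea of a multivariate response concrete, the following minimal sketch scores a tree split against a whole expression profile rather than a single value. It is an illustration only: the excerpt does not give the split function actually used, so the summed squared-error criterion and the names `node_cost` and `best_motif_split` are assumptions introduced here.

```python
import numpy as np


def node_cost(Y):
    """Sum of squared deviations from the node's mean profile, over all columns.

    Y is a (genes x conditions) array; each row is one gene's expression profile.
    ASSUMPTION: plain Euclidean cost, chosen for illustration only.
    """
    if Y.shape[0] == 0:
        return 0.0
    return float(((Y - Y.mean(axis=0)) ** 2).sum())


def best_motif_split(X, Y):
    """Return (motif index, cost reduction) for the best presence/absence split.

    X is a (genes x motifs) array of motif scores; a gene "has" motif j when
    X[:, j] > 0 (a hypothetical threshold). Returns (None, 0.0) if no split helps.
    """
    parent = node_cost(Y)
    best_j, best_gain = None, 0.0
    for j in range(X.shape[1]):
        has_motif = X[:, j] > 0
        # Skip motifs that do not separate the genes into two non-empty groups.
        if has_motif.all() or not has_motif.any():
            continue
        gain = parent - node_cost(Y[has_motif]) - node_cost(Y[~has_motif])
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain


# Toy data: motif 0 separates the two expression profiles perfectly, motif 1 does not.
X_toy = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
Y_toy = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
print(best_motif_split(X_toy, Y_toy))
```

A forest would repeat such splits recursively on bootstrap samples of genes with random subsets of motifs; that machinery is omitted here.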

Figure 3: Outputs of (A) relative prediction error (left axis) and absolute prediction error (right axis) and (B) variable importance measures from MRT. Black traces are the real, observed statistics, whereas gray traces are derived from the 100 permuted data sets. Only the top 100 ordered motifs are drawn in B.

Mentions: An obvious question is whether the observed improvement in prediction performance and the highly ranked motifs (via variable importance measures) result from meaningful regulatory relationships. In the absence of experimental validation, we address this by disrupting the original correspondence between the motif data (X matrix) and the expression data (Y matrix for the cell cycle data) by randomly permuting the rows of the expression matrix. Doing so dissociates response-predictor relationships but preserves within-predictor and within-response correlation structures. The relative prediction error traces and the ordered variable importance measures for the 100 permuted data sets (in gray) are displayed in contrast to those calculated from the original data set (in black) in Figure 3. The randomization process provides a means to assess model quality and the significance of the observed summaries, including relative prediction error and motif importance measures. This is carried out by computing the relative prediction error and motif importance measures for each permuted data set. A histogram is then formed for each statistic and a permutation p value derived. The permutation p values for variable importance were evaluated collectively and adjusted using the false discovery rate (FDR) control procedure proposed by Benjamini and Hochberg [30]. Nineteen motifs have an FDR-adjusted p value ≤ 0.1, and they are highlighted in Figure 2B. A detailed discussion of motif importances and regulatory cliques for the cell cycle data follows.
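The permutation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the forest-based importance measure is replaced by a simple stand-in (`importance`, summed squared correlation), and the function names, the toy data, and the "+1" p value convention are all assumptions made here. Only the row permutation of Y and the Benjamini-Hochberg adjustment follow the text.

```python
import numpy as np


def importance(X, Y):
    """STAND-IN importance score per motif (not the forest-based measure):
    squared Pearson correlation with each response column, summed over columns."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    Ys = Yc / np.linalg.norm(Yc, axis=0)
    corr = Xs.T @ Ys  # shape: motifs x conditions
    return (corr ** 2).sum(axis=1)


def permutation_pvalues(X, Y, n_perm=100, seed=0):
    """Permutation p value per motif, obtained by shuffling the rows of Y."""
    rng = np.random.default_rng(seed)
    observed = importance(X, Y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Row shuffle breaks the X-Y link but keeps correlations within X and within Y.
        Y_perm = Y[rng.permutation(Y.shape[0])]
        exceed += importance(X, Y_perm) >= observed
    # ASSUMPTION: the +1 convention, which keeps every p value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Hypothetical toy data: 200 genes, 20 motifs, 5 conditions; motif 0 drives expression.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
Y = rng.normal(size=(200, 5)) + 2.0 * X[:, [0]]

p_values = permutation_pvalues(X, Y, n_perm=100)
q_values = benjamini_hochberg(p_values)
print("smallest permutation p value:", p_values.min())
print("motifs with adjusted p <= 0.1:", np.flatnonzero(q_values <= 0.1))
```

Under this convention, 100 permutations cannot produce a p value below 1/101, which limits how small the adjusted values can become when many motifs are tested; how the original analysis derived p values from the histograms is not specified in this excerpt.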

