Predicting cellular growth from gene expression signatures.

Airoldi EM, Huttenhower C, Gresham D, Lu C, Caudy AA, Dunham MJ, Broach JR, Botstein D, Troyanskaya OG - PLoS Comput. Biol. (2009)

Bottom Line: The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods.


Affiliation: Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, New Jersey, United States of America.

ABSTRACT
Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation is not yet known, portions of it are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop statistical methodology to identify quantitative aspects of the regulatory mechanisms underlying cellular proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes can be exploited to predict the instantaneous growth rate of any cellular culture with high accuracy. The predictions obtained in this fashion are robust to changing biological conditions, experimental methods, and technological platforms. The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods. Data and tools enabling others to apply our methods are available at http://function.princeton.edu/growthrate.


Figure 5 (pcbi-1000257-g005): Assessment of accuracy and outlier detection during growth rate inference. (A) We performed an out-of-sample cross-validation of our model by randomly sub-sampling 24 of the 36 training expression arrays 1,000 times. We refit our linear model in each random sample, calculated bootstrapped distributions for all gene parameters, and found sets of the most significant growth-specific genes. These were then used to infer growth rates for the 12 held-out conditions, providing an estimate of the accuracy of the model's growth rate predictions. (B) When predicting the growth rate of a new collection of expression data, our model excludes any calibration gene with an expression level outside the inner fence (1.5 times the inter-quartile range below or above the first or third quartiles). This improves predicted growth rate accuracy while also calling out genes potentially responding to specific non-growth stimuli under some biological condition. For example, in the [6] mild heat shock time course, two of the six outliers are known heat shock genes (HSP26 and HSP78). The other four (YLR327C, MOH1, YBL048W, and TMA10) are uncharacterized genes, suggesting potential roles in the response to heat shock.
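
The inner-fence rule in panel (B) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the fence is computed across the calibration genes' expression levels within a single array, which the caption does not state explicitly. The gene names and values are hypothetical toy data, and the quartile convention (`method="inclusive"`) is also an assumption.

```python
from statistics import quantiles

def inner_fence(values, k=1.5):
    """Return (low, high) bounds: k * IQR below Q1 and above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(expression, k=1.5):
    """Split a gene -> expression-level mapping into kept and flagged genes.

    Assumption: the fence is computed over all calibration genes
    measured in one condition (one array).
    """
    low, high = inner_fence(list(expression.values()), k)
    kept = {g: x for g, x in expression.items() if low <= x <= high}
    flagged = {g: x for g, x in expression.items() if g not in kept}
    return kept, flagged

# Hypothetical log-ratio expression levels for seven calibration genes.
toy_expression = {
    "GENE_A": 0.1, "GENE_B": -0.2, "GENE_C": 0.0, "GENE_D": 0.3,
    "GENE_E": -0.1, "GENE_F": 0.2, "GENE_G": 3.5,
}
kept, flagged = split_outliers(toy_expression)
print(sorted(flagged))  # genes excluded from growth rate inference
```

In this toy example only the gene with the extreme value falls outside the fence. Flagged genes serve two purposes in the caption's sense: they are left out of the growth rate estimate, and they are candidates for a condition-specific response.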

Mentions: We assessed the quality of our growth rate predictions using 1,000 out-of-sample experiments, according to a hybrid bootstrap/cross-validation setup, using the data from [4]. Results are shown in Figure 5A. In each experiment, we randomly withheld 12 of the 36 conditions for testing, fit our linear model on the remaining 24, derived bootstrapped distributions using only these data, and determined growth-specific gene sets to use for growth rate inference on the held-out conditions. This experimental setup leads to absolute growth rate predictions directly, as all the dual-channel microarrays share the same transcriptional readout in the reference channel. This out-of-sample validation allowed us to assess the accuracy and variability of our predictions on conditions with known growth rates not included in the model building procedure. In addition to the performance indicated by Figure 5A, the out-of-sample experiments demonstrated robustness of p-value cutoffs and number of growth-specific genes; these ranged in number from ∼50 to ∼110 across the randomized validations (of a total ∼5,500 possible genes), and changes of this magnitude in the final calibration gene set had little impact on predicted growth rates. We further quantified a notion of reliability for each of the 72 growth-specific genes. Specifically, we computed the percentage, P, of bootstrap experiments in which each individual gene was selected as a member of the growth-specific gene set. The percentages provide an expectation about whether each individual gene should be considered reliable in a new study. We found that 69 genes were selected in more than half of the experiments, P > 0.5. Full results are reported in Table 2.
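
The repeated hold-out procedure and the per-gene selection frequency P described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published method. The data are simulated, the gene names are hypothetical, and the bootstrapped significance test is replaced by a plain correlation cutoff (`r_cut`). Growth rates for held-out conditions are inferred by inverting each selected gene's fitted line and averaging, which is an assumed stand-in for the paper's inference step.

```python
import random

def fit_line(x, y):
    """Least-squares slope, intercept and Pearson r for y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

def validate(growth, expr, n_train=24, n_splits=200, r_cut=0.8, seed=0):
    """growth: known rate per condition; expr: gene -> level per condition."""
    rng = random.Random(seed)
    conditions = list(range(len(growth)))
    picked = {gene: 0 for gene in expr}
    abs_errors = []
    for _ in range(n_splits):
        train = rng.sample(conditions, n_train)
        train_set = set(train)
        test = [c for c in conditions if c not in train_set]
        fits = {}
        for gene, levels in expr.items():
            slope, intercept, r = fit_line([growth[c] for c in train],
                                           [levels[c] for c in train])
            if abs(r) >= r_cut:  # assumed stand-in for the bootstrap test
                fits[gene] = (slope, intercept)
                picked[gene] += 1
        if not fits:
            continue  # no gene passed the cutoff in this split
        for c in test:
            # Invert each selected gene's line, then average the estimates.
            estimates = [(expr[g][c] - b0) / b1 for g, (b1, b0) in fits.items()]
            abs_errors.append(abs(sum(estimates) / len(estimates) - growth[c]))
    frequency = {g: k / n_splits for g, k in picked.items()}
    return frequency, sum(abs_errors) / len(abs_errors)

# Simulated data: 36 conditions, 5 growth-responsive genes, 5 noise genes.
data_rng = random.Random(1)
growth = [0.05 + 0.25 * i / 35 for i in range(36)]
expr = {}
for j in range(5):
    expr[f"growth_gene_{j}"] = [10 * g + data_rng.gauss(0, 0.2) for g in growth]
    expr[f"noise_gene_{j}"] = [data_rng.gauss(0, 0.2) for _ in growth]

frequency, mean_abs_error = validate(growth, expr)
for gene, p in sorted(frequency.items()):
    print(f"{gene}: P = {p:.2f}")
print(f"mean absolute error on held-out conditions: {mean_abs_error:.4f}")
```

The sketch uses 200 splits to keep the run short; the text above uses 1,000. Genes that are selected in most splits have a high P, which mirrors the reliability measure described above.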

