Predicting cellular growth from gene expression signatures.

Airoldi EM, Huttenhower C, Gresham D, Lu C, Caudy AA, Dunham MJ, Broach JR, Botstein D, Troyanskaya OG - PLoS Comput. Biol. (2009)

Bottom Line: The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods.


Affiliation: Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, New Jersey, United States of America.

ABSTRACT
Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation are not yet known, portions of them are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop statistical methodology to identify quantitative aspects of the regulatory mechanisms underlying cellular proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes can be exploited to predict the instantaneous growth rate of any cellular culture with high accuracy. The predictions obtained in this fashion are robust to changing biological conditions, experimental methods, and technological platforms. The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods. Data and tools enabling others to apply our methods are available at http://function.princeton.edu/growthrate.


pcbi-1000257-g005: Assessment of accuracy and outlier detection during growth rate inference. (A) We performed an out-of-sample cross-validation of our model by randomly sub-sampling 24 of the 36 training expression arrays 1,000 times. We refit our linear model in each random sample, calculated bootstrapped distributions for all gene parameters, and found sets of the most significant growth-specific genes. These were then used to infer growth rates for the 12 held-out conditions, providing an estimate of the accuracy of the model's growth rate predictions. (B) When predicting the growth rate of a new collection of expression data, our model excludes any calibration gene with an expression level outside the inner fence (1.5 times the inter-quartile range below or above the first or third quartiles). This improves predicted growth rate accuracy while also calling out genes potentially responding to specific non-growth stimuli under some biological condition. For example, in the [6] mild heat shock time course, two of the six outliers are known heat shock genes (HSP26 and HSP78). The other four (YLR327C, MOH1, YBL048W, and TMA10) are uncharacterized genes, suggesting potential roles in the response to heat shock.
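
The inner-fence rule described in panel (B) can be illustrated with a minimal sketch. It assumes the fence is computed across the calibration genes' expression values within a single array, which the caption does not state explicitly; the function name `inner_fence_mask` and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def inner_fence_mask(expr):
    """Return a boolean mask that is True for values inside the inner fence.

    The inner fence spans from Q1 - 1.5*IQR to Q3 + 1.5*IQR, where Q1 and Q3
    are the first and third quartiles and IQR = Q3 - Q1.
    """
    expr = np.asarray(expr, dtype=float)
    q1, q3 = np.percentile(expr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (expr >= lower) & (expr <= upper)

# Hypothetical log-ratio expression values for eight calibration genes
# measured on one array; the seventh value is far from the rest.
values = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 4.0, -0.3])
keep = inner_fence_mask(values)

# Genes flagged False would be dropped before inferring the growth rate,
# and could be inspected as candidates for a condition-specific response.
print(keep)
```

Genes that fall outside the fence are excluded from inference for that array only; in this sketch they remain available for other arrays.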

Mentions: We assessed the quality of our growth rate predictions using 1,000 out-of-sample experiments, according to a hybrid bootstrap/cross-validation setup, using the data from [4]. Results are shown in Figure 5A. In each experiment, we randomly withheld 12 of the 36 conditions for testing, fit our linear model on the remaining 24, derived bootstrapped distributions using only these data, and determined growth-specific gene sets to use for growth rate inference on the held-out conditions. This experimental setup leads to absolute growth rate predictions directly, as all the dual-channel microarrays share the same transcriptional readout in the reference channel. This out-of-sample validation allowed us to assess the accuracy and variability of our predictions on conditions with known growth rates not included in the model building procedure. In addition to the performance indicated by Figure 5A, the out-of-sample experiments demonstrated robustness of p-value cutoffs and number of growth-specific genes; these ranged in number from ∼50 to ∼110 across the randomized validations (of a total ∼5,500 possible genes), and changes of this magnitude in the final calibration gene set had little impact on predicted growth rates. We further quantified a notion of reliability for each of the 72 growth-specific genes. Specifically, we computed the fraction, P, of bootstrap experiments in which each individual gene was selected as a member of the growth-specific gene set. These fractions provide an expectation about whether each individual gene should be considered reliable in a new study. We found that 69 genes were selected in more than half of the experiments, P > 0.5. Full results are reported in Table 2.
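
The validation loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the bootstrapped p-value selection is replaced by a simpler stand-in (ranking genes by absolute correlation with growth rate and keeping the top `top_k`), growth rates are inferred by inverting per-gene least-squares fits, and the data are synthetic. All function names, parameters, and values are hypothetical.

```python
import numpy as np

def fit_genes(expr, growth):
    """Least-squares fit expr[g, c] ~ a[g] + b[g] * growth[c] for every gene g."""
    gc = growth - growth.mean()
    ec = expr - expr.mean(axis=1, keepdims=True)
    slope = ec @ gc / (gc @ gc)
    intercept = expr.mean(axis=1) - slope * growth.mean()
    corr = ec @ gc / (np.linalg.norm(ec, axis=1) * np.linalg.norm(gc))
    return intercept, slope, corr

def predict_growth(expr, intercept, slope):
    """Invert the per-gene fits: least-squares growth rate for each condition."""
    return slope @ (expr - intercept[:, None]) / (slope @ slope)

def subsample_validation(expr, growth, n_train=24, n_rounds=1000, top_k=20, seed=0):
    """Repeatedly hold out conditions, refit, select genes, and predict.

    Returns the per-gene selection fraction P and the mean absolute error
    of the held-out growth rate predictions.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cond = expr.shape
    selected = np.zeros(n_genes)
    errors = []
    for _ in range(n_rounds):
        perm = rng.permutation(n_cond)
        train, test = perm[:n_train], perm[n_train:]
        a, b, r = fit_genes(expr[:, train], growth[train])
        # Stand-in for the bootstrapped significance test: top-k by |correlation|.
        genes = np.argsort(-np.abs(r))[:top_k]
        selected[genes] += 1
        pred = predict_growth(expr[np.ix_(genes, test)], a[genes], b[genes])
        errors.append(np.mean(np.abs(pred - growth[test])))
    return selected / n_rounds, float(np.mean(errors))

# Synthetic data: 36 conditions, 200 genes, the first 20 respond to growth rate.
rng = np.random.default_rng(1)
growth = np.tile(np.linspace(0.05, 0.30, 6), 6)
n_genes = 200
slopes = np.zeros(n_genes)
slopes[:20] = 5.0
expr = slopes[:, None] * growth[None, :] + rng.normal(0, 0.2, (n_genes, 36))

# 200 rounds keeps the demo fast; the text above uses 1,000.
freq, mae = subsample_validation(expr, growth, n_rounds=200)
print("genes with P > 0.5:", int((freq > 0.5).sum()))
print("mean absolute error on held-out conditions:", round(mae, 4))
```

The array `freq` plays the role of P in the text: a gene that is selected in most random splits is a more dependable calibration gene than one that appears only occasionally.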

