Determinants of protein abundance and translation efficiency in S. cerevisiae.

Tuller T, Kupiec M, Ruppin E - PLoS Comput. Biol. (2007)

Affiliation: School of Computer Science, Tel Aviv University, Tel Aviv, Israel. tamirtul@post.tau.ac.il

ABSTRACT
The translation efficiency of most Saccharomyces cerevisiae genes remains fairly constant across poor and rich growth media. This observation has led us to revisit the available data and to examine the potential utility of a protein abundance predictor in reinterpreting existing mRNA expression data. Our predictor is based on large-scale data on mRNA levels, the tRNA adaptation index, and the evolutionary rate. It attains a correlation of 0.76 with experimentally determined protein abundance levels on unseen data and successfully cross-predicts protein abundance levels in another yeast species (Schizosaccharomyces pombe). The predicted abundance levels of proteins in known S. cerevisiae complexes, and of interacting proteins, are significantly more coherent than their corresponding mRNA expression levels. Analysis of gene expression measurement experiments using the predicted protein abundance levels yields new insights that are not readily discernible when clustering the corresponding mRNA expression levels. Comparing protein abundance levels across poor and rich media, we find a general trend toward homeostatic regulation, in which transcription and translation change in a reciprocal manner. This phenomenon is more prominent near origins of replication. Our analysis shows that in parallel to the adaptation occurring at the tRNA level via the codon bias, proteins do undergo a complementary adaptation at the amino acid level to further increase their abundance.

Figure 2. Performances of the Linear Predictor of (log) Protein Abundance
(A) The accuracy of various linear predictors of (log) protein abundance, measured by the Spearman rank correlation coefficient over a held-out test set, using a single data source of protein abundance [2] and mRNA levels [15]. ER values are from [19], and tAI data are taken from [20]. The numbers below the arrows are t-test p-values for the hypothesis that the predictor with the newly added feature performs identically to its predecessor (see Methods). The final predictor for protein abundance (PA) is log(PA) = 3.97 + 0.4 × log(mRNA) + 10.34 × tAI − 3.35 × ER.
(B) Accuracy of various linear predictors when protein and mRNA levels are obtained by averaging measurements from at least two data sources. The final predictor for protein abundance obtained in this case is log(PA) = 3.47 + 0.63 × log(mRNA) + 10.89 × tAI − 2.923 × ER.
(C) The Spearman correlations (y-axis) of predicted protein abundance (and of mRNA levels) with measured protein abundance levels, binned at various levels of protein abundance p (x-axis, natural log). In every bin except the lowest (log(p) < 7), the correlations are higher and significant for predicted protein abundance (p < 2 × 10⁻⁵).
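The two fitted models in panels A and B can be applied directly to a gene's three features. The following is a minimal Python sketch, not the authors' code. It assumes that "log" denotes the natural logarithm (panel C uses natural log on its axis, but the base of the model is not stated here) and that mRNA, tAI and ER values are on the same scales as the data sources used for fitting. The names `COEFFS` and `predict_log_pa` are hypothetical.

```python
import math

# Coefficients copied from the caption: (intercept, log(mRNA), tAI, ER).
# "single_source": panel A; "averaged": panel B (>= 2 data sources averaged).
COEFFS = {
    "single_source": (3.97, 0.4, 10.34, -3.35),
    "averaged": (3.47, 0.63, 10.89, -2.923),
}


def predict_log_pa(mrna: float, tai: float, er: float,
                   model: str = "single_source") -> float:
    """Return predicted log(PA) for one gene under the chosen linear model.

    Assumption (not stated in the caption): log is the natural logarithm,
    and inputs are on the same scales as the data used for fitting.
    """
    if mrna <= 0:
        raise ValueError("mRNA level must be positive to take its log")
    b0, b_mrna, b_tai, b_er = COEFFS[model]
    return b0 + b_mrna * math.log(mrna) + b_tai * tai + b_er * er
```

Because the model is linear in log(mRNA), tAI and ER, each coefficient reads as the change in predicted log(PA) per unit change of that feature with the others held fixed; the negative ER coefficient encodes the observation that faster-evolving proteins tend to be less abundant.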

The two protein features yielding a significant improvement in prediction accuracy were the tRNA adaptation index (tAI) [16,17] and the evolutionary rate (ER) [18,19]. tAI is based on the synonymous codon usage bias and on the gene copy numbers of the different tRNAs, and is related to the codon adaptation index (CAI) [16,17]. ER measures the rate of evolution of a protein by comparing its orthologs across related species [18,19]. Both features have previously been shown to correlate with protein abundance levels [18,20]. On the same dataset as above, combining tAI with mRNA levels raises the Spearman rank correlation coefficient from rs = 0.55, obtained using mRNA levels alone, to rs = 0.61. Adding evolutionary rate values raises it further to rs = 0.63. The incremental improvement from consecutively adding these two features to the basic linear regression protein abundance predictor is statistically significant (Figure 2 and Methods).
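The rs values above compare predicted and measured abundances by rank only, so they are insensitive to the scale of the prediction. A minimal Python sketch of the Spearman rank correlation, computed as the Pearson correlation of the two rank vectors with tied values given their average rank, is shown below. It is an illustration under that standard definition, not the authors' implementation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; assign their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation r_s: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

To compare nested predictors in the spirit of Figure 2, one would evaluate `spearman(predictions, measured)` on the same held-out genes for the mRNA-only model and for each model with an added feature; the significance test on the difference described in Methods is not reproduced here.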