Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry.

Ludwig C, Claassen M, Schmidt A, Aebersold R - Mol. Cell Proteomics (2011)

Bottom Line: We found that a linear model based on the two most intense transitions of the three best flying peptides per protein (TopPep3/TopTra2) generated optimal results with a cross-validated mean fold error of 1.8 and a squared Pearson coefficient R² of 0.88. Applying the optimized model to lysates of the microbe Leptospira interrogans, we detected significant protein abundance changes of 39 target proteins upon antibiotic treatment, which correlate well with literature values. The described method is generally applicable and exploits the inherent performance advantages of SRM, such as high sensitivity, selectivity, reproducibility, and dynamic range, and estimates absolute protein concentrations of selected proteins at minimized costs.


Affiliation: Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland.

ABSTRACT
For many research questions in modern molecular and systems biology, information about absolute protein quantities is imperative. Examples include kinetic modeling of processes, protein turnover determinations, stoichiometric investigations of protein complexes, and quantitative comparisons of different proteins within one sample or across samples. To date, the vast majority of proteomic studies are limited to providing relative quantitative comparisons of protein levels between limited numbers of samples. Here we describe and demonstrate the utility of a targeted MS technique for the estimation of absolute protein abundance in unlabeled and nonfractionated cell lysates. The method is based on selected reaction monitoring (SRM) mass spectrometry and the "best flyer" hypothesis, which assumes that the specific MS signal intensity of the most intense tryptic peptides per protein is approximately constant throughout a whole proteome. SRM-targeted best flyer peptides were selected for each protein from the peptide precursor ion signal intensities from directed MS data. The most intense transitions per peptide were selected from full MS/MS scans of crude synthetic analogs. We used Monte Carlo cross-validation to systematically investigate the accuracy of the technique as a function of the number of measured best flyer peptides and the number of SRM transitions per peptide. We found that a linear model based on the two most intense transitions of the three best flying peptides per protein (TopPep3/TopTra2) generated optimal results with a cross-validated mean fold error of 1.8 and a squared Pearson coefficient R² of 0.88. Applying the optimized model to lysates of the microbe Leptospira interrogans, we detected significant protein abundance changes of 39 target proteins upon antibiotic treatment, which correlate well with literature values. The described method is generally applicable and exploits the inherent performance advantages of SRM, such as high sensitivity, selectivity, reproducibility, and dynamic range, and estimates absolute protein concentrations of selected proteins at minimized costs.

Figure 3: Model selection and accuracy estimation using Monte Carlo cross-validation. A, heat map visualization of the predictive measurement accuracy, represented by the cross-validated mean fold error, applying different models based on varying peptide and transition counts. Each square represents one particular linear model, which considers a specific number of summed best flyer peptides and most intense transitions, as annotated by the axes. Ranking of peptides and transitions was performed based on decreasing signal intensity. B and C, prediction error histograms for the linear models considering either the single best flying peptide per protein (TopPep1/TopTra6) or the summed intensity of the three best flying peptides (TopPep3/TopTra2). D and E, linear regression curves for the two models TopPep1/TopTra6 and TopPep3/TopTra2, respectively.
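The TopPepN/TopTraM intensity named in the caption can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `protein_intensity`, the input layout (a mapping from peptide sequence to a list of transition intensities), and the choice to rank peptides by their summed transition signal are all hypothetical, and the numbers are made up.

```python
def protein_intensity(transitions, top_pep=3, top_tra=2):
    """Return a TopPep<top_pep>/TopTra<top_tra> style protein intensity.

    transitions: dict mapping a peptide to a list of its SRM transition
    intensities (hypothetical input layout, not taken from the paper).
    """
    peptide_sums = []
    for intensities in transitions.values():
        # Keep the top_tra most intense transitions of this peptide and sum them.
        strongest = sorted(intensities, reverse=True)[:top_tra]
        peptide_sums.append(sum(strongest))
    # Assumption: peptides are ranked by this summed transition signal.
    best_flyers = sorted(peptide_sums, reverse=True)[:top_pep]
    return sum(best_flyers)


# Made-up intensities for one protein with four measured peptides.
example = {
    "PEPTIDEA": [900.0, 400.0, 100.0],
    "PEPTIDEB": [500.0, 300.0, 50.0],
    "PEPTIDEC": [200.0, 150.0],
    "PEPTIDED": [80.0, 20.0],
}

top_pep3_top_tra2 = protein_intensity(example, top_pep=3, top_tra=2)
top_pep1_top_tra6 = protein_intensity(example, top_pep=1, top_tra=6)
```

With these made-up values, TopPep3/TopTra2 sums 1300 + 800 + 350 and TopPep1/TopTra6 sums all three transitions of the strongest peptide. A peptide with fewer than M transitions simply contributes all of them.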

The protein intensities determined in this way were log-transformed and plotted against the log-transformed absolute protein quantities, and a linear regression was then performed for each model. To assess the ability of each linear fit to estimate absolute protein abundances on new data, we determined the expected fold errors by Monte Carlo cross-validation (see “Experimental Procedures”). This analysis revealed that the best prediction accuracy was obtained by considering only the best flying peptide per protein; that is, mean fold errors increased with the number of summed peptides per protein (Fig. 3A). Furthermore, summing the two most intense transitions per peptide led to improved abundance predictions, regardless of the number of peptides considered (Fig. 3A). The statistically most accurate model considered the summed signal intensities of the six most intense transitions of the best flying peptide per protein (TopPep1/TopTra6; Fig. 3B), with an estimated mean fold error of 1.76. However, performance differences across all peptide and transition combinations tested were small (mean fold errors ranged from 1.76 to 2.03). Specifically, the model TopPep3/TopTra2 predicted absolute protein abundances with a mean fold error of 1.83 and a maximal detected error of 4.5-fold (Fig. 3C). The linear calibration curves from TopPep1/TopTra6 and TopPep3/TopTra2 were highly similar (squared Pearson coefficient R² = 0.90 and 0.88, respectively; compare Fig. 3, D and E), and the estimated absolute protein abundances differed on average by only 4%. This indicates that several combinations of best flyer peptides per protein and transition signals per peptide show a reasonable and robust ability to predict absolute protein quantities from SRM data sets.
Finally, we selected TopPep3/TopTra2 as the model of choice for further analysis because basing the abundance estimate on three independent peptide measurements per protein makes the model less sensitive to peptide outlier values, which is especially important when working with complex biological samples.
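The evaluation described above (a log-log linear fit scored by Monte Carlo cross-validation) can be sketched in plain Python as follows. This is a hedged illustration, not the procedure from “Experimental Procedures”: the function names, the number of random splits, the held-out fraction, and the definition of fold error as max(predicted/actual, actual/predicted) are assumptions made for the example.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fold_error(predicted, actual):
    """Fold error (always >= 1) between two positive quantities (assumed definition)."""
    ratio = predicted / actual
    return max(ratio, 1.0 / ratio)


def mccv_mean_fold_error(intensities, quantities, n_splits=1000,
                         test_fraction=0.3, seed=0):
    """Mean fold error over repeated random train/test splits.

    n_splits and test_fraction are hypothetical defaults, not values
    reported in the paper.
    """
    if len(intensities) < 3:
        raise ValueError("need at least three proteins to split and fit")
    rng = random.Random(seed)
    log_i = [math.log10(v) for v in intensities]
    log_q = [math.log10(v) for v in quantities]
    indices = list(range(len(log_i)))
    # Hold out at least one protein, but keep at least two for the fit.
    n_test = min(max(1, round(test_fraction * len(indices))), len(indices) - 2)
    errors = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line([log_i[k] for k in train],
                                    [log_q[k] for k in train])
        for k in test:
            # Back-transform the prediction from log10 space before scoring.
            predicted = 10 ** (slope * log_i[k] + intercept)
            errors.append(fold_error(predicted, quantities[k]))
    return sum(errors) / len(errors)


# Made-up calibration data: summed SRM intensities and known quantities.
intensities = [10.0, 100.0, 1000.0, 1e4, 1e5]
noisy_quantities = [30.0, 250.0, 4000.0, 2e4, 5e5]
mean_fold = mccv_mean_fold_error(intensities, noisy_quantities, n_splits=200)
```

Scoring only the held-out proteins in each split is what makes the fold error an estimate of performance on new data rather than a measure of fit quality on the calibration set.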

