Peak intensity prediction in MALDI-TOF mass spectrometry: a machine learning study to support quantitative proteomics.

Timm W, Scherbart A, Böcker S, Kohlbacher O, Nattkemper TW - BMC Bioinformatics (2008)

Bottom Line: Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions in turn will turn out to be beneficial for mass spectrometry-based quantitative proteomics.

Affiliation: Applied Neuroinformatics Group, Bielefeld University, Germany. wiebke.timm@childrens.harvard.edu

ABSTRACT

Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is the fact that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification.

Results: In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation).

Conclusion: The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions in turn will turn out to be beneficial for mass spectrometry-based quantitative proteomics.
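
As an illustration of the feature extraction mentioned in the Results, the following is a minimal sketch of one simple sequence-derived encoding: counting how often each of the 20 standard amino acids occurs in a peptide. The function name `mono_features`, the residue ordering, and the example sequence are assumptions made for this sketch, not the authors' code.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids; the ordering is an
# arbitrary choice for this sketch and only fixes the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mono_features(peptide):
    """Return a 20-dimensional vector of single amino acid counts.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(peptide.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical peptide sequence, used only to show the call.
    vector = mono_features("AAGKLR")
    for aa, n in zip(AMINO_ACIDS, vector):
        if n:
            print(aa, n)
```

Each peptide is mapped to a fixed-length vector regardless of its length, which is what a regression method needs as input.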

Figure 2: Scatter plot of target vs. predicted values. Prediction results for dataset A with the ν-SVR indicate that peak intensity prediction is feasible. Left: Cross-validation on dataset A. Right: Prediction using a model parameter-tuned on dataset B. r denotes Pearson's correlation between target and predicted values. Plots for dataset B and the other feature sets are shown in additional files. A summary of all additional files showing scatter plots is presented in Table 4.

Mentions: Among all combinations, the best prediction result is achieved using the ν-SVR algorithm on dataset A with mic normalization and the mono feature set (single amino acid counts), shown in Fig. 2. Here, 10-fold cross-validation yields an overall correlation of r = 0.68. In the across-dataset validation, the correlation coefficient is only slightly reduced to r = 0.66, or to r = 0.61 when peptides present in both datasets are excluded. With the chemical (aa) or selected subset (sss) feature sets, ν-SVR achieves prediction accuracy almost as good as with the mono feature set; see Table 2, as well as the additional files of scatter plots for datasets A and B referred to in Table 3. These correlations are significant and show that we can predict peak intensities with statistical learning methods.
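
To make the evaluation concrete, here is a minimal sketch of a k-fold cross-validated Pearson correlation, assuming that the "overall" correlation is computed once over the held-out predictions pooled from all folds. The helper names (`pearson_r`, `kfold_indices`, `cross_validated_r`) are hypothetical, and the one-feature least-squares line is only a stand-in for the ν-SVR used in the paper.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_r(X, y, fit, predict, k=10, seed=0):
    """Predict every sample from a model trained without its fold,
    then correlate the pooled predictions with the targets."""
    preds = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return pearson_r(y, preds)


# Stand-in model (assumption): least-squares line on a single feature.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Synthetic data for demonstration only; not from the paper.
    X = [float(i) for i in range(30)]
    y = [2.0 * x + 1.0 for x in X]
    print(cross_validated_r(X, y, fit_line, predict_line))
```

Passing `fit` and `predict` as callables keeps the evaluation loop independent of the regression method, so the same loop could wrap a support vector regressor from any library.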

