Peak intensity prediction in MALDI-TOF mass spectrometry: a machine learning study to support quantitative proteomics.

Timm W, Scherbart A, Böcker S, Kohlbacher O, Nattkemper TW - BMC Bioinformatics (2008)

Bottom Line: Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions in turn will turn out to be beneficial for mass spectrometry-based quantitative proteomics.

Affiliation: Applied Neuroinformatics Group, Bielefeld University, Germany. wiebke.timm@childrens.harvard.edu

ABSTRACT

Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is the fact that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification.

Results: In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation).

Conclusion: The techniques presented here are a useful first step going beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions in turn will turn out to be beneficial for mass spectrometry-based quantitative proteomics.
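The evaluation reported in the Results (a Pearson correlation obtained in a ten-fold cross validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, a one-feature least-squares line stands in for support vector regression and local linear maps, and pooling the out-of-fold predictions before computing the correlation is one possible convention that the abstract does not confirm.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares line y = a*x + b (hypothetical stand-in for SVR or LLM)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_validated_pearson(xs, ys, k=10, seed=0):
    """Predict every sample from a model trained on the other folds,
    then correlate the pooled predictions with the true targets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    preds = [0.0] * len(xs)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a * xs[i] + b
    return pearson(preds, ys)


# Synthetic data (hypothetical): one feature and a noisy linear target.
rng = random.Random(1)
feature = [rng.uniform(0, 1) for _ in range(200)]
intensity = [2.0 * x + rng.gauss(0, 0.5) for x in feature]
r = cross_validated_pearson(feature, intensity)
print(f"cross-validated Pearson r = {r:.2f}")
```

Because each prediction comes from a model that never saw the corresponding sample, the resulting correlation estimates performance on unseen peptides rather than goodness of fit on the training data.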

Figure 5: Feature importance. Plot of percentage increase of the prediction error if the corresponding feature is randomly permuted, using random forests for regression [42]. Of all features in the sss feature set, the relative population of conformational state E (VASM830103, [38]), the estimated gas-phase basicity (GB500, [36]), and the theoretical mass lead to the highest increase of the error if the peptide's values are permuted. The number of positive charges (FAUJ880111, [41]) and the number of glutamine residues (Q) are rated the least important features.

Mentions: We assess the importance of the single features that constitute the final sss feature set using random forests for regression [42,43]. Fig. 5 visualizes the percentage increase of the prediction error if values of the corresponding feature are randomly permuted. According to this, VASM830103 is the most important feature, followed by GB500 and the peptide's theoretical mass.
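The permutation idea behind Fig. 5 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the data are synthetic, the feature names `f0` to `f2` are hypothetical, a fixed linear function stands in for the fitted random forest, and the error is measured on the same samples rather than on the out-of-bag samples a random forest would use.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Percentage increase in MSE when one feature column is shuffled,
    averaged over n_repeats shuffles, for each feature in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(model, permuted, targets)
        increases.append(100.0 * (total / n_repeats - baseline) / baseline)
    return increases


# Synthetic data (hypothetical): the target depends strongly on f0,
# weakly on f1, and not at all on f2.
rng = random.Random(42)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
targets = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.3) for r in rows]


def model(row):
    """Stand-in for a fitted regressor (the paper uses a random forest)."""
    return 3.0 * row[0] + 0.5 * row[1]


importance = permutation_importance(model, rows, targets)
for name, value in zip(["f0", "f1", "f2"], importance):
    print(f"{name}: {value:.1f}% increase in MSE")
```

A feature the model relies on produces a large error increase when shuffled, while a feature the model ignores produces none, which is the ranking logic applied to VASM830103, GB500, and the theoretical mass in the figure.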

