Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification.

Klammer AA, Reynolds SM, Bilmes JA, MacCoss MJ, Noble WS - Bioinformatics (2008)

Bottom Line: We train a set of DBNs on high-confidence peptide-spectrum matches. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate. Python and C source code are available upon request from the authors.


Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA, USA.

ABSTRACT

Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms.

Results: We use a hybrid dynamic Bayesian network (DBN)/support vector machine (SVM) approach to address these two problems. We train a set of DBNs on high-confidence peptide-spectrum matches. These DBNs, known collectively as Riptide, comprise a probabilistic model of peptide fragmentation chemistry. Examination of the distributions learned by Riptide allows identification of new trends, such as prevalent a-ion fragmentation at peptide cleavage sites C-term to hydrophobic residues. In addition, Riptide can be used to produce likelihood scores that indicate whether a given peptide-spectrum match is correct. A vector of such scores is evaluated by an SVM, which produces a final score to be used in peptide identification. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate.

Availability: Python and C source code are available upon request from the authors. The curated training sets are available at http://noble.gs.washington.edu/proj/intense/. The Graphical Model Tool Kit (GMTK) is freely available at http://ssli.ee.washington.edu/bilmes/gmtk.



Figure 1: Experimental overview. We start with a collection of high-confidence PSMs. These training PSMs are used to train the Riptide model, which consists of a collection of DBNs that model the probability distributions governing peptide fragment ion intensities. Riptide is used to evaluate testing PSMs to produce a vector of features for each PSM, each feature related to a probability assigned to the PSM by one of the Riptide DBNs. Finally, these feature vectors can be analyzed by additional algorithms (such as SVMs) to produce scores for the test PSMs.

Mentions: Although the details of the Riptide model are complex, the inputs to and outputs from the Riptide training and testing procedure are quite simple (Fig. 1). We start with a collection of high-confidence PSMs generated as described in Section 3.2. These PSMs are used to train the Riptide model, which consists of a collection of DBNs that model the probability distributions governing peptide fragment ion intensities. The resulting Riptide model is then evaluated on a set of test PSMs, generating for each PSM a feature vector of probabilities. These vectors can then be used as input to analysis software, assigning scores to the PSMs. Examples of analysis software include support vector machines (SVMs) or the semi-supervised learning algorithm Percolator of Käll et al. (2007) (Section 4).
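
The input/output flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: a smoothed categorical distribution over discretized intensity bins stands in for each Riptide DBN, and the PSM layout (a dict mapping an ion type to a list of intensity bins), the three-bin discretization, and all function names are assumptions made for illustration only. The sketch covers training on high-confidence PSMs and producing one feature per model for each test PSM; the resulting vectors would then be handed to an SVM or to Percolator, a step that is omitted here.

```python
import math
from collections import Counter

# Hypothetical discretization of fragment ion intensity: 0 = absent, 1 = low, 2 = high.
N_BINS = 3


def train_model(training_psms, ion_type):
    """Estimate a smoothed categorical distribution over intensity bins.

    A stand-in for one DBN: it ignores sequence context and simply counts
    how often each intensity bin occurs for the given ion type.
    """
    counts = Counter()
    for psm in training_psms:
        counts.update(psm[ion_type])
    total = sum(counts.values()) + N_BINS  # add-one smoothing
    return {b: (counts[b] + 1) / total for b in range(N_BINS)}


def log_likelihood(model, bins):
    """Log-probability of the observed intensity bins under one model."""
    return sum(math.log(model[b]) for b in bins)


def feature_vector(models, psm):
    """One feature per model: the average log-likelihood per cleavage site."""
    return [
        log_likelihood(models[ion], psm[ion]) / max(len(psm[ion]), 1)
        for ion in sorted(models)
    ]


# Toy high-confidence training PSMs (invented data, for illustration only).
training_psms = [
    {"b": [2, 1, 2, 2], "y": [2, 2, 2, 1]},
    {"b": [1, 2, 2, 0], "y": [2, 2, 1, 2]},
]
models = {ion: train_model(training_psms, ion) for ion in ("b", "y")}

# Two toy test PSMs: one resembling the training data, one mostly missing ions.
good = feature_vector(models, {"b": [2, 2, 1, 2], "y": [2, 2, 2, 2]})
bad = feature_vector(models, {"b": [0, 0, 1, 0], "y": [0, 1, 0, 0]})
```

In this toy setting, the PSM whose fragment intensities resemble the training data receives higher features than the PSM with mostly missing ions, which is the kind of separation a downstream classifier can exploit.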

