Modeling peptide fragmentation with dynamic Bayesian networks for peptide identification.

Klammer AA, Reynolds SM, Bilmes JA, MacCoss MJ, Noble WS - Bioinformatics (2008)

Bottom Line: We train a set of DBNs on high-confidence peptide-spectrum matches. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate. Python and C source code are available upon request from the authors.


Affiliation: Department of Genome Sciences, University of Washington, Seattle, WA, USA.

ABSTRACT

Motivation: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms.

Results: We use a hybrid dynamic Bayesian network (DBN)/support vector machine (SVM) approach to address these two problems. We train a set of DBNs on high-confidence peptide-spectrum matches (PSMs). These DBNs, known collectively as Riptide, comprise a probabilistic model of peptide fragmentation chemistry. Examination of the distributions learned by Riptide allows identification of new trends, such as prevalent a-ion fragmentation at peptide cleavage sites C-terminal to hydrophobic residues. In addition, Riptide can be used to produce likelihood scores that indicate whether a given peptide-spectrum match is correct. A vector of such scores is evaluated by an SVM, which produces a final score to be used in peptide identification. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate (FDR).

Availability: Python and C source code are available upon request from the authors. The curated training sets are available at http://noble.gs.washington.edu/proj/intense/. The Graphical Model Tool Kit (GMTK) is freely available at http://ssli.ee.washington.edu/bilmes/gmtk.


Figure 4: Positive peptide identifications as a function of q-value (a measure of FDR). The Riptide scoring function is compared with the SEQUEST scoring function XCorr, to test the utility of the SVM normalized discriminant score function (A). In addition, the Riptide DBN feature vectors are used as input to the algorithm Percolator (Käll et al., 2007), and are compared with the original Percolator features (B).

Mentions: Figure 4A compares the performance of Riptide+SVM with the performance of XCorr, the score function used by SEQUEST [re-implemented in the software package Crux (C. Y. Park et al., in press)]. To generate the figure, we searched each spectrum in the test set against a shuffled decoy version of the same protein sequence database (Klammer et al., 2007). We use the number of matches to the decoy database at a particular score threshold to estimate the rate of false identifications among the target PSMs (Käll et al., 2008). For each PSM, we then compute a q-value, which is defined as the minimal FDR threshold at which the PSM is deemed significant (Storey and Tibshirani, 2003). Each series in the figure plots the number of target PSMs identified as a function of q-value threshold. We selected this mode of evaluation because it closely matches the goal of the typical mass spectrometrist: identifying the largest number of peptides with the lowest rate of false identifications. Riptide with the static SVM outperforms SEQUEST by 10.8% at a 1% FDR. In this experiment, the Riptide DBNs failed on many short peptides (seven residues or fewer), so these peptides are excluded from the analysis. If they are included, performance deteriorates dramatically.
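
The decoy-based evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `decoy_q_values` and `identifications_at` are hypothetical, and the sketch assumes that higher scores are better, that the target and decoy databases are the same size, and that the FDR at a threshold is estimated simply as the number of decoy matches divided by the number of target matches at or above that threshold.

```python
from bisect import bisect_left
from typing import List, Sequence


def decoy_q_values(target_scores: Sequence[float],
                   decoy_scores: Sequence[float]) -> List[float]:
    """Return one q-value per target PSM, in the order the scores were given.

    Assumptions (not taken from the paper): higher score is better, and the
    decoy database is the same size as the target database.
    """
    targets_sorted = sorted(target_scores)
    decoys_sorted = sorted(decoy_scores)
    n_targets, n_decoys = len(targets_sorted), len(decoys_sorted)

    def estimated_fdr(threshold: float) -> float:
        # Count matches scoring at or above the threshold in each search.
        targets_ge = n_targets - bisect_left(targets_sorted, threshold)
        decoys_ge = n_decoys - bisect_left(decoys_sorted, threshold)
        return min(1.0, decoys_ge / targets_ge)

    # The q-value of a PSM is the smallest FDR over all thresholds that
    # still accept it, so walk from the worst score to the best and keep
    # a running minimum.
    order = sorted(range(n_targets), key=lambda i: target_scores[i])
    q_values = [1.0] * n_targets
    running_min = 1.0
    for i in order:
        running_min = min(running_min, estimated_fdr(target_scores[i]))
        q_values[i] = running_min
    return q_values


def identifications_at(q_values: Sequence[float], threshold: float) -> int:
    """Number of target PSMs accepted at the given q-value threshold."""
    return sum(1 for q in q_values if q <= threshold)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    targets = [10.0, 9.0, 8.0, 7.0, 6.0]
    decoys = [8.5, 6.5, 5.0]
    q = decoy_q_values(targets, decoys)
    print(q)
    print(identifications_at(q, 0.01))
```

Evaluating `identifications_at` over a range of thresholds for two scoring functions gives one curve per scoring function, which is the kind of comparison a plot of identifications against q-value makes.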

