Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

Wright JC, Collins MO, Yu L, Käll L, Brosch M, Choudhary JS - Mol. Cell Proteomics (2012)

Bottom Line: We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.


Affiliation: Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

ABSTRACT
Peptide identification using tandem mass spectrometry is a core technology in proteomics. Latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been optimal database search software. Percolator is a post-search algorithm, which uses semi-supervised machine learning to improve the rate of peptide spectrum identifications (PSMs) together with providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity to be performed. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.



Figure 2: False discovery rate validation. Two statistical analyses evaluate the accuracy of the values reported by Mascot Percolator. A, A log-scale graph of the q-values reported by Mascot Percolator against the FDR estimated from a bipartite database search. Both the CID and ETcaD data sets show good consistency between the estimated FDR and the q-values, with the majority of the deviation from y = x occurring below the 0.01 threshold. The dotted lines represent y = 2x and y = x/2. B, A QQ plot of the observed Mascot Percolator p values for entrapment PSMs against a theoretical uniform p value distribution.

Mentions: To complete the validation, spectra from these UPS experiments were searched against a bipartite database (34) containing only the IPI sequences for the 48 proteins in the standard, plus common contaminants. These selected sequences were concatenated with 10 times that number of shuffled entrapment protein sequences. The resulting PSMs were filtered, and hits to the entrapment proteins were used to estimate false positives over a range of Mascot Percolator q-values. Fig. 2 plots the Mascot Percolator q-values against the FDR estimates for both the CID and ETcaD UPS data on a log scale. Applying a two-sample Kolmogorov-Smirnov (K-S) test to the bipartite database FDR estimates and the Mascot Percolator q-values, we obtain a maximum difference of 0.02 for the CID data and 0.13 for the ETcaD data. The slightly higher value for the ETcaD data is likely due to the much smaller data set size. In addition, the plot in Fig. 2 shows that most of the difference between the estimated FDR and Mascot Percolator's q-values occurs for PSMs below the typical 1% false discovery rate threshold. When the K-S test is performed on the same CID data processed with Mascot Percolator v1.09, the maximum difference is 0.03, reflecting consistent performance between the two versions. This comparison cannot be made for the ETD data because v1.09 could only generate features for b and y ion series.
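
The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names, the (q_value, is_entrapment) input layout, and the FDR formula are all assumptions. In particular, the sketch assumes that false matches fall evenly across the bipartite database, so that the sample portion receives about one tenth as many false hits as the 10-fold larger entrapment portion. The excerpt does not state the exact estimator used. The K-S function returns the two-sample statistic, i.e. the maximum difference between two empirical distribution functions.

```python
from bisect import bisect_right


def entrapment_fdr(psms, q_threshold, entrapment_ratio=10.0):
    """Estimate the FDR among sample-protein PSMs at a q-value threshold.

    psms: iterable of (q_value, is_entrapment) pairs (hypothetical layout).
    entrapment_ratio: size of the entrapment portion relative to the
        sample portion (10 in the text above).
    """
    accepted = [is_ent for q, is_ent in psms if q <= q_threshold]
    n_entrapment = sum(1 for is_ent in accepted if is_ent)
    n_sample = len(accepted) - n_entrapment
    if n_sample == 0:
        return 0.0
    # Assumption: false hits are spread evenly over the database, so the
    # sample portion is expected to hold n_entrapment / ratio false hits.
    expected_false_in_sample = n_entrapment / entrapment_ratio
    return expected_false_in_sample / n_sample


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    max_diff = 0.0
    # The supremum of the difference is attained at an observed data point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff
```

One possible use is to call entrapment_fdr over a grid of q-value thresholds and then pass the resulting FDR estimates, together with the thresholds themselves, to ks_statistic. A small statistic would indicate that the reported q-values track the entrapment-based estimates closely.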

