Enhanced peptide identification by electron transfer dissociation using an improved Mascot Percolator.

Wright JC, Collins MO, Yu L, Käll L, Brosch M, Choudhary JS - Mol. Cell Proteomics (2012)

Bottom Line: We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

Affiliation: Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridge.

ABSTRACT
Peptide identification using tandem mass spectrometry is a core technology in proteomics. The latest generations of mass spectrometry instruments enable the use of electron transfer dissociation (ETD) to complement collision-induced dissociation (CID) for peptide fragmentation. However, a critical limitation to the use of ETD has been the lack of optimal database search software. Percolator is a post-search algorithm that uses semi-supervised machine learning to improve the rate of peptide spectrum matches (PSMs) while providing reliable significance measures. We have previously interfaced the Mascot search engine with Percolator and demonstrated sensitivity and specificity benefits with CID data. Here, we report recent developments in the Mascot Percolator V2.0 software, including an improved feature calculator and support for a wider range of ion series. The updated software is applied to the analysis of several CID and ETD fragmented peptide data sets. This version of Mascot Percolator increases the number of CID PSMs by up to 80% and ETD PSMs by up to 60% at a 0.01 q-value (1% false discovery rate) threshold over a standard Mascot search, notably recovering PSMs from high charge state precursor ions. The greatly increased number of PSMs and peptide coverage afforded by Mascot Percolator has enabled a fuller assessment of CID/ETD complementarity. Using a data set of CID and ETcaD spectral pairs, we find that at a 1% false discovery rate, the overlap in peptide identifications by CID and ETD is 83%, which is significantly higher than that obtained using either stand-alone Mascot (69%) or OMSSA (39%). We conclude that Mascot Percolator is a highly sensitive and accurate post-search algorithm for peptide identification and allows direct comparison of peptide identifications using multiple alternative fragmentation techniques.

Figure 3: Yeast peptide spectrum match q-p plots. These four q-value PSM plots display the estimated number of correct PSMs for the Yeast ETD and ETcaD data sets using Mascot, OMSSA, and Mascot Percolator across a range of q-value thresholds. Plot A shows the complete data set, and plots B, C, and D show the estimated correct PSMs for 2+, 3+, and >3+ precursor charge states, respectively.
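
To make the idea of counting PSMs at a q-value threshold concrete, the following is a minimal sketch of one common way to estimate q-values from a target-decoy search. It is an illustration only and is not taken from the paper: the target-decoy estimate, the function names `q_values` and `accepted_targets`, and the toy scores are all assumptions, and the procedure used by Mascot Percolator itself may differ.

```python
from typing import List, Tuple

# Each PSM is (score, is_decoy); a higher score means a better match.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    The FDR at each rank is estimated as decoys / targets among all PSMs
    scoring at least that well. The q-value of a PSM is the smallest FDR
    at which that PSM would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Walk from the worst score upward, keeping a running minimum so that
    # q-values never decrease as the score gets worse.
    result = []
    running_min = float("inf")
    for (score, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        result.append((score, is_decoy, running_min))
    result.reverse()
    return result


def accepted_targets(psms: List[PSM], threshold: float) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold
    )


# Toy data (hypothetical scores, for illustration only).
toy = [(10.0, False), (9.0, False), (8.0, False),
       (7.0, True), (6.0, False), (5.0, True)]
for threshold in (0.01, 0.25):
    print(threshold, accepted_targets(toy, threshold))
```

Evaluating `accepted_targets` over a range of thresholds gives the kind of curve plotted in a q-value PSM plot, with one curve per search method.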

Two LysC-digested Yeast experimental ETD and ETcaD data sets, generated by the Coon Research Group to optimize the decision tree protocol (27), were chosen because of their previous detailed analysis using OMSSA and their large size, with more than 50,000 spectra in each set. The principal difference between the two data sets is that in the ETcaD set, supplemental activation was employed to improve ETD fragmentation efficiency for doubly charged precursor ions (31). The ETD set is the only ETD data set presented in this study that does not use supplemental activation. The corresponding raw MS files were processed using Proteome Discoverer rather than COMPASS (36), the tool reported in the original publication. Because of the different processing methods, a fractional difference of 502 and 18 spectra is observed for the ETD and ETcaD data sets, respectively. A summary of the results obtained at a PSM q-value threshold of 0.01 for each search method is shown in Table 3B. At this high-confidence threshold, Mascot identifies 20% and 25% of the total spectra for the ETD and ETcaD data sets, respectively; coverage is slightly lower for OMSSA, which identifies 17% and 18% of spectra. Further processing of the Mascot search results using Mascot Percolator increases the percentage of spectra identified to 32% and 35%, respectively, resulting in an average gain in the number of PSMs of 50% across the two experiments. The q-value PSM plots displayed in Fig. 3 highlight the observed performances of OMSSA, Mascot, and Mascot Percolator. Detailed inspection of the individual precursor charge state q-value PSM plots shown in Fig. 3 indicates that Mascot outperforms OMSSA at lower charge states, but at higher charge states OMSSA performs better.
Notably, Mascot Percolator shows gains over the stand-alone search methods at all charge states, resulting in a rise in ETD (ETcaD) PSMs for 2+, 3+, and >3+ precursors of 2311 (1760), 2096 (1828), and 2632 (2519) over Mascot and 3436 (4570), 3750 (3402), and 2092 (2044) over OMSSA at a q-value threshold of 0.01. For both the ETD and ETcaD data sets, only 32 doubly charged peptides are identified with OMSSA; a low identification rate for doubly charged spectra in ETD searches has been previously documented as a limitation of the engine (19). Finally, these plots show that the use of supplemental activation substantially enhances identification of 2+ PSMs, increasing the identification rate with both Mascot and Mascot Percolator; this trend is not noticeable for PSMs with a charge of 3+ or greater. In the original publication of these data, 12,193 and 11,470 PSMs are reported for the ETD and ETcaD data sets at a 1% false positive rate (27). Mascot Percolator shows an increase of 59% and 77%, respectively, in the number of PSMs over those originally reported.
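
The 50% average gain quoted above can be checked from the rounded identification percentages given in the text. The short sketch below does that arithmetic; the helper name `relative_gain` and the dictionary layout are illustrative assumptions, and because the inputs are rounded percentages rather than raw PSM counts, the result is only approximate.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Percentage of spectra identified at a q-value threshold of 0.01,
# as quoted in the text (rounded values).
mascot = {"ETD": 20.0, "ETcaD": 25.0}
percolator = {"ETD": 32.0, "ETcaD": 35.0}

gains = {name: relative_gain(mascot[name], percolator[name]) for name in mascot}
average_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: {gain:.0f}% more PSMs than stand-alone Mascot")
print(f"average gain: {average_gain:.0f}%")
```

With these inputs the per-data-set gains are about 60% (ETD) and 40% (ETcaD), which average to the reported 50%.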

