Detection of co-eluted peptides using database search methods.

Alves G, Ogurtsov AY, Kwok S, Wu WW, Wang G, Shen RF, Yu YK - Biol. Direct (2008)

Bottom Line: We show that, using these simulated spectra, all the database search methods eventually gain in the number of true peptides identified when using the compound spectra of co-eluted peptides. Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA. alves@ncbi.nlm.nih.gov

ABSTRACT

Background: Current experimental techniques, especially those applying liquid chromatography mass spectrometry, have made high-throughput proteomic studies possible. The increase in throughput, however, also raises concerns about the accuracy of identification or quantification. In a given MS scan, most experimental procedures select only a few of the most intense parent ions, each to be fragmented (MS2) separately; most other minor co-eluted peptides with similar chromatographic retention times are ignored and their information is lost.

Results: We have computationally investigated the possibility of enhancing information retrieval during a given LC/MS experiment by selecting the two or three most intense parent ions for simultaneous fragmentation. To mimic the spectra of co-eluted peptides, a set of spectra was created by superimposing a number of MS2 spectra, each of which can be identified with high confidence by all search methods tested. The generated convoluted spectra were used to evaluate the capability of several database search methods (SEQUEST, Mascot, X!Tandem, OMSSA, and RAId_DbS) to identify true peptides from superimposed spectra of co-eluted peptides. We show that, using these simulated spectra, all the database search methods eventually gain in the number of true peptides identified when using the compound spectra of co-eluted peptides.

Open peer review: Reviewed by Vlad Petyuk (nominated by Arcady Mushegian), King Jordan and Shamil Sunyaev. For the full reviews, please go to the Reviewers' comments section.



Figure 7: Cumulative identification ratio versus E-value cutoff for compound spectra. Panels A/B and C/D display the cumulative identification ratio as a function of E-value cutoff for the five database search tools when analyzing compound spectra constructed by the SUM2/SUM3 method while combining the LTQ/LTQ single-peptide spectra (A/B) and unique-peptide spectra (C/D) in the co-identifiable set. The symbols CTP1, CTP2 and CTP3 correspond, respectively, to the cumulative number of true positives (TP) identified with E-value equal to or smaller than the specified cutoff when analyzing single-peptide spectra, compound spectra of two peptides, and compound spectra of three peptides.

Mentions: Upon inspection of the ROC curves for the five database search methods tested, the ROC curves for the compound spectra (each resulting from 2 or 3 co-eluted peptides) always appear to climb above the ROC curve for spectra each resulting from only a single peptide. This implies that there would be no loss in protein coverage from simultaneously sending the 2 or 3 most intense precursor ions to the second MS to generate a convoluted spectrum. Since the transformed E-value [37] may now serve as a common statistical standard across different search methods, one may wish to understand how the statistical significance assigned to the identified peptides is affected by using a convoluted spectrum. To this end, we first define the cumulative identification ratio as the ratio of the cumulative number of true peptides identified from compound spectra (each resulting from 2 or 3 co-eluted peptides) to that from the co-identifiable spectra (each resulting from a single peptide). We then plotted the cumulative identification ratio against the E-value cutoff (Figure 7). As shown in Figure 7, all search methods eventually rise above the horizontal line y = 1, indicating an increase in peptide coverage. One also observes that the ratio at low E-value is smaller than one. Such a result is expected, since each compound spectrum becomes quite complex due to spectrum mixing, and the higher noise level makes it harder for a true positive peptide's score to be significantly higher than the background. As a consequence, a true peptide hit here may be assigned a higher E-value.
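
The cumulative identification ratio defined above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the E-value lists, and the cutoffs are hypothetical and made up for illustration, and the sketch assumes the true-positive hits and their E-values have already been extracted from a search tool's output. It simply counts true positives with E-value at or below each cutoff (the CTP quantities in Figure 7) and divides the compound-spectra count by the single-peptide count.

```python
from bisect import bisect_right


def cumulative_true_positives(e_values, cutoffs):
    """For each cutoff, count true-positive hits with E-value <= cutoff."""
    ordered = sorted(e_values)
    # bisect_right returns the number of sorted entries <= the cutoff.
    return [bisect_right(ordered, cutoff) for cutoff in cutoffs]


def cumulative_identification_ratio(compound_e_values, single_e_values, cutoffs):
    """Ratio of cumulative true positives: compound spectra over single-peptide spectra."""
    ctp_compound = cumulative_true_positives(compound_e_values, cutoffs)
    ctp_single = cumulative_true_positives(single_e_values, cutoffs)
    # The ratio is undefined where no single-peptide hit passes the cutoff.
    return [
        c / s if s else float("nan")
        for c, s in zip(ctp_compound, ctp_single)
    ]


# Made-up E-values of true-positive hits (illustration only, not data from the paper).
single = [1e-6, 1e-5, 1e-3, 1e-2]
compound = [1e-4, 1e-3, 1e-2, 5e-2, 1e-1, 5e-1]
cutoffs = [1e-5, 1e-2, 1.0]

ratios = cumulative_identification_ratio(compound, single, cutoffs)
for cutoff, ratio in zip(cutoffs, ratios):
    print(f"E-value cutoff {cutoff:g}: ratio {ratio:.2f}")
```

With these made-up numbers the ratio is below one at the strictest cutoff and exceeds one at the loosest, which is the qualitative shape the text describes. Whether the counts should additionally be normalized per spectrum is not addressed by this sketch.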

