RAId_DbS: peptide identification using database searches with realistic statistics.

Alves G, Ogurtsov AY, Yu YK - Biol. Direct (2007)

Bottom Line: Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request.

Affiliation: National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.

ABSTRACT

Background: The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides.

Results: Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. These t-tests may be used to measure the reliability of reported statistics. When combined with the reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request.

Figure 4: Quantification of the goodness of the score model used for statistical significance assignment. A global study of the Mpdf accuracy using 10,000 spectra (profile mode). Panel (A) shows the histogram of the goodness number. Panel (B) shows a scatter plot of ν versus r obtained from our spectra, together with a number of curves, each corresponding to a fixed PM value. Panel (C) displays the histogram of log10(PM).

Mentions: A global study of the Mpdf accuracy using 10,000 spectra (profile mode) is summarized in Fig. 4. Panel (A) shows the histogram of the goodness number, panel (B) shows a scatter plot of ν versus r obtained from our spectra, and panel (C) displays the histogram of log10(PM). Also displayed in panel (B) are curves with fixed PM values. As these plots show, the fitting quality of the LDpdf to our theoretical distribution is generally very good. The important message, however, is that each search method should report its goodness of fit so that users are informed and can decide whether to take the reported statistics seriously. We have suggested a goodness number cutoff of 0.1 for accepting an Mpdf. The user, however, may choose a slightly larger number as the cutoff to reject Mpdfs in which (s)he has less confidence. As for PM, it is not necessary to employ a cutoff there, because a poor (large) PM will automatically make any hits found insignificant through eq. (23).
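The cutoff rule described above can be illustrated with a minimal Python sketch. It is not taken from RAId_DbS: the names `SpectrumFit` and `accept_mpdfs` are hypothetical, and the direction of the comparison (an Mpdf is accepted when its goodness number is at least the cutoff) is an assumption inferred from the statement that raising the cutoff rejects more Mpdfs. No PM filter is applied, following the text; eq. (23) is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpectrumFit:
    """Hypothetical record: one spectrum and the goodness number of its Mpdf."""
    spectrum_id: str
    goodness: float


def accept_mpdfs(fits: List[SpectrumFit], cutoff: float = 0.1) -> List[SpectrumFit]:
    """Keep fits whose goodness number meets the cutoff.

    Assumption: a larger goodness number means a better fit, so raising
    the cutoff rejects more Mpdfs. The default 0.1 is the value suggested
    in the text.
    """
    return [fit for fit in fits if fit.goodness >= cutoff]


fits = [
    SpectrumFit("s1", 0.05),
    SpectrumFit("s2", 0.10),
    SpectrumFit("s3", 0.40),
]
accepted_default = [fit.spectrum_id for fit in accept_mpdfs(fits)]
accepted_strict = [fit.spectrum_id for fit in accept_mpdfs(fits, cutoff=0.2)]
```

With the suggested cutoff the sketch keeps the second and third example spectra; a stricter cutoff of 0.2 keeps only the third.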

