A nonparametric model for quality control of database search results in shotgun proteomics.

Zhang J, Li J, Liu X, Xie H, Zhu Y, He F - BMC Bioinformatics (2008)

Bottom Line: However, validation of database search results creates a bottleneck in MS/MS data processing. Using the nonparametric model, a more flexible DF can be obtained, resulting in improved sensitivity and good FPR estimation. This nonparametric statistical technique is a powerful tool for tackling the complexity and diversity of datasets in shotgun proteomics.


Affiliation: College of Mechanical & Electronic Engineering and Automatization, National University of Defense Technology, Changsha, 410073, China. zhangjy@hupo.org.cn

ABSTRACT

Background: Analysis of complex samples with tandem mass spectrometry (MS/MS) has become routine in proteomic research. However, validation of database search results creates a bottleneck in MS/MS data processing. Recently, methods based on a randomized database have become popular for quality control of database search results. It remains unclear, however, how to combine different database search scores to improve the sensitivity of randomized database methods.

Results: In this paper, a multivariate nonlinear discriminant function (DF) based on multivariate nonparametric density estimation was used to filter out false-positive database search results with a predictable false positive rate (FPR). Application of this method to control datasets from different instruments (LCQ, LTQ, and LTQ/FT) yielded an estimated FPR close to the actual FPR. As expected, the method was more sensitive when more features were used. Furthermore, the new method was shown to be more sensitive than two commonly used methods on three complex sample datasets and three control datasets.

Conclusion: Using the nonparametric model, a more flexible DF can be obtained, resulting in improved sensitivity and good FPR estimation. This nonparametric statistical technique is a powerful tool for tackling the complexity and diversity of datasets in shotgun proteomics.
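The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the product Gaussian kernel, the Silverman-style bandwidth, the log density ratio used as the DF, the FPR estimate (decoy matches passing divided by target matches passing), and the synthetic three-feature scores standing in for Xcorr, ΔCn and Sim are all assumptions made for illustration.

```python
import numpy as np

def silverman_bandwidth(x):
    """Per-feature bandwidth from Silverman's rule of thumb (an assumed choice)."""
    n, d = x.shape
    return x.std(axis=0, ddof=1) * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))

def kde(train, query, bandwidth):
    """Product-Gaussian kernel density estimate of `train`, evaluated at `query`."""
    d = train.shape[1]
    diff = (query[:, None, :] - train[None, :, :]) / bandwidth
    norm = (2.0 * np.pi) ** (d / 2.0) * np.prod(bandwidth)
    return np.exp(-0.5 * (diff ** 2).sum(axis=2)).sum(axis=1) / (len(train) * norm)

def discriminant(target, decoy, query):
    """Assumed DF: log ratio of the target density to the decoy density."""
    h = silverman_bandwidth(np.vstack([target, decoy]))
    eps = 1e-300  # guards against log(0) far from all training points
    return np.log(kde(target, query, h) + eps) - np.log(kde(decoy, query, h) + eps)

def threshold_for_fpr(target_scores, decoy_scores, max_fpr):
    """Loosest cutoff whose estimated FPR (decoys passing / targets passing) <= max_fpr."""
    for cut in np.sort(np.concatenate([target_scores, decoy_scores])):
        n_t = int(np.sum(target_scores >= cut))
        n_d = int(np.sum(decoy_scores >= cut))
        if n_t > 0 and n_d / n_t <= max_fpr:
            return cut, n_t, n_d / n_t
    return np.inf, 0, 0.0

# Synthetic scores (hypothetical): columns play the role of Xcorr, dCn, Sim.
rng = np.random.default_rng(0)
random_mean, random_sd = [1.5, 0.05, 0.3], [0.5, 0.03, 0.1]
decoy = rng.normal(random_mean, random_sd, size=(500, 3))      # randomized-database matches
incorrect = rng.normal(random_mean, random_sd, size=(500, 3))  # wrong matches in the normal database
correct = rng.normal([3.5, 0.30, 0.7], [0.5, 0.03, 0.1], size=(300, 3))
target = np.vstack([correct, incorrect])

cut, n_pass, est_fpr = threshold_for_fpr(
    discriminant(target, decoy, target),
    discriminant(target, decoy, decoy),
    max_fpr=0.01,
)
print(f"cutoff={cut:.2f}, accepted={n_pass}, estimated FPR={est_fpr:.4f}")
```

Because the boundary is a level set of an estimated density ratio rather than a fixed rectangle in score space, it can bend around the data, which is one way to read the "more flexible DF" claim above.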


Figure 5: The mesh grids of the DF of M3 and the score distribution of the matches uniquely validated by M1~M3. The blue points in B~E represent the matches uniquely validated by M3, the red points are those of M2 and the green points are those of M1.

Mentions: Figure 5A shows the mesh grids of a DF of M3 (+2 charge state matches in D5, FPR = 0.01). Matches with smaller Xcorr, ΔCn or Sim were discarded by M3, which agrees with the experience that matches with large scores (Xcorr, ΔCn or Sim) are more likely to be correct. Figure 5B~Figure 5E illustrate the score distributions of the matches uniquely confirmed by M1~M3. Some matches with small Xcorr, ΔCn and Sim were confirmed by PeptideProphet (red points), which integrates other parameters such as the preliminary score (Sp). M1 confirmed some matches with intermediate Xcorr and ΔCn but small Sim (green points). M3 confirmed many matches (4714) with relatively small Xcorr and ΔCn but large Sim, which were discarded by M1 and M2. These results demonstrate that filter boundaries built on different parameters yield different results with different sensitivity, and that integrating more complementary parameters by appropriate methods can improve the sensitivity of database search result validation.
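The "uniquely validated" sets plotted in Figure 5B~5E are set differences: the matches one method accepts that no other method accepts. A small sketch of that bookkeeping follows; the match IDs are hypothetical and the function name is introduced here only for illustration.

```python
def uniquely_validated(accepted):
    """Map each method name to the match IDs that only that method accepted."""
    unique = {}
    for name, ids in accepted.items():
        # Union of everything accepted by the other methods.
        others = set().union(*(v for k, v in accepted.items() if k != name))
        unique[name] = set(ids) - others
    return unique

# Hypothetical match IDs accepted by each method at the same FPR.
accepted = {
    "M1": {1, 2, 3, 4},
    "M2": {2, 3, 5},
    "M3": {3, 4, 6, 7},
}
unique = uniquely_validated(accepted)
for name in sorted(unique):
    print(name, sorted(unique[name]))
```

Plotting the scores of each method's unique set, as the figure does, then shows which region of score space each filter boundary reaches that the others do not.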

