Score regularization for peptide identification.

He Z, Zhao H, Yu W - BMC Bioinformatics (2011)

Bottom Line: It is critical to develop new post-processing techniques that can effectively distinguish true identifications from false ones. This paper presents a consistency-based PSM re-ranking method to improve the initial identification results. The score regularization method can be used as a general post-processing step for improving peptide identifications.


Affiliation: School of Software, Dalian University of Technology, Dalian, China. zyhe@dlut.edu.cn

ABSTRACT

Background: Peptide identification from tandem mass spectrometry (MS/MS) data is one of the most important problems in computational proteomics. It relies heavily on accurately assessing the quality of peptide-spectrum matches (PSMs). However, current MS technology and PSM scoring algorithms are far from perfect, so incorrect peptide-spectrum pairs are generated. Thus, it is critical to develop new post-processing techniques that can effectively distinguish true identifications from false ones.

Results: In this paper, we present a consistency-based PSM re-ranking method to improve the initial identification results. The method relies on one additional assumption: two peptides belonging to the same protein should be correlated with each other. We formulate an optimization problem that combines two objectives through regularization: smoothing consistency among the scores of correlated peptides, and fitting consistency between the new scores and the initial scores. This optimization problem can be solved analytically. Experiments on several real MS/MS data sets show that the re-ranking method improves identification performance.

Conclusions: The score regularization method can be used as a general post-processing step for improving peptide identifications. Source code and data sets are available at: http://bioinformatics.ust.hk/SRPI.rar.
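To make the two objectives named in the Results concrete, here is a minimal sketch in Python. It does not reproduce the authors' exact formulation, which this abstract does not give. It assumes a generic graph-regularized objective, sum_i (f_i - y_i)^2 + lam * sum_{i<j} w_ij (f_i - f_j)^2, where y holds the initial scores, f the new scores, and w_ij = 1 when PSMs i and j map to the same protein. The function names, the 0/1 weights, and the parameter lam are all hypothetical. Under these assumptions, setting the gradient to zero gives the closed-form solution (I + lam * L) f = y, with L = D - W the graph Laplacian.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def regularize_scores(initial_scores, proteins, lam=1.0):
    """Return smoothed scores f solving (I + lam * L) f = y.

    Assumed objective (illustrative, not taken from the paper):
        sum_i (f_i - y_i)^2 + lam * sum_{i<j} w_ij * (f_i - f_j)^2
    with w_ij = 1 if PSMs i and j share a protein, else 0.
    """
    n = len(initial_scores)
    # Hypothetical similarity matrix: peptides of the same protein are linked.
    w = [[1.0 if i != j and proteins[i] == proteins[j] else 0.0
          for j in range(n)] for i in range(n)]
    # Build I + lam * (D - W), where D is the diagonal degree matrix.
    system = [[-lam * w[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        system[i][i] = 1.0 + lam * sum(w[i])
    return solve_linear_system(system, initial_scores)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2]
    proteins = ["P1", "P1", "P1", "P2"]
    print(regularize_scores(scores, proteins, lam=1.0))
```

In this toy input, the three PSMs assigned to P1 are pulled toward one another (roughly 0.725, 0.7 and 0.575), while the single PSM of P2 has no neighbours and keeps its score of 0.2. Setting lam to 0 returns the initial scores unchanged, which corresponds to the fitting term alone.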

Figure 5: The score distribution before and after re-ranking. Left: The score distribution of true identifications and decoy identifications before re-ranking. Right: The score distribution of true identifications and decoy identifications after re-ranking. Both the initial score and updated score are normalized into the interval [0,1] with a min-max normalization procedure.

Mentions: We also plot the initial score distribution and the updated score distribution in Fig. 5. Here we use min-max normalization to transform both the initial identification score and the new re-ranked score into the interval [0,1]. The plot shows that the consistency constraint shrinks the scores in each group (true and decoy) towards the group mean. Although the consistency-based re-ranking method cannot completely separate true identifications from decoys, it does reduce the score overlap on DS2, DS3 and DS4. The re-ranking procedure is less effective on DS1 because the score overlap there is severe. Even in this case, the separation between true and decoy identifications improves in the lower score region.
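The min-max normalization mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and mapping a constant list to all zeros is an assumed convention for the degenerate case.

```python
def min_max_normalize(scores):
    """Linearly rescale scores so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        # Assumed convention: a constant list carries no ranking information.
        return [0.0 for _ in scores]
    span = highest - lowest
    return [(s - lowest) / span for s in scores]


if __name__ == "__main__":
    print(min_max_normalize([2.0, 5.0, 11.0]))
```

Because the transform is linear and increasing, it preserves the ranking of the PSMs. It only puts the initial and re-ranked scores on a common [0,1] scale so that their distributions can be compared.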

