Tandem mass spectrometry data quality assessment by self-convolution.

Choo KW, Tham WM - BMC Bioinformatics (2007)

Bottom Line: The quality score was defined as the ratio of the intensity at the midpoint to the remaining peaks of the convolution result. The results were encouraging, revealing a high percentage of positive prediction rates for spectra with good quality scores. By pre-determining the quality of tandem MS data before subjecting them to protein identification algorithms, spurious protein predictions due to poor tandem MS data are avoided, giving scientists greater confidence in the predicted results.


Affiliation: Bioinformatics Group, Nanyang Polytechnic, 569830 Singapore, Republic Of Singapore. choo_keng_wah@nyp.edu.sg

ABSTRACT

Background: Many algorithms have been developed for interpreting tandem mass spectrometry (MS) data sets. They fall broadly into two classes: the first searches a database of theoretical mass spectra, while the second relies on de novo sequencing from raw mass spectrometry data. The quality of the mass spectra significantly affects protein identification in both cases. This prompted the authors to explore ways to measure the quality of MS data sets before submitting them to protein identification algorithms, allowing more meaningful searches and greater confidence in the proteins identified.

Results: The proposed method measures the quality of MS data sets based on the symmetric property of the b- and y-ion peaks present in an MS spectrum. Self-convolution of the MS data with its time-reversed copy was employed. Owing to the symmetric nature of the b-ion and y-ion peaks, the self-convolution of a good spectrum produces its highest-intensity peak at the midpoint. To reduce processing time, the self-convolution was computed using the Fast Fourier Transform and its inverse, followed by removal of the "DC" (Direct Current) component and normalisation of the data set. The quality score was defined as the ratio of the intensity at the midpoint to the remaining peaks of the convolution result. The method was validated using both theoretical mass spectra, with various permutations, and several real MS data sets. The results were encouraging, revealing a high percentage of positive prediction rates for spectra with good quality scores.

Conclusion: We have demonstrated in this work a method for determining the quality of tandem MS data sets. By pre-determining the quality of tandem MS data before subjecting them to protein identification algorithms, spurious protein predictions due to poor tandem MS data are avoided, giving scientists greater confidence in the predicted results. We conclude that the algorithm performs well and could potentially be used as a pre-processing step for all mass spectrometry-based protein identification tools.
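
The scoring pipeline summarised in the Results (FFT-based self-convolution, DC removal, normalisation, midpoint ratio) might be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes the spectrum has already been binned onto a uniform m/z grid whose length places the complementary b/y pairs symmetrically about the centre, and it reads "the remaining peaks" as the largest remaining absolute value, which is only one possible interpretation. The function name `self_convolution_score` and the toy spectra are hypothetical.

```python
import numpy as np

def self_convolution_score(intensities):
    """Midpoint-to-rest ratio of a spectrum's self-convolution (illustrative)."""
    x = np.asarray(intensities, dtype=float)
    n = x.size
    size = 2 * n - 1                      # length of the full linear convolution

    # Self-convolution via FFT: multiply the zero-padded transform by itself.
    transformed = np.fft.rfft(x, size)
    conv = np.fft.irfft(transformed * transformed, size)

    # Remove the "DC" component (the mean) and normalise to unit peak height.
    conv -= conv.mean()
    peak = np.max(np.abs(conv))
    if peak == 0.0:
        return 0.0                        # empty spectrum: no meaningful score
    conv /= peak

    # Assumption: compare the midpoint against the largest remaining value.
    mid = n - 1
    rest = np.delete(conv, mid)
    return float(conv[mid] / np.max(np.abs(rest)))

# Toy spectra (hypothetical): 101 bins, unit-height peaks.
symmetric = np.zeros(101)
symmetric[[10, 25, 40, 60, 75, 90]] = 1.0   # pairs mirror about bin 50
asymmetric = np.zeros(101)
asymmetric[[5, 12, 33]] = 1.0               # no mirrored partners

print(self_convolution_score(symmetric))
print(self_convolution_score(asymmetric))
```

Under these assumptions, a spectrum with mirrored peak pairs scores above 1 because every pair contributes to the midpoint, while a spectrum without mirrored partners scores well below 1.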


Figure 7: Self-convolution plot for noise amplitude = 1. This figure shows the result of self-convolution when noise peaks of amplitude 1 are added to the theoretical tandem MS spectrum.

Mentions: A plot of these b-ions and y-ions and the self-convolution values is shown in Fig. 7. From this figure, we observe that a high peak occurs at the midpoint of the self-convolution, where the b-ions (bn, bn-1, bn-2, ... b2) align with the corresponding y-ions (y2, y3, y4, ... yn). However, the cumulative sum of the products of all the points also increases steadily from 0 to the midpoint and decreases thereafter, forming a triangle below the peaks. This is potentially damaging to peak detection, especially when significant noise levels are present, compounded by low-intensity b-ion and/or y-ion peaks and missing peaks, as we will demonstrate later. To determine the effects of increasing noise levels, we change the noise level to 10 as shown below.
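
The triangular baseline described above can be reproduced with a small numerical sketch. It is a simplification under stated assumptions: the noise is modelled as a constant floor in every bin rather than as random noise peaks, the signal peak height of 50 is arbitrary, and the helper names `triangular_baseline` and `midpoint_contrast` are hypothetical. None of these values come from the paper.

```python
import numpy as np

def triangular_baseline(n_bins, noise_amplitude):
    """Self-convolution of a flat noise floor: rises to the midpoint, then falls."""
    flat = np.full(n_bins, float(noise_amplitude))
    return np.convolve(flat, flat)        # length 2 * n_bins - 1

def midpoint_contrast(signal, noise_amplitude):
    """Midpoint height relative to its two neighbours on the baseline."""
    noisy = np.asarray(signal, dtype=float) + noise_amplitude
    conv = np.convolve(noisy, noisy)
    mid = noisy.size - 1
    neighbours = 0.5 * (conv[mid - 1] + conv[mid + 1])
    return float(conv[mid] / neighbours)

# Hypothetical theoretical spectrum: mirrored peak pairs of height 50.
signal = np.zeros(101)
signal[[10, 25, 40, 60, 75, 90]] = 50.0

print(triangular_baseline(5, 1))          # small case showing the triangle shape
print(midpoint_contrast(signal, 1))       # low noise floor
print(midpoint_contrast(signal, 10))      # higher noise floor
```

Because the baseline grows with the square of the noise amplitude while the contribution of the mirrored pairs stays fixed, the midpoint stands out far less at amplitude 10 than at amplitude 1 in this toy setting.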

