SAMPI: protein identification with mass spectra alignments.

Kaltenbach HM, Wilke A, Böcker S - BMC Bioinformatics (2007)

Bottom Line: A protein is digested (usually by trypsin) and its mass spectrum is compared to simulated spectra for protein sequences in a database. We prove the applicability of our approach using biological mass spectrometry data and compare our results to the standard software Mascot. Introducing more noise peaks, we are able to keep identification rates at a similar level by using the flexibility introduced by scoring schemes.


Affiliation: AG Genominformatik, Technische Fakultät, Universität Bielefeld, Bielefeld, Germany. michael@cebitec.uni-bielefeld.de

ABSTRACT

Background: Mass spectrometry-based peptide mass fingerprints (PMFs) offer a fast, efficient, and robust method for protein identification. A protein is digested (usually by trypsin) and its mass spectrum is compared to simulated spectra for protein sequences in a database. However, existing tools for analyzing PMFs often lack an analysis of the significance of search results or rely on heuristics for it, and they handle missing and additional peaks insufficiently.
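The digestion step described above can be illustrated with a minimal in-silico sketch. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages or modifications, and standard monoisotopic residue masses. The function names are hypothetical and are not taken from SAMPI.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-terminus)


def tryptic_digest(seq):
    """Split a protein sequence after K or R, except when followed by P.

    Simplifying assumption: complete digestion, no missed cleavages.
    """
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def simulated_spectrum(seq):
    """Sorted list of peptide masses: a crude stand-in for a simulated PMF."""
    return sorted(peptide_mass(p) for p in tryptic_digest(seq))
```

Calling `simulated_spectrum` on each database sequence yields the theoretical peak lists against which a measured spectrum would be compared; real spectra additionally involve charge states, modifications and missed cleavages, which this sketch ignores.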

Results: We present a unified framework for analyzing Peptide Mass Fingerprints that offers a number of advantages over existing methods: First, comparison of mass spectra is based on a scoring function that can be custom-designed for certain applications and explicitly takes missing and additional peaks into account. The method is able to simulate almost every additive scoring scheme. Second, we present an efficient deterministic method for assessing the significance of a protein hit, independent of the underlying scoring function and sequence database. We prove the applicability of our approach using biological mass spectrometry data and compare our results to the standard software Mascot.
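To make the idea of an additive score with explicit penalties for missing and additional peaks concrete, the following is a rough sketch of a global alignment of two sorted peak lists by dynamic programming. It is not the authors' scoring function: the Gaussian-shaped match score and the parameters `sigma`, `missing_penalty` and `additional_penalty` are hypothetical choices made only for illustration.

```python
import math


def match_score(m_measured, m_theoretical, sigma=0.2):
    """Hypothetical Gaussian-shaped score: 1.0 for identical masses,
    falling to -0.5 for masses that are far apart."""
    d = m_measured - m_theoretical
    return 1.5 * math.exp(-(d * d) / (2.0 * sigma * sigma)) - 0.5


def align_spectra(measured, theoretical,
                  missing_penalty=-0.2, additional_penalty=-0.1):
    """Best additive alignment score of two ascending peak lists.

    Each measured peak is either matched to a theoretical peak or counted
    as additional; each unmatched theoretical peak counts as missing.
    """
    n, m = len(measured), len(theoretical)
    # S[i][j]: best score aligning the first i measured and j theoretical peaks
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + additional_penalty
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + missing_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + match_score(measured[i - 1], theoretical[j - 1]),
                S[i - 1][j] + additional_penalty,  # measured peak i unexplained
                S[i][j - 1] + missing_penalty,     # theoretical peak j not observed
            )
    return S[n][m]
```

Because the total is a sum of per-peak terms, other additive schemes can be emulated in this sketch by swapping `match_score` and the two penalties, which mirrors the flexibility the abstract describes.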

Conclusion: The proposed framework for analyzing Peptide Mass Fingerprints shows performance comparable to Mascot on small peak lists. Introducing more noise peaks, we are able to keep identification rates at a similar level by using the flexibility introduced by scoring schemes.



Figure 7: Alignment score distribution, parameter set B. Solid line: density of the empirical alignment score distribution using 10,000 randomly generated protein sequences of length 250 with SwissProt amino acid frequencies. Dashed line: density of the approximating normal distribution with parameters computed as described in the text. Both densities refer to alignments against one measured spectrum, scored with the SAMPI score using parameter set B.

Mentions: We tested the two assumptions using two different parameter sets A and B for the Gaussian score given in Table 4, and computing the alignment scores of 10,000 random amino acid sequences of length 250 and a randomly chosen measured spectrum from our dataset. We found the estimated distributions in good agreement with their empirical counterparts, as shown in Figures 6 and 7.
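The kind of check described above can be sketched as follows. This is only an illustration under stated assumptions: the paper computes the normal parameters deterministically, whereas this sketch estimates them from the random sample; amino acids are drawn uniformly unless `weights` is supplied (the SwissProt frequencies are not reproduced here); and `toy_score` is a hypothetical placeholder for the actual alignment score.

```python
import math
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length, rng, weights=None):
    """Random sequence; weights=None means uniform amino acid frequencies."""
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


def null_parameters(score_fn, n=10_000, length=250, seed=0, weights=None):
    """Sample mean and standard deviation of score_fn over random proteins."""
    rng = random.Random(seed)
    scores = [score_fn(random_protein(length, rng, weights)) for _ in range(n)]
    return statistics.mean(scores), statistics.stdev(scores)


def normal_tail(score, mean, sd):
    """P(X >= score) for X ~ N(mean, sd^2), via the complementary error function."""
    return 0.5 * math.erfc((score - mean) / (sd * math.sqrt(2.0)))


def toy_score(seq):
    """Placeholder score (count of K and R), NOT the SAMPI alignment score."""
    return seq.count("K") + seq.count("R")


if __name__ == "__main__":
    mean, sd = null_parameters(toy_score, n=2000)
    # Approximate p-value of a hypothetical observed score of 40
    print(mean, sd, normal_tail(40, mean, sd))
```

Plotting a histogram of the sampled scores against the normal density with these parameters would give a comparison analogous to the solid and dashed lines in Figures 6 and 7.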

