LC-MSsim--a simulation software for liquid chromatography mass spectrometry data.

Schulz-Trieglaff O, Pfeifer N, Gröpl C, Kohlbacher O, Reinert K - BMC Bioinformatics (2008)

Bottom Line: The data resulting from an LC-MS experiment is huge, highly complex and noisy. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.


Affiliation: International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany. trieglaf@inf.fu-berlin.de

ABSTRACT

Background: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms; no data set provides a ground truth for the evaluation of feature detection, alignment and filtering algorithms.

Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files.
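
The first two steps described above, reading proteins from a FASTA file and digesting them, can be sketched in a few lines of Python. This is a minimal illustration, not LC-MSsim's own code: the function names `parse_fasta` and `tryptic_digest` are hypothetical, trypsin is assumed as the enzyme (cleavage after K or R unless the next residue is P), and missed cleavages are ignored.

```python
import re
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    proteins: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'.
            header = line[1:]
            proteins[header] = ""
        elif header is not None:
            # Sequence lines are concatenated onto the current record.
            proteins[header] += line
    return proteins


def tryptic_digest(sequence: str, min_length: int = 1) -> List[str]:
    """Split after K or R unless followed by P (assumed trypsin rule).

    Missed cleavages are not modelled in this sketch.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # Drop empty fragments and peptides shorter than min_length.
    return [p for p in pieces if len(p) >= min_length]


if __name__ == "__main__":
    fasta = [">demo_protein (hypothetical)", "MKRPAAK", "GGR"]
    for name, seq in parse_fasta(fasta).items():
        print(name, tryptic_digest(seq))
```

A real simulation would go on to predict a retention time and a detectability score for each peptide; those models are not reproduced here.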

Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
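
To make the idea of matching detected features against the simulated ion list concrete, the following sketch computes precision and recall with a simple greedy matching. It is an assumption-laden illustration rather than the authors' evaluation protocol: the function name `match_features`, the (m/z, retention time) tuple layout and the tolerance values are all hypothetical.

```python
from typing import List, Tuple

# (monoisotopic m/z, retention time in seconds); layout is an assumption.
Feature = Tuple[float, float]


def match_features(
    truth: List[Feature],
    detected: List[Feature],
    mz_tol: float = 0.1,
    rt_tol: float = 10.0,
) -> Tuple[float, float]:
    """Greedily match each true ion to its closest unmatched detected feature.

    Returns (precision, recall). Tolerances are illustrative defaults.
    """
    unmatched = list(range(len(detected)))
    true_positives = 0
    for mz_t, rt_t in truth:
        best_idx = None
        best_dist = None
        for idx in unmatched:
            mz_d, rt_d = detected[idx]
            dmz = abs(mz_d - mz_t)
            drt = abs(rt_d - rt_t)
            if dmz <= mz_tol and drt <= rt_tol:
                # Normalised distance so both dimensions contribute equally.
                dist = dmz / mz_tol + drt / rt_tol
                if best_dist is None or dist < best_dist:
                    best_idx, best_dist = idx, dist
        if best_idx is not None:
            # Each detected feature may explain at most one true ion.
            unmatched.remove(best_idx)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [(500.0, 100.0), (600.0, 200.0)]
    detected = [(500.02, 101.0), (700.0, 300.0), (650.0, 50.0)]
    print(match_features(truth, detected))
```

Greedy matching is the simplest choice; an optimal one-to-one assignment could give slightly different counts when features are densely packed.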


Figure 9: Simulated mass spectrum taken from Mouse LC-MS map. A mass spectrum (retention time 2504.98) from the LC-MS map of Mouse proteins.

Mentions: We downloaded the Mouse IPI protein sequence set (08.04.2008) and randomly selected 100 protein sequences from this set. A tryptic digest and filtering for detectability at a threshold of 0.8 resulted in 820 peptides. The chosen threshold corresponds to a False Discovery Rate of 10%. We opted for this mixture of moderate complexity to avoid a high number of overlapping peptides. Still, manual annotation of all these data sets would be tedious. In our first experiment, our goal was to determine to what extent the performance of current feature detection algorithms depends on the mass resolution of the instrument. We simulated different mass resolutions by changing the FWHM of the peptide isotopic pattern. We generated data sets for FWHM values of 0.05, 0.2, 0.5 and 0.8. A peak FWHM of 0.05 roughly corresponds to an Orbitrap instrument, whereas a FWHM of 0.8 results in spectra similar to typical ion trap measurements. To each data set, we added shot noise with a mean intensity of 150 and a Poisson rate of 450. This noise level was chosen such that all peptide signals would be well above it. The challenge of this benchmark was to detect poorly resolved and possibly overlapping peptide signals. The full result lists are contained in the supplemental material [see Additional file 2], as are the settings for each feature detection algorithm [see Additional file 3]. We also described our strategy to find suitable parameters for each algorithm [see Additional file 1]. Fig. 8 and Fig. 9 show the simulated LC-MS run for FWHM 0.05 and the spectrum at retention time 2504.98, respectively. All simulated LC-MS maps were uploaded to the PRIDE database and are available under the accession numbers 8161 to 8168 inclusive.
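
The effect of the FWHM setting on peak resolution can be illustrated with a small sketch. It assumes Gaussian peak shapes with the FWHM given in m/z units, and it considers two equally intense isotopic peaks 0.5 m/z apart (roughly the isotope spacing of a doubly charged peptide). The function names are hypothetical, the peak model is a simplification that may differ from the one used by LC-MSsim, and noise is left out.

```python
import math


def gaussian_peak(mz: float, center: float, fwhm: float, height: float = 1.0) -> float:
    """Gaussian peak whose full width at half maximum equals `fwhm`."""
    # Standard conversion between FWHM and standard deviation.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return height * math.exp(-0.5 * ((mz - center) / sigma) ** 2)


def valley_ratio(spacing: float, fwhm: float) -> float:
    """Signal midway between two equal peaks divided by the signal at one apex.

    Values near 0 mean baseline-resolved peaks; values of 1 or more mean
    the two peaks have merged into a single maximum.
    """
    midpoint = 2.0 * gaussian_peak(spacing / 2.0, 0.0, fwhm)
    apex = gaussian_peak(0.0, 0.0, fwhm) + gaussian_peak(0.0, spacing, fwhm)
    return midpoint / apex


if __name__ == "__main__":
    # FWHM values taken from the benchmark described above.
    for fwhm in (0.05, 0.2, 0.5, 0.8):
        print(f"FWHM {fwhm}: valley ratio {valley_ratio(0.5, fwhm):.3f}")
```

Under these assumptions the two peaks are cleanly separated at FWHM 0.05 and merge into one broad signal at FWHM 0.8, which is why the low-resolution data sets are the harder case for feature detection.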

