LC-MSsim--a simulation software for liquid chromatography mass spectrometry data.

Schulz-Trieglaff O, Pfeifer N, Gröpl C, Kohlbacher O, Reinert K - BMC Bioinformatics (2008)

Bottom Line: The data resulting from an LC-MS experiment is huge, highly complex and noisy. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.


Affiliation: International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany. trieglaf@inf.fu-berlin.de

ABSTRACT

Background: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms; no data set yet provides a ground truth for the evaluation of feature detection, alignment and filtering algorithms.

Results: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files.
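The first steps described above (reading proteins from FASTA, digesting them, and placing the resulting peptide ions on the m/z axis) can be illustrated with a minimal Python sketch. This is not LC-MSsim's code: the function names are hypothetical, trypsin (cleavage after K or R, but not before P, with no missed cleavages) is assumed as the user-defined enzyme, and retention time prediction, peak shapes and noise are left out entirely.

```python
from typing import Dict, List

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the peptide termini
PROTON = 1.00728   # mass of the charge carrier in positive-mode ESI


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier = first token of header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def digest_trypsin(sequence: str, min_length: int = 1) -> List[str]:
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def monoisotopic_mz(peptide: str, charge: int) -> float:
    """Monoisotopic m/z of a peptide ion carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    proteins = parse_fasta(">demo hypothetical protein\nAKRPGK\nMK\n")
    for name, seq in proteins.items():
        for pep in digest_trypsin(seq):
            print(name, pep, round(monoisotopic_mz(pep, 2), 4))
```

A full simulator would additionally assign each peptide a predicted retention time and an intensity, and then render isotope patterns and elution profiles into spectra; the sketch stops at the ion list.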

Conclusion: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
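How such a comparison against the simulated ion list might look is sketched below in Python. The matching rule (greedy one-to-one assignment within fixed m/z and retention time tolerances), the tolerance values and all names are assumptions made for illustration; the abstract does not specify how matches are decided.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    mz: float  # monoisotopic m/z
    rt: float  # retention time in seconds


def match_features(
    truth: List[Feature],
    detected: List[Feature],
    mz_tol: float = 0.01,   # assumed tolerance in m/z units
    rt_tol: float = 10.0,   # assumed tolerance in seconds
) -> Tuple[int, int, int]:
    """Greedily match each simulated ion to at most one detected feature.

    Returns (true_positives, false_positives, false_negatives).
    """
    unused = list(detected)
    true_positives = 0
    for t in truth:
        best = None
        for d in unused:
            if abs(d.mz - t.mz) <= mz_tol and abs(d.rt - t.rt) <= rt_tol:
                # Prefer the candidate closest in m/z.
                if best is None or abs(d.mz - t.mz) < abs(best.mz - t.mz):
                    best = d
        if best is not None:
            unused.remove(best)
            true_positives += 1
    false_positives = len(unused)
    false_negatives = len(truth) - true_positives
    return true_positives, false_positives, false_negatives


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall, defined as 0.0 when the denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because the simulated ion list is exact, every unmatched detected feature counts as a false positive and every unmatched simulated ion as a false negative, which is what makes the resulting error rates meaningful.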



Figure 10: Running times of the peptide feature detection algorithms. Running times of all feature detection algorithms on the four data sets with different mass resolutions. Although the running times are comparable, the software Superhirn is the fastest.

Mentions: Fig. 10 displays the running times of all algorithms on each data set. The time measurements were performed on a 3.2 GHz Intel Xeon CPU with 3 GB memory running Debian or Windows Server 2003 R2 (in the case of Decon2LS, MZmine and msInspect). Note that the running times of Decon2LS and MZmine are approximate only, since both tools are GUI-based and therefore do not allow direct time measurements. Superhirn stands out as the fastest algorithm, whereas all other tools yield similar running times. Decon2LS is a bit slower than the rest, but not significantly. To summarize, different algorithms have different strengths: some recover nearly all true signals even under poor conditions, but at the expense of large numbers of false positive hits. One might argue that many of these false positive signals could be eliminated by discarding features of low intensity or of unlikely mass. But this has clear disadvantages when examining complex mixtures with large dynamic ranges and many compounds at low intensities.
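The trade-off of intensity filtering can be made concrete with a small Python sketch. The feature list, the threshold and the function names are invented for illustration and are not results from the paper; each feature is an (intensity, is_true) pair, where the flag is assumed to come from comparison with the simulated ground truth.

```python
from typing import List, Tuple

# (intensity, is_true): is_true marks features confirmed by the ground truth.
LabelledFeature = Tuple[float, bool]


def filter_by_intensity(
    features: List[LabelledFeature], min_intensity: float
) -> List[LabelledFeature]:
    """Keep only features at or above the intensity threshold."""
    return [f for f in features if f[0] >= min_intensity]


def count_true_false(features: List[LabelledFeature]) -> Tuple[int, int]:
    """Return (number of true features, number of false positives)."""
    true_hits = sum(1 for _, is_true in features if is_true)
    return true_hits, len(features) - true_hits


if __name__ == "__main__":
    # Hypothetical detections: one true peptide sits at low intensity.
    features = [
        (1.0e6, True), (5.0e4, True), (800.0, True),
        (1200.0, False), (600.0, False), (300.0, False),
    ]
    print("before:", count_true_false(features))
    print("after: ", count_true_false(filter_by_intensity(features, 1000.0)))
```

In this made-up example the threshold removes two of the three false positives, but it also discards the low-intensity true feature, which is exactly the loss that matters in samples with a large dynamic range.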

