Identifying and quantifying metabolites by scoring peaks of GC-MS data.

Aggio RB, Mayor A, Reade S, Probert CS, Ruggiero K - BMC Bioinformatics (2014)

Bottom Line: However, AMDIS generates a high number of false positives and does not have an interface amenable to high-throughput data analysis. Although additional computational tools have been developed for processing AMDIS results and for performing normalisation and statistical analysis of metabolomics data, there is not yet a single free software package able to reliably identify and quantify metabolites analysed by GC-MS. Here we present an algorithm implemented in an R package, which allows users to construct flexible pipelines and analyse metabolomics data in a high-throughput manner.


Affiliation: The University of Auckland, 3A Symonds Street, Auckland, 1142, New Zealand. ragg005@aucklanduni.ac.nz.

ABSTRACT

Background: Metabolomics is one of the most recent omics technologies. It has been applied in fields such as food science, nutrition, drug discovery and systems biology. Gas chromatography-mass spectrometry (GC-MS) has been widely used for this purpose, and many computational tools have been developed to support the analysis of metabolomics data. Among them, AMDIS is perhaps the most widely used tool for identifying and quantifying metabolites. However, AMDIS generates a high number of false positives and does not have an interface amenable to high-throughput data analysis. Although additional computational tools have been developed for processing AMDIS results and for performing normalisation and statistical analysis of metabolomics data, there is not yet a single free software package able to reliably identify and quantify metabolites analysed by GC-MS.

Results: Here we introduce a new algorithm, PScore, which scores peaks according to their likelihood of representing metabolites defined in a mass spectral library. We implemented PScore in an R package called MetaBox and evaluated its applicability and potential by comparing its performance against AMDIS when analysing volatile organic compounds (VOCs) from standard mixtures of metabolites and from faecal samples of female and male mice. MetaBox reported lower percentages of false positives and false negatives, and reported a higher number of potential biomarkers associated with the metabolism of female and male mice.
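The abstract does not give PScore's formula, so the sketch below does not reproduce it. As a hypothetical illustration of what scoring a peak against a mass spectral library involves, it computes a plain cosine (normalised dot-product) similarity between a peak's spectrum and each library spectrum, scaled to 0 to 100 like the match factors quoted later. The names `match_factor` and `best_hit` and the `{m/z: intensity}` spectrum representation are assumptions made for illustration; library-search programs typically use weighted variants of this measure.

```python
import math

# Hypothetical illustration only: this is NOT the PScore algorithm.
# Spectra are represented as {m/z: intensity} dictionaries (an assumption).
Spectrum = dict[int, float]


def match_factor(peak: Spectrum, reference: Spectrum) -> float:
    """Cosine similarity between two spectra, scaled to the range 0-100."""
    mzs = set(peak) | set(reference)
    # m/z values absent from one spectrum contribute zero intensity.
    dot = sum(peak.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mzs)
    norm_peak = math.sqrt(sum(v * v for v in peak.values()))
    norm_ref = math.sqrt(sum(v * v for v in reference.values()))
    if norm_peak == 0.0 or norm_ref == 0.0:
        return 0.0
    return 100.0 * dot / (norm_peak * norm_ref)


def best_hit(peak: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float]:
    """Return (compound name, score) of the best-matching library entry."""
    scores = ((name, match_factor(peak, spec)) for name, spec in library.items())
    return max(scores, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy spectra, not data from the article.
    library = {
        "compound_A": {43: 100.0, 58: 60.0},
        "compound_B": {91: 100.0, 65: 20.0},
    }
    peak = {43: 95.0, 58: 55.0, 15: 5.0}
    print(best_hit(peak, library))
```

In this toy case the peak shares its two main ions with `compound_A` and scores above 99 against it, while `compound_B` shares no ions and scores 0. A threshold on the score, such as a match factor of 70, would then decide which hits are reported.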

Conclusions: Identification and quantification of metabolites are among the most critical and time-consuming steps in GC-MS metabolome analysis. Here we present an algorithm implemented in an R package, which allows users to construct flexible pipelines and analyse metabolomics data in a high-throughput manner.

Figure 2: Average percentages of false positives and false negatives. A standard mixture containing 13 metabolites was divided into 10 aliquots and analysed by GC-MS. Each sample was then processed by MetaBox and by AMDIS, the latter using match factors of 70, 80 and 90. Shown are the average percentages of false positives and false negatives produced by each tool, with error bars representing two standard errors. False positives are misidentified compounds, while false negatives are compounds present in the standard mixture that were not identified.
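The error bars in Figure 2 span two standard errors of the mean across the 10 aliquots. A minimal sketch of that summary, assuming the usual definition SE = s / sqrt(n) with s the sample standard deviation; the function name `mean_and_error_bar` and the input values are hypothetical.

```python
import statistics


def mean_and_error_bar(percentages: list[float]) -> tuple[float, float]:
    """Return (mean, half-width of an error bar spanning 2 x SE)."""
    n = len(percentages)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    mean = statistics.mean(percentages)
    se = statistics.stdev(percentages) / n ** 0.5  # sample SD / sqrt(n)
    return mean, 2.0 * se


if __name__ == "__main__":
    # Hypothetical per-sample false-positive percentages, not the article's data.
    fp_percent = [30.0, 35.0, 32.0, 33.0]
    mean, half_width = mean_and_error_bar(fp_percent)
    print(f"{mean:.1f}% +/- {half_width:.1f}%")
```

Applied to the values reported below, an SE of 1.8% for AMDIS at f = 70 corresponds to an error bar extending 3.6 percentage points on either side of the 32.8% mean.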

Mentions: To compare the efficacy of AMDIS and MetaBox in metabolite identification, we calculated the percentages of false positives and false negatives reported by each algorithm when analysing 10 samples of a standard mixture of metabolites (i.e. five samples of 50 μL and five of 100 μL), using match factors of f = 70, 80 and 90 for AMDIS, and a match factor of f = 70 with a score cutoff of 13 for MetaBox. Every compound reported by AMDIS was considered in the analysis, including multiple identifications for a single retention time (RT). For f = 70, AMDIS reported an average ± SE (n = 10) of 32.8% ± 1.8% false positives and 6.9% ± 0.8% false negatives. For f = 80 and 90, AMDIS reported 30.3% ± 1.9% and 27.8% ± 1.0% false positives, respectively, and 6.2% ± 1.0% and 4.6% ± 1.3% false negatives, respectively (Figure 2). MetaBox performed overwhelmingly better than AMDIS, reporting no false positives and no false negatives.
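For each sample, the comparison above reduces to checking every reported identification against the known composition of the standard mixture. The sketch below does this for one sample at several match-factor thresholds. The `Identification` record, the function name `fp_fn_percentages`, the inclusive `>=` threshold and the percentage denominators (false positives relative to all reported identifications, false negatives relative to the compounds in the mixture) are assumptions, since the text does not state them. Following the article, every reported identification is counted, including several for the same RT.

```python
from dataclasses import dataclass


@dataclass
class Identification:
    rt: float       # retention time of the peak
    compound: str   # library compound assigned to the peak
    score: float    # match factor (or other score) of the assignment


def fp_fn_percentages(ids: list[Identification], true_compounds: set[str],
                      threshold: float) -> tuple[float, float]:
    """Percentages of false positives and false negatives for one sample.

    Assumed denominators: false positives relative to all reported
    identifications; false negatives relative to compounds in the mixture.
    """
    # Keep identifications at or above the threshold (inclusive, assumed).
    reported = [i for i in ids if i.score >= threshold]
    # Each reported identification counts, even if several share one RT.
    fp = sum(1 for i in reported if i.compound not in true_compounds)
    found = {i.compound for i in reported if i.compound in true_compounds}
    fn = len(true_compounds - found)
    fp_pct = 100.0 * fp / len(reported) if reported else 0.0
    fn_pct = 100.0 * fn / len(true_compounds)
    return fp_pct, fn_pct


if __name__ == "__main__":
    # Toy sample, not data from the article: four compounds in the mixture.
    mixture = {"A", "B", "C", "D"}
    sample = [
        Identification(5.1, "A", 92.0),
        Identification(5.1, "X", 75.0),  # second hit at the same RT
        Identification(7.3, "B", 85.0),
        Identification(9.0, "C", 68.0),
        Identification(11.2, "Y", 88.0),
    ]
    for f in (70, 80, 90):
        fp_pct, fn_pct = fp_fn_percentages(sample, mixture, f)
        print(f"f={f}: FP {fp_pct:.1f}%, FN {fn_pct:.1f}%")
```

In this toy sample the false-positive percentage falls from 50.0% at f = 70 to 33.3% at f = 80 and 0.0% at f = 90, because the stricter thresholds discard the wrong hits X and Y; at f = 90 the correct hit for B is also discarded, so the false-negative percentage rises from 50.0% to 75.0%. Per-sample percentages computed this way would then be averaged across the 10 samples.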
