Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'.

Draper J, Enot DP, Parker D, Beckmann M, Snowdon S, Lin W, Zubair H - BMC Bioinformatics (2009)

Bottom Line: In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.


Affiliation: Institute of Biological Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK. jhd@aber.ac.uk

ABSTRACT

Background: Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass-to-charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high-dimensional metabolite 'fingerprint' or metabolite 'profile' data. High-resolution MS instruments routinely perform with a mass accuracy of < 5 ppm (parts per million), thus potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI).
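
As a point of reference for the < 5 ppm figure above, the following is a minimal sketch of how a ppm mass error and a ppm search window are usually computed. The function names are hypothetical and are not part of MZedDB.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measured m/z, in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(mz: float, tol_ppm: float = 5.0) -> tuple:
    """Lower and upper m/z bounds of a +/- tol_ppm search window."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


if __name__ == "__main__":
    # At m/z 200 a 5 ppm tolerance corresponds to +/- 0.001 m/z units.
    print(ppm_window(200.0, 5.0))
    print(ppm_error(200.001, 200.0))
```

Because the window scales with m/z, the same ppm tolerance admits a wider absolute mass range for heavier ions.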

Results: Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data.
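
The on-the-fly calculation described above can be illustrated with a small sketch that adds a fixed mass shift to a neutral monoisotopic mass for a handful of common singly charged ionisation products. The table of shifts and the function name are assumptions made for illustration. They are not MZedDB's actual rule set, which according to the abstract also decides from chemical information which products are able to form for a given structure; that filtering step is not modelled here.

```python
# Mass shifts (Da) relative to the neutral monoisotopic mass M.
# Cation shifts already account for the mass of the missing electron.
PROTON = 1.007276
WATER = 18.010565

ADDUCT_SHIFTS = {
    "[M+H]1+": PROTON,
    "[M+Na]1+": 22.989218,
    "[M+K]1+": 38.963158,
    "[M+NH4]1+": 18.033823,
    "[M+H-H2O]1+": PROTON - WATER,  # protonation plus neutral loss of water
    "[M-H]1-": -PROTON,
}


def ionisation_products(neutral_mass: float) -> dict:
    """Return the m/z of each listed singly charged product for one metabolite."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


if __name__ == "__main__":
    # Example input: a neutral monoisotopic mass of 143.0946 Da.
    for name, mz in ionisation_products(143.0946).items():
        print(f"{name:12s} {mz:.4f}")
```

Each computed value can then serve as a search target within a chosen ppm tolerance.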

Conclusion: We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.

Figure 3: Correlation analysis and mathematical relationships of explanatory signals discriminating healthy from diseased Brachypodium leaves. The left-hand panel displays the results of feature selection (all P < 0.001, in descending rank order) in Random Forest classification models comparing FT-ICR-MS spectra of control Brachypodium distachyon leaves and plants 96 hours after challenge with a virulent strain of the rice blast fungus. The right-hand panel shows example correlation clusters after a hierarchical cluster analysis (HCA) of the metabolome features (shown colour-coded) in the left-hand panel. The Pearson correlation coefficients are indicated for all combinations of ions in each cluster, and the boxes below indicate accurate mass differences, predicted relationships and an annotation guide.

Mentions: Signals derived from the same parent metabolite will not only exhibit strict mathematical relationships but, when two relatively similar matrices are compared, the intensities of related ions should also be correlated. The left-hand panel of Figure 3 shows a ranked list of the top (P < 0.0001) 'explanatory' positive ion m/z signals discriminating healthy from infected Brachypodium distachyon leaves 96 hours after infection with the rice blast fungus [43]. A hierarchical cluster analysis revealed that many of the signals fell into small clusters (colour-coded) of highly correlated m/z (right-hand panel of Figure 3). A simple calculation of the accurate mass differences between individual pairs of correlated signals indicates their likely relationships, allowing annotation suggestions to focus on the ionisation product most likely to be correct. For example, annotation of signals present in Cluster 2 should focus on [M+Na]1+ or [M+K]1+ adducts, which are likely to be derived from proline betaine (M = 143.0946). Notably, the potential ionisation products with masses of 183.061220 and 167.087280, which are both predicted to be isotopes, had no matches in MZedDB (even at 20 ppm) and so would have been uninformative if pursued further.
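
The mass-difference reasoning in this passage can be sketched as follows. The example takes the two masses quoted above (183.061220 and 167.087280) together with [M+Na]1+ and [M+K]1+ values computed from M = 143.0946; the latter two are calculated here for illustration and are not measured values read from Figure 3. The table of known differences, the 0.001 Da tolerance and the function name are likewise assumptions, not the authors' implementation.

```python
from itertools import combinations

# Mass differences (Da) expected between related singly charged ions.
KNOWN_DELTAS = {
    "13C isotope": 1.003355,                     # 13C minus 12C
    "Na -> K adduct": 38.963158 - 22.989218,     # [M+K]1+ minus [M+Na]1+
}


def related_pairs(mzs, tol_da=0.001):
    """List (lower m/z, higher m/z, relationship) for pairs matching a known delta."""
    hits = []
    for low, high in combinations(sorted(mzs), 2):
        diff = high - low
        for label, delta in KNOWN_DELTAS.items():
            if abs(diff - delta) <= tol_da:
                hits.append((low, high, label))
    return hits


if __name__ == "__main__":
    m = 143.0946  # proline betaine, as quoted in the text
    cluster = [m + 22.989218, 167.087280, m + 38.963158, 183.061220]
    for low, high, label in related_pairs(cluster):
        print(f"{low:.6f} -> {high:.6f}: {label}")
```

Under these assumptions, 167.087280 and 183.061220 each sit about 1.0034 Da above a computed adduct mass, which agrees with their description as isotopes and explains why searching them directly as monoisotopic products returns no match.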

