Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'.

Draper J, Enot DP, Parker D, Beckmann M, Snowdon S, Lin W, Zubair H - BMC Bioinformatics (2009)

Bottom Line: In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.


Affiliation: Institute of Biological Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK. jhd@aber.ac.uk

ABSTRACT

Background: Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments routinely perform with a mass accuracy of < 5 ppm (parts per million), potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that will allow searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI).
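As a rough illustration of what a mass accuracy of < 5 ppm means for a database query, the sketch below converts a ppm tolerance into an m/z search window and tests whether a measured signal falls inside it. The function names and the example m/z values are hypothetical and are not taken from MZedDB.

```python
from typing import Tuple


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measurement, in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(theoretical_mz: float, tol_ppm: float = 5.0) -> Tuple[float, float]:
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the measured m/z lies within tol_ppm of the target."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical target and measurements, for illustration only.
    target = 117.01933
    for measured in (117.0195, 117.0210):
        print(measured, round(ppm_error(measured, target), 2),
              within_tolerance(measured, target))
```

At m/z 200 a 5 ppm tolerance corresponds to a window of only ±0.001, which is why database masses need to be stored to several decimal places.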

Results: Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data.
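The idea of calculating the m/z of each potential ionisation product from a neutral structure can be sketched as follows. This is a minimal illustration only: the adduct table, the formula parser and all names are assumptions made for the example and do not reproduce MZedDB's actual 'rules', which also decide from the chemistry of each structure whether an adduct is able to form. The sketch only does the mass arithmetic, using monoisotopic atomic masses and correcting for the electron mass.

```python
import re

# Monoisotopic atomic masses (u) for a few common elements.
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
    "S": 31.97207100, "Na": 22.98976928, "K": 38.96370668,
}
ELECTRON = 0.00054858  # electron mass (u)

# Hypothetical adduct table: label -> (atoms gained, atoms lost, charge).
ADDUCTS = {
    "[M+H]+": ("H", "", +1),
    "[M+Na]+": ("Na", "", +1),
    "[M+K]+": ("K", "", +1),
    "[M-H2O+H]+": ("H", "H2O", +1),  # neutral loss of water
    "[M-H]-": ("", "H", -1),
}


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C4H6O4'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mz(neutral_formula: str, gained: str, lost: str, charge: int) -> float:
    """m/z of an ionisation product derived from a neutral molecule."""
    mass = (monoisotopic_mass(neutral_formula)
            + monoisotopic_mass(gained) - monoisotopic_mass(lost))
    mass -= charge * ELECTRON  # cations lose electrons, anions gain them
    return mass / abs(charge)


if __name__ == "__main__":
    # Succinic acid (C4H6O4), the compound used as an example in Figure 1.
    for label, (gained, lost, charge) in ADDUCTS.items():
        print(f"{label:12s} {adduct_mz('C4H6O4', gained, lost, charge):.5f}")
```

Each value produced this way can then serve as a target for an accurate mass search within a chosen ppm tolerance.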

Conclusion: We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.


Figure 1: Metabolite data representations in several web-accessible metabolite databases. (a) Accurate mass information relating to succinic acid in several large databases (see legend to Table 1 for abbreviations). (b) Three structurally diverse entries for choline in PubChem.

Mentions: Information on atomic mass was much more varied (Table 1 and Figure 1A); for example, databases such as MetaCyc (in this case AraCyc) did not provide accurate mass data. Accurate mass information was presented in different databases as either the average molecular weight or the mono-isotopic molecular weight, reported to between 4 and 7 decimal places. Annotation success increases more or less linearly with mass accuracy [19]; because Orbitrap and FT-ICR-MS instruments can operate at or above 100,000 mass resolution, mono-isotopic mass information to 4–5 decimal places will be required to optimise annotation success. Additionally, in several databases (particularly the large PubChem and ChemSpider repositories) metabolite information was not always represented as a single neutral molecule. This potentially complicates most automated annotation procedures, which assume that a signal is derived from a single molecular entity composed of pre-selected common atoms (e.g. C, O, N, H, S). An example is shown in Figure 1B: choline is represented in ionic form on its own, or together with separate common or more exotic salts. Based on this analysis it was decided that comprehensive coverage of natural metabolites could be achieved by downloading molecular information from the targeted repositories and then processing all chemical entries (see Methods section for details) to remove salts (i.e. keeping the largest component) and to remove molecules with fewer than 6 atoms or with exotic elements. When possible, all charged entities were converted to neutral compounds by addition or removal of hydrogen. The processed molecular information was then represented as SMILES, each of which had a unique identifier code in MZedDB and a hyperlink to the entry in the database of origin.
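The salt-removal and filtering steps described above can be sketched on SMILES strings as follows. This is a deliberately crude, regex-based illustration and not the authors' pipeline (the Methods section is not reproduced here); a real implementation would use a cheminformatics toolkit. Several points are assumptions made for the example: atoms are counted as heavy (non-hydrogen) atoms, the allowed element set is C, H, N, O, P and S, and the function names are hypothetical. The neutralisation step is not implemented.

```python
import re
from typing import List, Optional

# Matches bracket atoms such as [N+], [Cl-], [13C], [nH], and the
# unbracketed organic-subset atoms of SMILES (aliphatic and aromatic).
ATOM_TOKEN = re.compile(
    r"\[\d*([A-Z][a-z]?|[cnops])[^\]]*\]|(Br|Cl|[BCNOPSFI]|[cnops])"
)

# Assumed set of 'common' elements; anything else is treated as exotic.
ALLOWED = {"C", "H", "N", "O", "P", "S"}


def heavy_atoms(fragment: str) -> List[str]:
    """Element symbols of the non-hydrogen atoms in one SMILES fragment."""
    return [(m.group(1) or m.group(2)).capitalize()
            for m in ATOM_TOKEN.finditer(fragment)]


def strip_salts(smiles: str) -> str:
    """Keep the largest dot-separated component of a SMILES string."""
    return max(smiles.split("."), key=lambda frag: len(heavy_atoms(frag)))


def keep_entry(smiles: str, min_atoms: int = 6) -> Optional[str]:
    """Return the desalted SMILES, or None if the entry is filtered out."""
    largest = strip_salts(smiles)
    atoms = heavy_atoms(largest)
    if len(atoms) < min_atoms or not set(atoms) <= ALLOWED:
        return None
    return largest


if __name__ == "__main__":
    for entry in ("C[N+](C)(C)CCO.[Cl-]",   # choline chloride
                  "OC(=O)CCC(=O)O",         # succinic acid
                  "CCO"):                   # ethanol, too small
        print(entry, "->", keep_entry(entry))
```

For the choline chloride entry the chloride counter-ion is discarded and the choline cation is kept; because a quaternary ammonium cannot be neutralised by adding or removing hydrogen, it would remain charged, consistent with the "when possible" qualification in the text.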