Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules'.

Draper J, Enot DP, Parker D, Beckmann M, Snowdon S, Lin W, Zubair H - BMC Bioinformatics (2009)

Bottom Line: In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.


Affiliation: Institute of Biological Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK. jhd@aber.ac.uk

ABSTRACT

Background: Metabolomics experiments using Mass Spectrometry (MS) technology measure the mass to charge ratio (m/z) and intensity of ionised molecules in crude extracts of complex biological samples to generate high dimensional metabolite 'fingerprint' or metabolite 'profile' data. High resolution MS instruments routinely achieve a mass accuracy of < 5 ppm (parts per million), thus potentially providing a direct method for putative signal annotation using databases containing metabolite mass information. Most database interfaces support only simple queries with the default assumption that molecules either gain or lose a single proton when ionised. In reality the annotation process is confounded by the fact that many ionisation products will be not only molecular isotopes but also salt/solvent adducts and neutral loss fragments of original metabolites. This report describes an annotation strategy that allows searching based on all potential ionisation products predicted to form during electrospray ionisation (ESI).
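
To make the 5 ppm figure concrete, the following minimal Python sketch converts a ppm tolerance into an absolute m/z window and computes the ppm error between an observed and a theoretical m/z. The function names are illustrative assumptions and are not part of MZedDB; the example m/z value is the one used in the Figure 4 discussion.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(mz: float, ppm: float) -> float:
    """Half-width (in m/z units) of a +/- ppm tolerance window around mz."""
    return mz * ppm * 1e-6


# At 5 ppm, a candidate for m/z 156.0421 must agree to within
# roughly 0.0008 m/z units.
half_width = ppm_window(156.0421, 5.0)
```

Because the window scales with m/z, a fixed ppm tolerance is tighter in absolute terms for small ions than for large ones.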

Results: Metabolite 'structures' harvested from publicly accessible databases were converted into a common format to generate a comprehensive archive in MZedDB. 'Rules' were derived from chemical information that allowed MZedDB to generate a list of adducts and neutral loss fragments putatively able to form for each structure and calculate, on the fly, the exact molecular weight of every potential ionisation product to provide targets for annotation searches based on accurate mass. We demonstrate that data matrices representing populations of ionisation products generated from different biological matrices contain a large proportion (sometimes > 50%) of molecular isotopes, salt adducts and neutral loss fragments. Correlation analysis of ESI-MS data features confirmed the predicted relationships of m/z signals. An integrated isotope enumerator in MZedDB allowed verification of exact isotopic pattern distributions to corroborate experimental data.
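
The arithmetic behind computing the m/z of each potential ionisation product can be sketched in Python as below. This is only an illustration under stated assumptions: it covers a small subset of singly charged positive-ion products, uses standard monoisotopic masses rounded to six decimals, and does not model the 'rules' that decide which products can form for a given structure. The names `POSITIVE_ION_SHIFTS` and `ionisation_products` are hypothetical and not taken from MZedDB.

```python
# Monoisotopic masses in unified atomic mass units (rounded reference values).
ELECTRON = 0.000549
HYDROGEN = 1.007825
OXYGEN = 15.994915
SODIUM = 22.989769
POTASSIUM = 38.963707
WATER = 2 * HYDROGEN + OXYGEN

# Mass shift added to the neutral monoisotopic mass M for each singly
# charged positive ionisation product. The electron mass is subtracted
# because a cation has lost one electron.
POSITIVE_ION_SHIFTS = {
    "[M+H]1+": HYDROGEN - ELECTRON,
    "[M+Na]1+": SODIUM - ELECTRON,
    "[M+K]1+": POTASSIUM - ELECTRON,
    "[M-H2O+H]1+": HYDROGEN - ELECTRON - WATER,
}


def ionisation_products(neutral_mass: float) -> dict[str, float]:
    """Return the m/z of each listed product for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift
            for name, shift in POSITIVE_ION_SHIFTS.items()}


# C5H11NO2 (the neutral formula shared by betaine and valine).
products = ionisation_products(117.078979)
```

For this neutral mass the sketch gives values that agree to within about 0.0002 with those quoted in the Figure 4B discussion (118.0862, 140.0682, 156.0421 and 100.0756).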

Conclusion: We conclude that although ultra-high accurate mass instruments provide major insight into the chemical diversity of biological extracts, the facile annotation of a large proportion of signals is not possible by simple, automated query of current databases using computed molecular formulae. Parameterising MZedDB to take into account predicted ionisation behaviour and the biological source of any sample improves greatly both the frequency and accuracy of potential annotation 'hits' in ESI-MS data.


Figure 4: Investigation of mathematically related signals in a sample matrix. (A) A typical predicted cluster of mathematically related ions from the full matrix of signals derived from FT-ICR-MS analysis of infected Brachypodium distachyon plants, with the potassium adduct highlighted. Relative intensity ratios of predicted isotopes are highlighted in yellow. (B) Adducts table output following an MZedDB PIP search (positive ion) for m/z 156.0421. (C) Isotope ratio predictions table output from MZedDB for m/z 156.0421, with the isotopes shown in Figure 4A highlighted in yellow. (D) MZedDB output following a PIP search with the molecular formula C5H11KNO2 against all databases used to construct MZedDB (left panel), or following restriction of the search to just grass database entries in KEGG (right panel). Inset shows the structures of betaine and valine.

Mentions: Mass spectrometers capable of highly accurate mass measurement often have instrument software dedicated to identifying molecular isotopes and common adducts in spectra. In most cases only single, simple relationships (such as M_M+1 or M_Na+) are searched for. In addition to a standard isotope ratio calculator, as part of MZedDB development we have developed an 'adduct calculator' (operating in the R environment) which may be tailored to search for masses linked by any number of combinations of adducts, isotopes and neutral losses [40]. Using pre-determined thresholds (e.g. 0.001 amu), arithmetically related signals within a single biological matrix can be tentatively placed into clusters that are potentially all derived from a single parent molecule. Figure 4A demonstrates a typical predicted cluster of mathematically related ions in the full matrix of signals derived from FT-ICR-MS analysis of the model grass Brachypodium distachyon infected with the rice blast fungal pathogen [43]. In this instance the cluster centres on m/z 156.0421, which is predicted to be a potassium adduct of m/z 118.0862. MZedDB can be used to query the likelihood that the m/z species highlighted in this cluster of signals are derived from a single metabolite, based on the PIP 'rules' used to construct the database. Figure 4B shows the output of a PIP search (positive ion) for m/z 156.0421, in which two salt adducts ([M+Na]1+ = 140.0682 and [M+K]1+ = 156.0421) as well as a neutral loss of water ([M-H2O+H]1+ = 100.0756) are predicted to be possible in addition to the parental pseudo-ion ([M+H]1+ = 118.0862). Further investigation using the isotope calculator confirmed that signals m/z 157.0455 and m/z 158.0402 (highlighted in yellow in Figure 4C) had the highest probability of being isotopes of m/z 156.0421 and additionally were present at the correct relative intensities (see last column Table 5A).
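
The idea of grouping arithmetically related signals can be sketched as follows. This is a minimal Python illustration, not the authors' R adduct calculator: the relationship table is a small assumed subset, the function names are hypothetical, and the input list simply reuses the m/z values quoted above (some of which are predicted rather than measured values).

```python
from itertools import combinations

# Mass differences (u) between singly charged positive ions that share a
# parent molecule. Illustrative subset, rounded to six decimals.
RELATIONSHIPS = {
    "13C isotope": 1.003355,             # 13C - 12C
    "41K isotope": 1.998119,             # 41K - 39K
    "neutral loss of H2O": 18.010565,
    "[M+Na]1+ vs [M+H]1+": 21.981944,    # Na - H
    "[M+K]1+ vs [M+H]1+": 37.955882,     # K - H
}


def related_pairs(mzs, tolerance=0.001):
    """Find signal pairs whose m/z difference matches a listed relationship."""
    hits = []
    for low, high in combinations(sorted(mzs), 2):
        for name, delta in RELATIONSHIPS.items():
            if abs((high - low) - delta) <= tolerance:
                hits.append((low, high, name))
    return hits


def clusters(mzs, tolerance=0.001):
    """Group signals into connected components of related pairs."""
    parent = {mz: mz for mz in mzs}

    def find(x):
        # Follow parent links up to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for low, high, _ in related_pairs(mzs, tolerance):
        parent[find(high)] = find(low)

    groups = {}
    for mz in mzs:
        groups.setdefault(find(mz), []).append(mz)
    return [sorted(group) for group in groups.values()]


# m/z values quoted in the text for the Figure 4 example.
signals = [100.0756, 118.0862, 140.0682, 156.0421, 157.0455, 158.0402]
pairs = related_pairs(signals)
groups = clusters(signals)
```

With the 0.001 amu threshold, each of the five listed relationships links exactly one pair of these values, so all six end up in a single cluster. Note that the 1.9981 difference between m/z 158.0402 and m/z 156.0421 matches the 41K/39K spacing, which is consistent with the potassium adduct assignment.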
A PIP search with the molecular formula C5H11KNO2 against all databases used to construct MZedDB gave 16 entries corresponding to 10 metabolites; restricting the search to just grass database entries potentially annotated this cluster of signals as being derived from either betaine or valine (Figure 4D).
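
As a check on the formula used in that search, the m/z expected for a singly charged cation of composition C5H11KNO2 can be computed directly. The Python sketch below is an assumption-laden illustration: it handles only simple formulae without brackets, uses a small table of rounded monoisotopic masses, and its function names are hypothetical rather than part of MZedDB.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "Na": 22.989769,
    "K": 38.963707,
}
ELECTRON = 0.000549


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C5H11KNO2'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def cation_mz(formula: str, charge: int = 1) -> float:
    """m/z of a positive ion with the given elemental composition."""
    return (monoisotopic_mass(formula) - charge * ELECTRON) / charge


mz = cation_mz("C5H11KNO2")
```

The result agrees with the m/z 156.0421 signal at the centre of the cluster, and removing K from the formula gives the neutral mass of C5H11NO2, the composition shared by betaine and valine.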

