Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information.

Krumsiek J, Suhre K, Evans AM, Mitchell MW, Mohney RP, Milburn MV, Wägele B, Römisch-Margl W, Illig T, Adamski J, Gieger C, Theis FJ, Kastenmüller G - PLoS Genet. (2012)

Bottom Line: Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.


Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.

ABSTRACT
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.


Figure 4 (pgen-1003005-g004): Semi-automatic prediction of unknown metabolite identities. A: Examples of how to determine pathway classifications based on the functional annotations of GGM and GWAS hits. We present two metabolites, X-11421 and X-11244, whose GGM and GWAS associations clearly point to carnitine and steroid metabolism, respectively. B: Overview of unknowns functionally annotated by both the GGM and the GWAS approach. ‘GGM’ refers to an unknown metabolite that is three or fewer steps away from a known metabolite in the GGM, whereas ‘direct GGM’ denotes direct neighbors in the network. C: Pathway predictions for the 16 unknowns with both direct GGM and GWAS annotations. Unknowns marked with a star were subjected to in-depth analysis followed by experimental validation.

Mentions: We combined functional annotations for both GGM neighbors and GWAS hits for each unknown in order to derive specific pathway classifications. For unknowns that did not have a known metabolite neighbor in the GGM, we also investigated the 2- and 3-neighborhoods. Since these hits represent weaker evidence than a direct GGM neighbor, we distinguish between ‘GGM hit’ and ‘direct GGM hit’ in the following. Functional annotations were obtained from three sources: (1) the sub-pathway assignment provided for each known metabolite in the GGM neighborhood, (2) the GO functional terms for the associated genes of all genome-wide significant GWAS hits, and (3) the KEGG pathways on which the associated genes lie. To the best of our knowledge, no consistent mapping currently exists between the annotations available for metabolites and those available for genes across these data sources, so this was the only non-automatic step in the analysis: by manual interpretation of the different functional classes (Figure 4A), we derived a single consensus pathway annotation for a total of 106 of the unknown metabolites (Figure 4B). For 98 unknowns, we obtained annotations from the GGM network, with 74 of these representing direct GGM hits. Of the 28 genetic hits introduced above, 27 were in a genetic region with gene annotation. Overlaying the direct GGM set and the GWAS set, we obtained 16 unknowns with both biochemical and genetic evidence (Figure 4C). A list of all functional evidence along with the respective predictions can be found in Table S1.
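
The evidence-combination step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the GGM is assumed to be available as an undirected adjacency dictionary, the function names (`nearest_known`, `classify_evidence`, `direct_and_gwas`) are hypothetical, and the metabolite identifiers (`U1`, `K1`, ...) are toy placeholders rather than data from the study. The sketch searches the 1-, 2-, and 3-neighborhoods of each unknown for known metabolites, labels the result as a ‘direct GGM hit’ or ‘GGM hit’, and then overlays the direct set with the set of unknowns that have an annotated GWAS hit.

```python
def nearest_known(graph, start, known, max_steps=3):
    """Level-wise breadth-first search from `start`.

    Returns (distance, known metabolites found at that distance) for the
    closest neighborhood containing a known metabolite, or None if no
    known metabolite lies within `max_steps` edges.
    """
    seen = {start}
    frontier = [start]
    for distance in range(1, max_steps + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        hits = {n for n in next_frontier if n in known}
        if hits:
            return distance, hits
        frontier = next_frontier
    return None


def classify_evidence(graph, unknowns, known, gwas_annotated):
    """Label each unknown with its GGM evidence type and GWAS status."""
    evidence = {}
    for unknown in unknowns:
        found = nearest_known(graph, unknown, known)
        if found is None:
            ggm_label = None
        elif found[0] == 1:
            ggm_label = "direct GGM hit"
        else:
            ggm_label = "GGM hit"  # weaker evidence: 2- or 3-neighborhood
        evidence[unknown] = {"ggm": ggm_label, "gwas": unknown in gwas_annotated}
    return evidence


def direct_and_gwas(evidence):
    """Unknowns supported by both a direct GGM neighbor and a GWAS hit."""
    return sorted(
        u for u, e in evidence.items()
        if e["ggm"] == "direct GGM hit" and e["gwas"]
    )


# Toy network (hypothetical identifiers): U* are unknowns, K* are knowns.
graph = {
    "U1": ["K1"],
    "K1": ["U1"],
    "U2": ["U3"],
    "U3": ["U2", "K2"],
    "K2": ["U3"],
    "U4": [],
}
known = {"K1", "K2"}
unknowns = ["U1", "U2", "U3", "U4"]
gwas_annotated = {"U1", "U2", "U4"}  # unknowns with an annotated GWAS locus

evidence = classify_evidence(graph, unknowns, known, gwas_annotated)
both = direct_and_gwas(evidence)
```

In this toy network only `U1` has both a direct GGM neighbor and a GWAS annotation, so it alone would reach the overlay set that corresponds to Figure 4C. Deriving the consensus pathway label from the collected annotations is not modelled here, since the text describes that step as manual.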

