Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information.

Krumsiek J, Suhre K, Evans AM, Mitchell MW, Mohney RP, Milburn MV, Wägele B, Römisch-Margl W, Illig T, Adamski J, Gieger C, Theis FJ, Kastenmüller G - PLoS Genet. (2012)

Bottom Line: Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.


Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.

ABSTRACT
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
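The abstract names Gaussian graphical modeling but does not describe its implementation here. Such models are built on partial correlations, which measure the association between two variables after the influence of the remaining variables has been removed. The following is a minimal sketch of that idea for three simulated variables, not the authors' pipeline; the names `a`, `b`, `c` and the simulated chain are hypothetical. With exactly three variables, conditioning on the single remaining one gives the full-order partial correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical chain a -> b -> c: c depends on a only through b.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [v + rng.gauss(0, 0.5) for v in a]
c = [v + rng.gauss(0, 0.5) for v in b]

direct = pearson(a, c)              # strong, although a and c are not directly linked
conditional = partial_corr(a, c, b) # close to zero once b is accounted for
print(f"corr(a, c) = {direct:.3f}, partial corr(a, c | b) = {conditional:.3f}")
```

In a graphical model an edge is drawn only where the partial correlation stays clearly non-zero, so the indirect a–c link would be dropped while a–b and b–c remain. That is what makes such a network useful for placing an unknown metabolite next to its likely biochemical neighbours.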


Figure 2 (pgen-1003005-g002): Manhattan plot of genetic association. The strength of association for known (bottom) and unknown (top) metabolites is indicated as the negative logarithm of the p-value for the linear model (see Methods). Only metabolite-SNP associations with p-values below 10⁻⁶ are plotted (grey circles). Triangles represent metabolite-SNP associations with p-values below 10⁻⁴⁰. Horizontal lines indicate the threshold for genome-wide significance (p = 1.6×10⁻¹⁰, corresponding to α = 0.05 after Bonferroni correction); red vertical dashes indicate loci at which this threshold is attained.
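The thresholds in this caption can be illustrated with a short sketch. It assumes the standard Bonferroni rule (per-test cutoff = α divided by the number of tests) and base-10 logarithms for the plot axis; the function names are hypothetical, and the exact number of tests the authors corrected for is not stated in this excerpt, so the code back-computes it from the reported cutoff instead of asserting it.

```python
import math

ALPHA = 0.05                # family-wise error rate from the caption
GENOME_WIDE_P = 1.6e-10     # genome-wide significance cutoff from the caption
PLOT_P = 1e-6               # only associations below this are drawn
TRIANGLE_P = 1e-40          # associations below this are drawn as triangles


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p):
    """Height of an association on the Manhattan plot (assumes log base 10)."""
    return -math.log10(p)


def marker_for(p):
    """Plot symbol for a p-value under the caption's rules, or None if not drawn."""
    if p >= PLOT_P:
        return None
    return "triangle" if p < TRIANGLE_P else "circle"


def is_genome_wide_significant(p):
    return p < GENOME_WIDE_P


# Number of tests implied by the reported cutoff (roughly 3.1e8).
implied_tests = ALPHA / GENOME_WIDE_P
print(f"implied number of tests: {implied_tests:.3g}")
print(f"threshold line drawn at -log10(p) = {neg_log10(GENOME_WIDE_P):.2f}")
```

For example, an association with p = 10⁻⁸ would be plotted as a grey circle but would fall below the horizontal significance line, since 10⁻⁸ is larger than 1.6×10⁻¹⁰.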

Mentions: In total, we observe 34 distinct loci that display metabolite associations at a genome-wide significance level (Figure 2 and Dataset S1). Out of these 34 loci, 15 associate with at least one unknown compound. For 12 loci, an unknown compound constitutes the strongest association of all tested compounds. From the 213 unknown metabolites analyzed (see Methods for the determination of this metabolite subset), 28 show at least one genome-wide significant hit. These 28 associations at the 15 loci are presented in Table 1 along with all previously described GWAS hits to metabolic traits or other endpoints. Associating traits were determined from the GWAS catalog [26] for SNPs in LD (r² ≥ 0.5) with the respective lead SNP. Seven of the 15 loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1, rs12413935) have not been described in GWAS with metabolic traits before and thus represent new genetic loci of metabolic individuality. Interestingly, genetic variants in strong LD with CYP2C18 have been reported to associate with warfarin maintenance dose [27].

