Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information.

Krumsiek J, Suhre K, Evans AM, Mitchell MW, Mohney RP, Milburn MV, Wägele B, Römisch-Margl W, Illig T, Adamski J, Gieger C, Theis FJ, Kastenmüller G - PLoS Genet. (2012)

Bottom Line: Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.

Affiliation: Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.

ABSTRACT
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable number of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these "unknown metabolites" is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype-metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.

Figure 3 (pgen-1003005-g003): Gaussian graphical modeling. GGMs embed unknown metabolites into their biochemical context. A: Complete network presentation of partial correlations that are significantly different from zero at α = 0.05 after Bonferroni correction. The unknown metabolites are spread over the entire network and are involved in various metabolic pathways. B–D: Selected high-scoring sub-networks. We observe that GGM edges directly correspond to chemical reactions which alter specific chemical groups (e.g. carbonyl groups and methyl groups). Solid lines denote positive partial correlations. Dashed lines indicate negative partial correlations. Line widths represent partial correlation strengths.

In the second step of our analysis we focused solely on intrinsic relations between the measured metabolites and, in particular, on associations between known and unknown compounds. To this end, we applied Gaussian graphical models (GGMs), which we have previously shown to be able to reconstruct pathways involving directly related metabolites from cross-sectional blood serum metabolomics data [21], [22]. GGMs are based on partial correlation coefficients, that is, correlations between pairs of metabolites corrected for the effects of all remaining metabolites. Each known metabolite is annotated with a “super-pathway” corresponding to its general metabolic class, and a “sub-pathway” representing more specific metabolic pathways (see Dataset S1). In order to obtain a dataset that is independent of our genetic analysis, and to avoid circular arguments, co-variations in metabolite concentrations that are due to association with genetic variants (SNPs) were specifically removed from the data (see GGM methods for further details). A partial correlation was included in the model if it was significantly different from zero with α = 0.05 after Bonferroni correction, yielding a corrected significance level of 7.9×10⁻⁷ and an absolute partial correlation cutoff of ζ = 0.178. The resulting GGM consists of a total of 399 out of 62,835 theoretically possible edges (0.64% connectivity, Figure 3A). In line with our previous observations [21], metabolites tend to be strongly connected within their respective metabolic class, while links between different classes are rare (see Text S1). Inspecting the GGM in detail, we observe that the unknowns are tightly integrated within the network and connected to known compounds of various metabolic classes. This is reflected both in the overall network (Figure 3A, Text S1) and in the top list of high-scoring GGM edges (Table 2), where 18 of the 30 strongest partial correlations comprise at least one unknown metabolite.
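The thresholding described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a Fisher z-test for the significance of each partial correlation (the excerpt does not say which test was used), runs on synthetic data, uses only the Python standard library, and omits the step that removes SNP-driven co-variation. The function names are hypothetical. The value 355 in the arithmetic check is an inference from 62,835 = 355·354/2, not a number stated in the text.

```python
import math
import random


def covariance(samples):
    """Sample covariance matrix; each row of `samples` is one observation."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small toy matrices)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(samples):
    """Partial correlation of each pair given all other variables (precision matrix)."""
    prec = invert(covariance(samples))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]


def ggm_edges(samples, alpha=0.05):
    """Keep pairs whose partial correlation is significant after Bonferroni correction."""
    n, p = len(samples), len(samples[0])
    level = alpha / math.comb(p, 2)  # Bonferroni: one test per possible edge
    pcor = partial_correlations(samples)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            # Fisher z-test (an assumption here); p - 2 variables are conditioned on.
            z = math.atanh(pcor[i][j]) * math.sqrt(n - (p - 2) - 3)
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
            if p_value < level:
                edges.append((i, j, pcor[i][j]))
    return edges


def simulate_chain(n, seed=0):
    """Synthetic toy pathway A -> B -> C plus an unrelated variable D."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = a + rng.gauss(0, 1)
        c = b + rng.gauss(0, 1)
        rows.append([a, b, c, rng.gauss(0, 1)])
    return rows


# Arithmetic behind the quoted numbers: possible edges, corrected level, connectivity.
n_possible = math.comb(355, 2)  # 355 variables is inferred from 62,835, not stated
print(n_possible, 0.05 / n_possible, 399 / n_possible)

for i, j, r in ggm_edges(simulate_chain(500)):
    print(f"edge {i}-{j}: partial correlation {r:+.3f}")
```

In the toy chain, A and C are correlated only through B, so their partial correlation is close to zero while A-B and B-C remain strong. This is the property that lets GGM edges track direct biochemical relationships rather than indirect ones.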
The highest partial correlation in the dataset involves a known-unknown metabolite pair, namely 3-indoxylsulfate and the unknown metabolite X-12405 (ζ = 0.840). For pairs of known metabolites, we consistently observe associations of biochemically related metabolites from various metabolic pathways, such as the metabolites inosine and guanosine (ζ = 0.798), which are involved in nucleotide metabolism, or androsterone sulfate and epiandrosterone sulfate (ζ = 0.755), which represent related steroid hormone metabolites. Other pathways with related metabolite pairs include amino acid metabolism, lipid metabolism, bile acid metabolism, and xanthine metabolism. Following our line of reasoning, a strongly correlating pair of a known and an unknown metabolite then points directly to the specific pathway of cellular metabolism on which the unknown metabolite may lie. Investigating the sub-network structure around each unknown compound provides additional biochemical context for that compound.
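The reasoning from known neighbors to a pathway hypothesis can be sketched as a simple lookup over the GGM edge list. This is an illustrative heuristic, not the scoring used in the paper: it sums absolute partial correlations per pathway over the annotated neighbors of each unannotated metabolite. The three edges with ζ values are taken from the text. The pathway label given to 3-indoxylsulfate is an assumption (it is a tryptophan-derived compound, but the label was not read from Dataset S1), and `known-metabolite-Y` with its edge is entirely hypothetical.

```python
from collections import defaultdict


def pathway_hypotheses(edges, annotation):
    """Rank pathways of annotated GGM neighbors for each unannotated metabolite."""
    scores = defaultdict(lambda: defaultdict(float))
    for a, b, pcor in edges:
        for unknown, known in ((a, b), (b, a)):
            if unknown not in annotation and known in annotation:
                scores[unknown][annotation[known]] += abs(pcor)
    return {
        unknown: sorted(by_pathway.items(), key=lambda kv: kv[1], reverse=True)
        for unknown, by_pathway in scores.items()
    }


edges = [
    ("3-indoxylsulfate", "X-12405", 0.840),                        # from the text
    ("inosine", "guanosine", 0.798),                               # from the text
    ("androsterone sulfate", "epiandrosterone sulfate", 0.755),    # from the text
    ("X-12405", "known-metabolite-Y", 0.200),                      # hypothetical edge
]

annotation = {
    "3-indoxylsulfate": "tryptophan metabolism",   # assumed label, not from Dataset S1
    "inosine": "nucleotide metabolism",
    "guanosine": "nucleotide metabolism",
    "androsterone sulfate": "steroid hormone metabolism",
    "epiandrosterone sulfate": "steroid hormone metabolism",
    "known-metabolite-Y": "placeholder pathway",   # hypothetical
}

for unknown, ranked in pathway_hypotheses(edges, annotation).items():
    print(unknown, ranked)
```

Pairs of two known metabolites contribute nothing here; only edges that link an annotated compound to an unannotated one produce a hypothesis, which mirrors the role of known-unknown pairs in the argument above.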

