A distinct metabolic signature predicts development of fasting plasma glucose.

Hische M, Larhlimi A, Schwarz F, Fischer-Rosinský A, Bobbert T, Assmann A, Catchpole GS, Pfeiffer AF, Willmitzer L, Selbig J, Spranger J - J Clin Bioinforma (2012)

Bottom Line: We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. This result could only be captured by application of multivariate statistical approaches.


Affiliation: Clinic of Endocrinology, Diabetes and Nutrition, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany. joachim.spranger@charite.de.

ABSTRACT

Background: High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Therefore, numerous studies aim to identify reliable risk markers for the development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is still insufficiently understood. The development of so-called 'omics' approaches in recent years promises to identify molecular markers and to further our understanding of the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that multivariate statistical approaches are highly recommended to fully capture the complexity of data obtained with high-throughput methods.

Methods: We took blood plasma samples from 172 subjects who participated in the prospective Metabolic Syndrome Berlin Potsdam follow-up study (MESY-BEPO Follow-up). We analysed these samples using Gas Chromatography coupled with Mass Spectrometry (GC-MS) and measured 286 metabolites. Furthermore, fasting glucose levels were measured using standard methods at baseline and after an average of six years. We performed correlation analysis and built linear regression models as well as Random Forest regression models to identify metabolites that predict the development of fasting glucose in our cohort.

Results: We found a metabolic pattern consisting of nine metabolites that predicted fasting glucose development with an accuracy of 0.47 in tenfold cross-validation using Random Forest regression. We also showed that adding established risk markers did not improve the model accuracy. External validation nevertheless remains desirable. Although not all metabolites belonging to the final pattern have been identified yet, the pattern directs attention to amino acid metabolism, energy metabolism and redox homeostasis.

Conclusions: We demonstrate that metabolites identified using a high-throughput method (GC-MS) perform well in predicting the development of fasting plasma glucose over several years. Notably, not a single metabolite but a complex pattern of metabolites drives the prediction and therefore reflects the complexity of the underlying molecular mechanisms. This result could only be captured by application of multivariate statistical approaches. Therefore, we highly recommend the use of statistical methods that capture the complexity of the information provided by high-throughput methods.



Figure 2: Random Forest feature selection. Iterative bisection of the number of metabolites by removing the 50% of metabolites with the smallest importance measure. The remaining metabolites were used to build the Random Forest regression model. Shown is the median cross-validation accuracy. The accuracy remains stable down to a pattern of nine metabolites.

Mentions: Thus, we performed a nested feature selection based on the Random Forest importance measure. To define the minimum number of metabolites necessary for accurate prediction of Δglucose, we bisected the number of metabolites stepwise. During this reduction the average model accuracy remained stable down to a pattern of nine metabolites (see Figure 2). Therefore, it is legitimate to use only the nine metabolites with the highest importance in the Random Forest model. These nine metabolites are shown in Table 3. The accuracy using these metabolites in a Random Forest regression model was 0.97, and the median cross-validation accuracy was 0.47. Although the current selection of metabolites is smaller than the correlation-based selection, the accuracy improved. Detailed examination of the two metabolite selections revealed an incomplete overlap. Metabolites highly correlated with Δglucose also showed a high Random Forest importance. However, some metabolites (e.g. the putative allantoin, citric acid and an unidentified metabolite) showed high importance but no significant correlation with Δglucose. We assume that these metabolites are responsible for the increase in accuracy. Therefore, we conclude that not only linear but also more complex relations may exist between metabolites and the development of fasting glucose. Moreover, this assumption of complexity is underlined by the selected metabolites themselves and their location in biochemical pathways. The identified metabolites of the pattern are part of multiple metabolic pathways, e.g. purine degradation, energy metabolism and amino acid metabolism.
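The stepwise bisection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `bisect_features`, the `importance_fn` callback, the rounding rule (drop `n // 2` features per step) and the toy importance scores are all assumptions made for the example. In a real analysis, `importance_fn` would presumably refit a Random Forest regression model on the surviving metabolites and return its importance measure.

```python
from typing import Callable, Dict, List, Sequence


def bisect_features(
    features: Sequence[str],
    importance_fn: Callable[[Sequence[str]], Dict[str, float]],
    min_features: int = 9,
) -> List[List[str]]:
    """Return the feature set kept at each bisection step, largest first."""
    current = list(features)
    steps = [current]
    while True:
        # Assumption: drop the less important half, rounding the drop count down.
        keep = len(current) - len(current) // 2
        if keep < min_features or keep == len(current):
            break
        scores = importance_fn(current)  # re-score the surviving features
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        current = ranked[:keep]
        steps.append(current)
    return steps


# Hypothetical metabolite labels; 286 matches the number measured in the study.
metabolites = [f"m{i:03d}" for i in range(286)]


def toy_importance(feats: Sequence[str]) -> Dict[str, float]:
    """Placeholder scores (NOT real data): higher label number = more important."""
    return {f: float(f[1:]) for f in feats}


steps = bisect_features(metabolites, toy_importance)
sizes = [len(s) for s in steps]
print(sizes)  # [286, 143, 72, 36, 18, 9]
```

Under this rounding assumption, halving 286 metabolites five times ends at exactly nine features. Evaluating a cross-validated model on each entry of `steps` would give the kind of accuracy-versus-pattern-size curve that Figure 2 describes.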

