Molecular phenotyping of a UK population: defining the human serum metabolome.

Dunn WB, Lin W, Broadhurst D, Begley P, Brown M, Zelena E, Vaughan AA, Halsall A, Harding N, Knowles JD, Francis-McIntyre S, Tseng A, Ellis DI, O'Hagan S, Aarons G, Benjamin B, Chew-Graham S, Moseley C, Potter P, Winder CL, Potts C, Thornton P, McWhirter C, Zubair M, Pan M, Burns A, Cruickshank JK, Jayson GC, Purandare N, Wu FC, Finn JD, Haselden JN, Nicholls AW, Wilson ID, Goodacre R, Kell DB - Metabolomics (2014)

Bottom Line: Overall, this is a large scale and non-targeted chromatographic MS-based metabolomics study, using samples from over 1,000 individuals, to provide a comprehensive measurement of their serum metabolomes. This work provides an important baseline or reference dataset for understanding the 'normal' relative concentrations and variation in the human serum metabolome. These may be related to our increasing knowledge of the human metabolic network map.


Affiliation: Faculty of Engineering and Physical Sciences, School of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, Manchester, M1 7DN UK ; Faculty of Engineering & Physical Sciences, Manchester Centre for Integrative Systems Biology, Manchester Institute of Biotechnology, The University of Manchester, Manchester, M1 7DN UK ; Faculty of Medical and Human Sciences, Centre for Endocrinology and Diabetes, Institute of Human Development, The University of Manchester, Manchester, UK ; Centre for Advanced Discovery and Experimental Therapeutics (CADET), Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Sciences Centre, Manchester, M13 9WL UK ; School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.

ABSTRACT

Phenotyping of 1,200 'healthy' adults from the UK has been performed through the investigation of diverse classes of hydrophilic and lipophilic metabolites present in serum by applying a series of chromatography-mass spectrometry platforms. These data were made robust to instrumental drift by numerical correction; this was a prerequisite for detecting subtle metabolic differences. The variation in observed metabolite relative concentrations between the 1,200 subjects ranged from less than 5 % to more than 200 %. Variations in metabolites could be related to differences in gender, age, BMI, blood pressure, and smoking. Investigations suggest that a sample size of 600 subjects is both necessary and sufficient for robust analysis of these data. Overall, this is a large scale and non-targeted chromatographic MS-based metabolomics study, using samples from over 1,000 individuals, to provide a comprehensive measurement of their serum metabolomes. This work provides an important baseline or reference dataset for understanding the 'normal' relative concentrations and variation in the human serum metabolome. These may be related to our increasing knowledge of the human metabolic network map. Information on the Husermet study is available at http://www.husermet.org/. Importantly, all of the data are made freely available at MetaboLights (http://www.ebi.ac.uk/metabolights/).



Fig. 3: Classification analysis to assess sample size effects. The accuracy rate of discrimination with 95 % confidence intervals for data acquired applying UPLC–MS(+) for the three parameters of age (age <50 vs. age >65), BMI (BMI <25 vs. BMI >30) and gender (male vs. female). A Random Forest (RF) classifier was employed and 100 bootstrap sample sets were used for the assessment of classification accuracy.

Mentions: It is becoming increasingly evident that many biological studies are underpowered, in that they cannot support a robust, statistically significant and justifiable biological conclusion (Broadhurst and Kell 2006; Button et al. 2013; Dunn et al. 2011; Dunn et al. 2012; Ioannidis 2005; Ioannidis and Panagiotou 2011). Sample size is therefore an important aspect of experimental design in metabolomic studies, especially when metabolites are applied as predictive biomarkers. Although these issues have been addressed in theory [see Xia et al. (2013) for a detailed discussion], to our knowledge, no previous large-scale studies have assessed the influence of sample size. Thus, we studied the effect of sample size in terms of the prediction power of classification and the consistency of feature selection. The experimental design was to divide the whole sample population into several subsets for classification and feature selection. The results from these subsets were used to select the smallest subset with acceptable performance, by comparison with the whole sample population in both classification and feature selection. Three groupings, viz. age, gender and BMI, were used for each of the three analytical platforms to evaluate the effects of sample size. Sample size is defined as the sum of samples in both classes in a binary classification; in this study the number of samples in each class was not equal (see Supplementary Table 1), whereas in an ideal study the classes would be balanced. Figure 3 shows the prediction accuracy using Random Forests (RF) with a 95 % confidence interval in the three groups (age, gender and BMI) for UPLC–MS positive ion mode. At low sample sizes the prediction accuracy was variable, but as the sample size was increased the median accuracy also increased with a concomitant decrease in variation.
These data showed that a sample size of 600 was appropriate to achieve similar results to those of the whole sample population with the current dataset, where we are looking for general (i.e. not disease-specific) changes and where the variation is expected to be lower than that for the comparison of two populations such as ones that are ‘healthy’ and ‘diseased’. A previous study based on NMR data has shown that sample sizes in the low thousands of subjects offer sufficient statistical precision to detect biomarkers quantifying predisposition to disease, a different assessment to the one we have performed above (Nicholson et al. 2011). We emphasise that this highlights the requirement to include hundreds of samples in these types of studies but does not suggest that a sample size of 600 is appropriate for all studies [for detailed discussions on this subject see Xia et al. (2013)]. However, the trends observed for all analytical platforms suggested that a larger sample size would still slightly increase the prediction accuracy. The same trends were also seen with UPLC–MS(−) as well as for GC–MS. Classification results with RF and Support Vector Machine (SVM) classifiers for all three platforms and the effects of sample size on feature selection are shown in Supplementary Figs. 3 and 4.
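
The subset-and-resample design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a simple nearest-centroid classifier stands in for Random Forest so that the example needs nothing beyond the standard library, accuracy is scored on the out-of-bag samples of each bootstrap set, and the 95 % interval is taken as the 2.5th to 97.5th percentile of the bootstrap accuracies. All function names (`make_synthetic`, `accuracy_vs_sample_size`, and so on) are hypothetical.

```python
import random
import statistics


def make_synthetic(n_samples, n_features=5, shift=0.6, seed=0):
    """Synthetic two-class data (assumption): class 1 features are shifted."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so the two classes are balanced
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def _centroid(rows):
    # Column-wise mean of a non-empty list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]


def _dist2(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid_accuracy(train, test):
    """Stand-in classifier (not Random Forest); train/test hold (x, label)."""
    c0 = _centroid([x for x, t in train if t == 0])
    c1 = _centroid([x for x, t in train if t == 1])
    hits = sum((_dist2(x, c1) < _dist2(x, c0)) == (t == 1) for x, t in test)
    return hits / len(test)


def accuracy_vs_sample_size(X, y, sample_size, n_boot=100, seed=0):
    """Return (median, lower, upper) accuracy over bootstrap sets of one size."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_boot):
        # Draw a subset of the requested size from the whole population.
        subset = rng.sample(range(len(X)), sample_size)
        # Bootstrap: resample the subset with replacement for training.
        boot = [rng.choice(subset) for _ in range(sample_size)]
        in_bag = set(boot)
        train = [(X[i], y[i]) for i in boot]
        # Out-of-bag members of the subset are used for testing.
        test = [(X[i], y[i]) for i in subset if i not in in_bag]
        if not test or len({t for _, t in train}) < 2:
            continue  # skip degenerate resamples
        accuracies.append(nearest_centroid_accuracy(train, test))
    cuts = statistics.quantiles(accuracies, n=40)  # cut points every 2.5 %
    return statistics.median(accuracies), cuts[0], cuts[-1]


if __name__ == "__main__":
    X, y = make_synthetic(1000)
    for size in (20, 50, 100, 200, 400, 600):
        med, lo, hi = accuracy_vs_sample_size(X, y, size)
        print(f"n={size:4d}  median={med:.3f}  95% interval=({lo:.3f}, {hi:.3f})")
```

Run as a script, this prints one row per sample size. On such synthetic data the interval is expected to narrow as the sample size grows, which mirrors the qualitative trend reported for Fig. 3; the numbers themselves carry no meaning for the Husermet data.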

