Mining the human phenome using allelic scores that index biological intermediates.

Evans DM, Brion MJ, Paternoster L, Kemp JP, McMahon G, Munafò M, Whitfield JB, Medland SE, Montgomery GW, GIANT Consortium, CRP Consortium, TAG Consortium, Timpson NJ, St Pourcain B, Lawlor DA, Martin NG, Dehghan A, Hirschhorn J, Smith GD - PLoS Genet. (2013)

Bottom Line: We compared the explanatory ability of allelic scores in terms of their capacity to proxy for the intermediate of interest, and the extent to which they associated with disease. We found that allelic scores derived from known variants and allelic scores derived from hundreds of thousands of genetic markers explained significant portions of the variance in biological intermediates of interest, and many of these scores showed expected correlations with disease. We conclude that our method represents a simple way in which potentially tens of thousands of molecular phenotypes could be screened for causal relationships with disease without having to expensively measure these variables in individual disease collections.


Affiliation: MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom ; School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom ; University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia.

ABSTRACT
It is common practice in genome-wide association studies (GWAS) to focus on the relationship between disease risk and genetic variants one marker at a time. When relevant genes are identified, it is often possible to implicate biological intermediates and pathways likely to be involved in disease aetiology. However, single genetic variants typically explain small amounts of disease risk. Our idea is to construct allelic scores that explain greater proportions of the variance in biological intermediates, and subsequently use these scores to data mine GWAS. To investigate the approach's properties, we indexed three biological intermediates where the results of large GWAS meta-analyses were available: body mass index, C-reactive protein and low density lipoprotein levels. We generated allelic scores in the Avon Longitudinal Study of Parents and Children, and in publicly available data from the first Wellcome Trust Case Control Consortium. We compared the explanatory ability of allelic scores in terms of their capacity to proxy for the intermediate of interest, and the extent to which they associated with disease. We found that allelic scores derived from known variants and allelic scores derived from hundreds of thousands of genetic markers explained significant portions of the variance in biological intermediates of interest, and many of these scores showed expected correlations with disease. Genome-wide allelic scores, however, tended to lack specificity, suggesting that they should be used with caution and perhaps only to proxy biological intermediates for which there are no known individual variants. Power calculations confirm the feasibility of extending our strategy to the analysis of tens of thousands of molecular phenotypes in large genome-wide meta-analyses.
We conclude that our method represents a simple way in which potentially tens of thousands of molecular phenotypes could be screened for causal relationships with disease without having to expensively measure these variables in individual disease collections.
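
The screening step described in the abstract, testing whether an allelic score for an intermediate is associated with disease status, might look roughly like the sketch below. This is not the authors' analysis: the function name `score_case_control_test` is hypothetical, and the test used here (a Welch-type comparison of mean scores between cases and controls with a large-sample normal approximation for the p-value) is an assumption chosen only for illustration.

```python
import math
import numpy as np

def score_case_control_test(scores, is_case):
    """Compare mean allelic score between cases and controls.

    scores  : one allelic score per individual
    is_case : True/1 for cases, False/0 for controls
    Returns (mean difference, z statistic, two-sided p-value).
    The p-value uses a normal approximation, which assumes large samples.
    """
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cases, controls = scores[is_case], scores[~is_case]

    diff = cases.mean() - controls.mean()
    # Welch-type standard error: the two groups may have unequal variances.
    se = math.sqrt(cases.var(ddof=1) / cases.size
                   + controls.var(ddof=1) / controls.size)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return diff, z, p
```

In a screen over many intermediates, a function like this would be called once per allelic score against the same disease collection, which is why the intermediates themselves never need to be measured in the cases and controls.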


Figure 1 (pgen-1003919-g001): Association between polygene score and BMI measured at age nine in the ALSPAC cohort. Association between polygene score and BMI measured at age nine using different p-value thresholds for the construction of the score in ALSPAC children (N = 5819). The lines joining the circles display the results for allelic scores calculated by using genotyped variants from across the genome in either a weighted (unbroken line) or an unweighted (dashed line) fashion. The lines joining the triangles display scores calculated similarly but excluding all variants +/−1 MB around 32 known BMI variants, and using either a weighted (unbroken line) or unweighted (dashed line) strategy. The histogram in the background displays the number of SNPs involved in construction of the allelic score at each corresponding SNP inclusion threshold for the “All variants” condition.

Mentions: Figures 1 through 3 display the proportion of variance in each of the different intermediate variables (i.e. BMI, CRP, and LDLc respectively) within the ALSPAC cohort explained by a genome-wide allelic score of variants constructed according to different SNP inclusion thresholds. Figure 1 shows the results for BMI when all the observed genotypes were used in calculation of the scores and when regions around known variants were excluded from construction of the scores. In the case of the genome-wide scores including the known regions, the weighted score explained from 2.3% to 4.9% of the phenotypic variance in BMI depending on the SNP inclusion threshold, whereas the unweighted score explained from 2.1% to 3.9% of the variance. The weighted score explained more of the phenotypic variance in BMI than the unweighted score across all SNP inclusion thresholds tested. In the case of the weighted score, the proportion of variance explained tended to be greatest when the SNP inclusion threshold was liberal (i.e. the more SNPs included in construction of the score, the better). In contrast, the predictive ability of the unweighted score reached a maximum at the p<0.2 selection threshold, but decreased on either side of this maximum as the threshold became more or less conservative. An allelic score constructed using only the known variants explained 3.2% of the variance in BMI when weighted and 2.3% when unweighted. Interestingly, the known variants explained less of the phenotypic variance than the best weighted genome-wide predictors, even with the known regions removed.
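
The weighted and unweighted scores compared above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this excerpt: genotypes are coded as counts (0, 1, 2) of the trait-increasing allele, the weights are per-SNP effect sizes from an external GWAS meta-analysis, and variance explained is taken as the squared correlation between score and phenotype. The function names `allelic_score` and `variance_explained` are hypothetical.

```python
import numpy as np

def allelic_score(genotypes, betas, pvalues, threshold, weighted=True):
    """Build one allelic score per individual.

    genotypes : (n_individuals, n_snps) counts of the trait-increasing allele
    betas     : (n_snps,) effect sizes from an external GWAS (assumed input)
    pvalues   : (n_snps,) p-values from the same external GWAS
    threshold : SNPs with p-value below this are included in the score
    weighted  : if True, weight allele counts by beta; otherwise just count alleles
    """
    genotypes = np.asarray(genotypes, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < threshold  # SNP inclusion threshold
    weights = betas[keep] if weighted else np.ones(int(keep.sum()))
    return genotypes[:, keep] @ weights

def variance_explained(score, phenotype):
    """Squared Pearson correlation between score and phenotype."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2
```

Repeating `allelic_score` over a grid of thresholds (for example from a strict genome-wide cutoff up to 1.0) and passing each result to `variance_explained` would produce a curve of the kind plotted in Figure 1, with one line for the weighted and one for the unweighted score.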

