Imputation without doing imputation: a new method for the detection of non-genotyped causal variants.

Howey R, Cordell HJ - Genet. Epidemiol. (2014)

Bottom Line: This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels.


Affiliation: Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom.


Results from Scenario 4 (haplotype effects). The top plots show bar plots of the calculated power (at P-value thresholds of 10⁻⁸, 10⁻⁶, and 10⁻⁴) and type I error (at P-value thresholds of 10⁻⁴, , and 10⁻³) for imputation (Imp), haplotype analysis (Hap), single-SNP logistic regression (LR), and the AI test. The bottom plots show example results from four separate power replicates: gray crosses show the imputation results, black dots show the AI results, and triangles show the imputation results at the two causal SNPs. The default window size of 10 SNPs was used in the AI test.
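Power and type I error in a simulation study like this are empirical rejection rates: the fraction of replicates whose P-value falls below a given threshold, computed on replicates simulated with a causal effect (power) or under the null (type I error). The sketch below assumes one P-value per replicate and a strict "<" comparison; how the paper summarizes each replicate is not stated here, and the helper names `rejection_rate` and `rates_at` and the P-values in the usage example are hypothetical.

```python
# Hypothetical sketch: empirical power / type I error as rejection rates.
# Assumes one P-value per simulation replicate (an assumption, not the paper's stated procedure).
from typing import Iterable, Sequence


def rejection_rate(pvalues: Sequence[float], threshold: float) -> float:
    """Fraction of replicates whose P-value is strictly below `threshold`.

    On replicates simulated with a causal effect this estimates power;
    on null replicates it estimates the type I error rate.
    """
    if not pvalues:
        raise ValueError("need at least one replicate")
    return sum(p < threshold for p in pvalues) / len(pvalues)


def rates_at(pvalues: Sequence[float], thresholds: Iterable[float]) -> dict[float, float]:
    """Rejection rate at each threshold, e.g. 1e-8, 1e-6 and 1e-4 for power."""
    return {t: rejection_rate(pvalues, t) for t in thresholds}


if __name__ == "__main__":
    # Made-up P-values for illustration only (not results from the paper).
    power_reps = [1e-12, 3e-9, 5e-7, 2e-5, 0.03]
    print(rates_at(power_reps, [1e-8, 1e-6, 1e-4]))
```

Stricter thresholds can only lower the rejection rate, which is why the power bars shrink as the threshold moves from 10⁻⁴ to 10⁻⁸.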

Mentions: Figure 3 (top) shows the detection power and type I error for standard imputation, haplotype analysis, AI, and single-SNP logistic regression for data simulated under a more complicated scenario (Scenario 4): a haplotype model in which disease was assumed to be caused by a haplotype effect defined by two underlying causal nongenotyped SNPs, rs7926004 and rs10500679. Relative to haplotypes T-C and C-G, possession of a T-G haplotype at these two SNPs was assumed to increase the risk of disease by a factor of 1.8, while possession of a C-C haplotype increased it by a factor of 1.5. As in Scenarios 1–3, standard imputation, haplotype analysis, and AI considerably outperform single-SNP logistic regression in terms of detection power, although the type I error is greater for imputation than for AI or haplotype analysis. We were not surprised to find that haplotype analysis performed well in this scenario, even though the causal SNPs were not genotyped. This result is loosely consistent with the results of Morris and Kaplan [2002], who showed that haplotype analysis performed better than single-SNP testing when disease was attributable to multiple alleles at a single locus; statistically, a haplotype effect at two unobserved loci could be considered equivalent to multiple susceptibility alleles at a single unobserved locus. We were, however, surprised that standard imputation showed such high power, as this scenario was specifically designed to represent a strong haplotype effect that produces much weaker marginal effects at each of the contributing SNPs when they are analyzed individually. The bottom plots of Figure 3 go some way toward explaining this somewhat counterintuitive result. In each of these example replicates, both AI and standard imputation capture a signal in the vicinity of the two causal SNPs.
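Why a strong two-SNP haplotype effect leaves only weak single-SNP effects can be seen with a little arithmetic. The sketch below uses the Scenario 4 haplotype relative risks but assumes, purely for illustration, a rare disease, a multiplicative per-haplotype risk model, and equal haplotype frequencies of 0.25; the paper's actual frequencies and genotype-risk model may differ. Under these assumptions, case haplotype frequencies are proportional to frequency × relative risk, and the allelic odds ratio at each causal SNP follows directly. The names `HAP_RISK`, `HAP_FREQ`, and `allele_odds_ratio` are hypothetical.

```python
"""Hypothetical sketch: marginal allelic odds ratios at the two causal SNPs.

Assumptions (not from the paper): rare disease, multiplicative per-haplotype
risks, and equal haplotype frequencies.
"""

# Scenario 4 haplotype relative risks (SNP1 allele, SNP2 allele); T-C and C-G are the reference.
HAP_RISK = {"TG": 1.8, "CC": 1.5, "TC": 1.0, "CG": 1.0}

# Hypothetical control haplotype frequencies, chosen only for illustration.
HAP_FREQ = {"TG": 0.25, "CC": 0.25, "TC": 0.25, "CG": 0.25}


def allele_odds_ratio(snp_index: int, allele: str,
                      freq: dict[str, float], risk: dict[str, float]) -> float:
    """Allelic odds ratio (allele vs. the other allele) at one SNP of the haplotype.

    With a rare disease and multiplicative haplotype risks, the case haplotype
    frequency is proportional to freq[h] * risk[h]; controls keep freq[h].
    """
    carries = [h for h in freq if h[snp_index] == allele]
    others = [h for h in freq if h[snp_index] != allele]
    case_odds = (sum(freq[h] * risk[h] for h in carries)
                 / sum(freq[h] * risk[h] for h in others))
    control_odds = sum(freq[h] for h in carries) / sum(freq[h] for h in others)
    return case_odds / control_odds


if __name__ == "__main__":
    print("SNP 1, allele T:", round(allele_odds_ratio(0, "T", HAP_FREQ, HAP_RISK), 3))
    print("SNP 2, allele G:", round(allele_odds_ratio(1, "G", HAP_FREQ, HAP_RISK), 3))
```

With these assumed frequencies both causal SNPs show an allelic odds ratio of 0.7/0.625 = 1.12, far below the haplotype relative risks of 1.8 and 1.5, because each risk-raising allele also sits on a reference-risk haplotype and the two risk haplotypes carry opposite alleles at each SNP.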
However, the signal captured by imputation is not, in fact, at either of the causal SNPs, which are well imputed but show only weak marginal effects, as expected. Rather, it comes from other well-imputed SNPs in the region that presumably mark the causal haplotype. Just as haplotype analysis can capture the effect of an untyped causal variant by testing a haplotype that marks (i.e., is a good surrogate for) that variant, imputation appears able to capture the effect of an underlying causal haplotype by testing a SNP that marks the haplotype. This is a potentially attractive property of imputation that has not, to our knowledge, been demonstrated previously. However, it raises an important issue for the interpretation of imputation results. Given results such as those in the bottom plots of Figure 3, the usual interpretation would be that the signal is due to a causal effect at the top-scoring SNP (or possibly at another SNP in strong LD with it). Our results demonstrate that this is by no means the only explanation for such a signal. Indeed, they suggest that it is not possible, statistically, to distinguish between this simple explanation and more complicated ones; doing so may require different types of experiment, based on different types of data.
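The "marker of the causal haplotype" explanation can also be made concrete. The sketch below assumes, purely for illustration, a rare disease, multiplicative per-haplotype risks, equal haplotype frequencies of 0.25, and a hypothetical marker SNP whose allele "A" lies on some fraction of T-G haplotypes and on no other haplotype. Under the same assumptions each causal SNP has an allelic odds ratio of only 0.7/0.625 = 1.12, whereas a marker that tags T-G inherits most of that haplotype's 1.8-fold risk. The function `marker_odds_ratio` and the marker itself are hypothetical, not taken from the paper.

```python
"""Hypothetical sketch: allelic odds ratio at a marker SNP that tags the T-G haplotype.

Assumptions (not from the paper): rare disease, multiplicative per-haplotype
risks, equal haplotype frequencies, and a marker allele "A" carried by a given
fraction of T-G haplotypes and by no other haplotype.
"""

HAP_RISK = {"TG": 1.8, "CC": 1.5, "TC": 1.0, "CG": 1.0}
HAP_FREQ = {"TG": 0.25, "CC": 0.25, "TC": 0.25, "CG": 0.25}  # assumed, sums to 1


def marker_odds_ratio(tag_fraction: float, freq: dict[str, float],
                      risk: dict[str, float], tagged: str = "TG") -> float:
    """Allelic odds ratio for marker allele A carried by `tag_fraction` of `tagged` haplotypes."""
    # Control (population) frequencies of allele A and of the other marker allele.
    ctrl_a = tag_fraction * freq[tagged]
    ctrl_b = 1.0 - ctrl_a
    # Case frequencies are proportional to freq * risk; A-bearing chromosomes carry the tagged risk.
    case_a = tag_fraction * freq[tagged] * risk[tagged]
    case_b = sum(freq[h] * risk[h] for h in freq) - case_a
    return (case_a / case_b) / (ctrl_a / ctrl_b)


if __name__ == "__main__":
    for fraction in (1.0, 0.5):
        print(f"A on {fraction:.0%} of T-G haplotypes:",
              round(marker_odds_ratio(fraction, HAP_FREQ, HAP_RISK), 3))
```

A perfect tag gives an odds ratio of 1.35/0.875 ≈ 1.54, and even a marker on half of the T-G haplotypes stays well above 1.12, so a single-SNP test at such a marker can look stronger than at either causal SNP, which is consistent with the pattern described above.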
