Imputation without doing imputation: a new method for the detection of non-genotyped causal variants.
Bottom Line: This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels.
Affiliation: Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom.
Mentions: We carried out imputation (without prephasing) within the Gambian case/control samples, and within the cases and pseudocontrols derived from the Gambian case-parent trio samples, in the 4 Mb region around the known causal SNP (rs334) on chromosome 11. We used the program IMPUTE2 [Howie et al., 2009; Marchini et al., 2007] with data from the 1000 Genomes Project [1000 Genomes Project Consortium et al., 2012] (Phase I interim data, updated release April 2012) as a reference panel. 22,907 SNPs (in the case/control data) or 31,757 SNPs (in the case/pseudocontrol data) passing postimputation quality control (“info” score >0.5) from an original 66,754 imputed SNPs were analyzed using the “-method threshold” method in the program SNPTEST (allowing for the first three principal components as covariates in the case/control analysis) to test for association with disease status. Figure 7 shows the results from this analysis. Imputation followed by single-SNP analysis is able to improve the signal of association to in the case/control dataset (compared to seen previously when using real genotyped SNPs alone) and to in the trio dataset (compared to seen previously when using real genotyped SNPs alone). Thus, the signals detected through the AI test in SnipSnip could potentially have been detected through use of genome-wide imputation. However, imputation on a genome-wide scale is computationally demanding and requires careful postimputation quality control and filtering to remove untrustworthy results. An initial (much faster) analysis using SnipSnip could allow one to focus one's imputation efforts on the most promising regions before embarking on a full genome-wide imputation analysis. 
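The postimputation quality-control step described above (keeping only SNPs with an "info" score above 0.5) can be illustrated with a minimal sketch. The record type `ImputedSnp`, the function `filter_by_info`, and the toy values are hypothetical and are not taken from the study or from the IMPUTE2 or SNPTEST programs; only the 0.5 threshold comes from the text.

```python
from typing import Iterable, List, NamedTuple


class ImputedSnp(NamedTuple):
    """Hypothetical record for one imputed SNP (field names are assumptions)."""
    snp_id: str
    position: int
    info: float  # imputation quality score, between 0 and 1


def filter_by_info(snps: Iterable[ImputedSnp], threshold: float = 0.5) -> List[ImputedSnp]:
    """Keep SNPs whose info score is strictly greater than the threshold."""
    return [snp for snp in snps if snp.info > threshold]


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the study.
    toy_snps = [
        ImputedSnp("snp_a", 100, 0.92),
        ImputedSnp("snp_b", 200, 0.50),  # exactly at the threshold, so excluded
        ImputedSnp("snp_c", 300, 0.31),
    ]
    kept = filter_by_info(toy_snps)
    print(f"kept {len(kept)} of {len(toy_snps)} SNPs")
```

The comparison is strict because the text specifies a score greater than 0.5; a SNP sitting exactly at the threshold is dropped.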
In the current example, imputation of this 4 Mb region (comprising 960 genotyped SNPs) in the 2,118 Gambian case/control individuals using IMPUTE2, followed by single-SNP analysis in SNPTEST, took approximately 18 h on our Linux system (not including the time required to reformat files, including performing a liftover from Build 36 to Build 37 positions in order to match up the study samples with the 1000 Genomes data); we estimate that to carry out the same analysis across the entire genome (328,399 genotyped SNPs) would have taken over 36 weeks. This time can, of course, be reduced substantially through pre-phasing [Howie et al., 2012] and/or implementation in parallel on a compute cluster (see whole-genome imputation results presented below), but still compares unfavorably with the 37 min (with covariates) or 44 sec (without covariates) taken by SnipSnip (which can also similarly be reduced through implementation in parallel) to perform a whole-genome analysis on the same samples.
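The whole-genome estimate above can be checked with a short calculation. This sketch assumes that run time scales linearly with the number of genotyped SNPs; the text does not say how the estimate was derived, so that scaling rule and the function name `scale_runtime_hours` are assumptions. The input figures (18 h, 960 SNPs, 328,399 SNPs, 37 min) are taken from the text.

```python
HOURS_PER_WEEK = 24 * 7


def scale_runtime_hours(region_hours: float, region_snps: int, genome_snps: int) -> float:
    """Extrapolate run time linearly in the number of genotyped SNPs (an assumption)."""
    return region_hours * genome_snps / region_snps


# Figures quoted in the text: 18 h for 960 SNPs, 328,399 SNPs genome-wide.
genome_hours = scale_runtime_hours(18.0, 960, 328_399)
genome_weeks = genome_hours / HOURS_PER_WEEK

# Ratio to the 37 min quoted for the whole-genome SnipSnip run with covariates.
ratio_vs_snipsnip = genome_hours * 60 / 37

if __name__ == "__main__":
    print(f"estimated whole-genome imputation time: {genome_weeks:.1f} weeks")  # 36.7 weeks
    print(f"ratio to SnipSnip run time (with covariates): {ratio_vs_snipsnip:.0f}x")
```

Under the linear-scaling assumption the extrapolation gives about 36.7 weeks, in line with the "over 36 weeks" quoted above.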