Limits...
A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

Howie BN, Donnelly P, Marchini J - PLoS Genet. (2009)

Bottom Line: We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made.We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method.One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, University of Oxford, Oxford, UK.

ABSTRACT
Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

Show MeSH
Percentage discordance versus percentage missing genotypes for Scenario A dataset.(A) Full range of results, corresponding to calling thresholds from 0.33 to 0.99. (B) Magnified results for calling thresholds near 0.99. (C) Magnified results for calling thresholds near 0.33.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC2689936&req=5

pgen-1000529-g003: Percentage discordance versus percentage missing genotypes for Scenario A dataset.(A) Full range of results, corresponding to calling thresholds from 0.33 to 0.99. (B) Magnified results for calling thresholds near 0.99. (C) Magnified results for calling thresholds near 0.33.

Mentions: The results of this analysis are shown in Figure 3. The x-axes in this figure display the discordance between imputed genotype calls and observed Illumina calls, which is a surrogate for imputation error rate; the y-axes display the percentage of genotypes for which no call was made. Each method's line is formed by considering several different calling thresholds for imputation posterior probabilities. For example, a certain number of maximum posterior probabilities will exceed a threshold of 0.9, and among these we can ask what percentage of the best-guess imputed genotypes disagree with the Illumina genotypes. This yields an x-coordinate, and the y-coordinate is simply the percentage of all imputed genotypes for which no posterior probability exceeds the threshold. We generated the lines on the plots by repeating these calculations for calling thresholds ranging from 0.33 to 0.99 for each method.


A flexible and accurate genotype imputation method for the next generation of genome-wide association studies.

Howie BN, Donnelly P, Marchini J - PLoS Genet. (2009)

Percentage discordance versus percentage missing genotypes for Scenario A dataset.(A) Full range of results, corresponding to calling thresholds from 0.33 to 0.99. (B) Magnified results for calling thresholds near 0.99. (C) Magnified results for calling thresholds near 0.33.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC2689936&req=5

pgen-1000529-g003: Percentage discordance versus percentage missing genotypes for Scenario A dataset.(A) Full range of results, corresponding to calling thresholds from 0.33 to 0.99. (B) Magnified results for calling thresholds near 0.99. (C) Magnified results for calling thresholds near 0.33.
Mentions: The results of this analysis are shown in Figure 3. The x-axes in this figure display the discordance between imputed genotype calls and observed Illumina calls, which is a surrogate for imputation error rate; the y-axes display the percentage of genotypes for which no call was made. Each method's line is formed by considering several different calling thresholds for imputation posterior probabilities. For example, a certain number of maximum posterior probabilities will exceed a threshold of 0.9, and among these we can ask what percentage of the best-guess imputed genotypes disagree with the Illumina genotypes. This yields an x-coordinate, and the y-coordinate is simply the percentage of all imputed genotypes for which no posterior probability exceeds the threshold. We generated the lines on the plots by repeating these calculations for calling thresholds ranging from 0.33 to 0.99 for each method.

Bottom Line: We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made.We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method.One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

View Article: PubMed Central - PubMed

Affiliation: Department of Statistics, University of Oxford, Oxford, UK.

ABSTRACT
Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

Show MeSH