Caution in interpreting results from imputation analysis when linkage disequilibrium extends over a large distance: a case study on venous thrombosis.

Germain M, Saut N, Oudot-Mellakh T, Letenneur L, Dupuy AM, Bertrand M, Alessi MC, Lambert JC, Zelenika D, Emmerich J, Tiret L, Cambien F, Lathrop M, Amouyel P, Morange PE, Trégouët DA - PLoS ONE (2012)

Bottom Line: A comprehensive linkage disequilibrium and haplotype analysis of the whole locus, where twelve SNPs exhibited association p-values lower than 2.23 × 10^-11, together with the use of independent case-control samples, demonstrated that the culprit variant was a rare variant located ~1 Mb away from the original hits, not tagged by current genome-wide genotyping arrays and not even well imputed in the original GWAS samples. This variant was in fact rs1799963, also known as the FII G20210A prothrombin mutation. This work may be of major interest not only for its scientific impact but also for its methodological findings.


Affiliation: INSERM UMR_S 937, ICAN Institute, Université Pierre et Marie Curie, Paris, France.

ABSTRACT
By applying an imputation strategy based on the 1000 Genomes project to two genome-wide association studies (GWAS), we detected a susceptibility locus for venous thrombosis on chromosome 11p11.2 that had been missed by previous GWAS analyses conducted on the same datasets. A comprehensive linkage disequilibrium and haplotype analysis of the whole locus, where twelve SNPs exhibited association p-values lower than 2.23 × 10^-11, together with the use of independent case-control samples, demonstrated that the culprit variant was a rare variant located ~1 Mb away from the original hits, not tagged by current genome-wide genotyping arrays and not even well imputed in the original GWAS samples. This variant was in fact rs1799963, also known as the FII G20210A prothrombin mutation. This work may be of major interest not only for its scientific impact but also for its methodological findings.


Figure 5 (pone-0038538-g005): Box-plot representation of the imputation quality (r2) according to the minor allele frequency of the SNPs inferred from the 1000G 2010-08 release. Box plot derived from the imputation analysis of the largest GWAS (2).

By conducting an updated, in-depth analysis of two GWAS, we were able to “re-discover” a strong risk locus for VT that has been known for more than a decade [15], the F2 gene, but that was missed by all large-scale association studies conducted so far on the disease [1], [2], [16]. Several conclusions can be drawn from this work. First, it adds to the rather limited literature illustrating the value of imputation-based GWAS analyses using the 1000 Genomes project, which can help identify rare variants in new disease-associated loci not detected by the first waves of GWAS. Second, the functional variant can lie quite far from the detected hits. In our example, the original association signal mapped to an interval from 47,373,425 bp (MYBPC3) to 48,064,194 bp (PTPRJ) on chromosome 11, which is up to 1.3 Mb away from the functional G20210A mutation. Had PTPRJ been a plausible biological candidate for VT, our quest for the culprit variant could have led us to a dead end. Third, a functional variant could be missed if its imputation quality is low, which would likely be the case for a non-genotyped rare variant showing low to modest pairwise LD with other SNPs in its neighborhood. As shown in Figure 5, imputation quality was satisfactory for SNPs with inferred MAF >0.01: ∼82% of these SNPs were correctly imputed (r2 >0.3), whereas about 75% of the SNPs with MAF <0.01 showed poor imputation quality in our study. Rare variants such as FII G20210A that are not present on genotyping arrays can nevertheless be tagged by haplotypes generated from common SNPs that are not necessarily in strong LD with each other. A similar phenomenon was previously observed at the LPA locus associated with coronary artery disease [13].
From a population genetics perspective, it would be interesting to investigate whether evolutionary selection forces act on the F2 locus, as suspected for the F5 gene [17], [18], and whether they could explain why a functional “deleterious” allele was maintained on a long-range haplotype.
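
The MAF-stratified summary of imputation quality described above (the share of SNPs with r2 >0.3 below and above MAF 0.01) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `summarize_by_maf`, the `BinSummary` type, and the toy (MAF, r2) pairs are all hypothetical, and only the two thresholds (MAF 0.01 and r2 0.3) are taken from the text.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class BinSummary(NamedTuple):
    n_snps: int                # number of SNPs falling in the MAF bin
    frac_well_imputed: float   # share of those SNPs with r2 above the cutoff


def summarize_by_maf(
    snps: Iterable[Tuple[float, float]],
    maf_cut: float = 0.01,     # rare/common boundary quoted in the text
    r2_cut: float = 0.3,       # "correctly imputed" threshold quoted in the text
) -> Dict[str, BinSummary]:
    """Split (maf, r2) pairs into rare and common SNPs and report the
    fraction in each bin whose imputation r2 exceeds r2_cut."""
    counts = {"rare": [0, 0], "common": [0, 0]}  # [total, well imputed]
    for maf, r2 in snps:
        key = "rare" if maf < maf_cut else "common"
        counts[key][0] += 1
        if r2 > r2_cut:
            counts[key][1] += 1
    return {
        key: BinSummary(total, ok / total if total else float("nan"))
        for key, (total, ok) in counts.items()
    }


# Made-up (MAF, r2) values for illustration only; not data from the study.
toy_snps = [
    (0.004, 0.10), (0.008, 0.25), (0.006, 0.45), (0.009, 0.05),
    (0.05, 0.80), (0.20, 0.95), (0.02, 0.28), (0.35, 0.99), (0.12, 0.60),
]

summary = summarize_by_maf(toy_snps)
for label, stats in summary.items():
    print(label, stats.n_snps, round(stats.frac_well_imputed, 2))
```

In a real analysis the (MAF, r2) pairs would come from the per-SNP information file produced by the imputation software. A rare causal variant that lands in the poorly imputed fraction is exactly the situation the authors warn about.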

