Caution in interpreting results from imputation analysis when linkage disequilibrium extends over a large distance: a case study on venous thrombosis.

Germain M, Saut N, Oudot-Mellakh T, Letenneur L, Dupuy AM, Bertrand M, Alessi MC, Lambert JC, Zelenika D, Emmerich J, Tiret L, Cambien F, Lathrop M, Amouyel P, Morange PE, Trégouët DA - PLoS ONE (2012)

Bottom Line: A comprehensive linkage disequilibrium and haplotype analysis of the whole locus, where twelve SNPs exhibited association p-values lower than 2.23×10−11, together with the use of independent case-control samples, demonstrated that the culprit variant was a rare variant located ~1 Mb away from the original hits, not tagged by current genome-wide genotyping arrays and not even well imputed in the original GWAS samples. This variant was in fact rs1799963, also known as the FII G20210A prothrombin mutation. This work may be of major interest not only for its scientific impact but also for its methodological findings.


Affiliation: INSERM UMR_S 937, ICAN Institute, Université Pierre et Marie Curie, Paris, France.

ABSTRACT
By applying an imputation strategy based on the 1000 Genomes project to two genome-wide association studies (GWAS), we detected a susceptibility locus for venous thrombosis on chromosome 11p11.2 that had been missed by previous GWAS analyses conducted on the same datasets. A comprehensive linkage disequilibrium and haplotype analysis of the whole locus, where twelve SNPs exhibited association p-values lower than 2.23×10−11, together with the use of independent case-control samples, demonstrated that the culprit variant was a rare variant located ~1 Mb away from the original hits, not tagged by current genome-wide genotyping arrays and not even well imputed in the original GWAS samples. This variant was in fact rs1799963, also known as the FII G20210A prothrombin mutation. This work may be of major interest not only for its scientific impact but also for its methodological findings.



Figure 2 (pone-0038538-g002): Pairwise linkage disequilibrium r2 between genotyped SNPs at the 11p11.2 locus over the 47,373,425–48,064,194 bp region in the second GWAS study (Germain et al., PLoS ONE 2011).
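To make the quantity plotted in this figure concrete, here is a minimal Python sketch of pairwise r2 between SNPs. It assumes r2 is taken as the squared Pearson correlation of genotypes coded 0/1/2, which is a common approximation; the source does not state which software or estimator produced the figure. The SNP names and genotype vectors are hypothetical.

```python
from itertools import combinations
from statistics import median, quantiles


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP has no variance, so r2 is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical genotypes (minor-allele counts) for eight individuals.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 0, 2, 1],
    "snp_b": [0, 1, 2, 0, 1, 0, 2, 1],  # identical to snp_a
    "snp_c": [0, 1, 1, 0, 2, 0, 2, 0],
    "snp_d": [1, 0, 2, 1, 0, 2, 0, 1],
}

# One r2 value per unordered pair of SNPs.
pairwise = {
    (s1, s2): r_squared(genotypes[s1], genotypes[s2])
    for s1, s2 in combinations(genotypes, 2)
}

values = sorted(pairwise.values())
# quantiles(..., n=10) returns the nine deciles; the last is the 90th percentile.
summary = {
    "median": median(values),
    "p90": quantiles(values, n=10, method="inclusive")[-1],
}

for pair, value in pairwise.items():
    print(pair, round(value, 3))
print(summary)
```

The median and upper percentile of such a pairwise distribution give a one-line summary of how strong LD is across a region.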

Mentions: The imputation analysis was performed using the MACH [11] (v1.0.16a) (http://www.sph.umich.edu/csg/abecasis/mach/) and Minimac (v4.4.3) (http://genome.sph.umich.edu/wiki/Minimac) software. In total, 6,754,935 SNPs were imputed with good imputation quality (r2>0.3) [11] in both GWAS. The allele frequency distribution of the imputed SNPs studied is shown in Figure S1. Association of imputed SNPs with VT was tested separately in each GWAS using a likelihood ratio test statistic implemented in the mach2dat (v1.0.18) software (http://www.sph.umich.edu/csg/abecasis/mach/), adjusting for the first four principal components as described in [2]. The results obtained in the two GWAS were then combined using a fixed-effect meta-analysis implemented in the METAL software [12]. A quantile-quantile (QQ) plot of the results is shown in Figure S2, and the resulting genomic control factor was 0.993. At the genome-wide significance level of 7.4×10−9, 217 SNPs were consistently associated with VT across the two GWAS samples (Table S1). These VT-associated SNPs mapped to five loci on four chromosomes. Four of the loci mapped to the aforementioned ABO, F5, F11 and FGG genes, while the fifth was a novel association involving the 11p11.2 locus (Table 1). Twelve SNPs, from position 47,373,425 to 48,064,194 (hg19 reference) and overlapping the MYBPC3-SPI1-CELF1-KBTBD4-NUP160-PTPRJ gene cluster, showed significant associations with VT, with P-values ranging from 6.97×10−13 to 2.23×10−11. All these SNPs had similar allele frequencies (∼3%) and similar genetic effects on VT risk (odds ratio (OR) ∼2.5), suggesting the existence of a block of strong linkage disequilibrium (LD) (pairwise r2 or |D'| > 0.8) that explains the association signal observed at the 11p11.2 locus.
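Two of the numerical steps in this paragraph can be sketched in a few lines of Python. First, the 7.4×10−9 threshold is consistent with a Bonferroni correction of 0.05 over the 6,754,935 imputed SNPs; that derivation is an inference from the numbers given here, not something this excerpt states. Second, the combination of the two GWAS is shown as an inverse-variance weighted fixed-effect meta-analysis. METAL supports both a sample-size based and a standard-error based scheme, and the excerpt does not say which was used, so the weighting below is an assumption. The per-study effect sizes are hypothetical.

```python
import math

N_IMPUTED_SNPS = 6_754_935  # number of well-imputed SNPs reported in the text
ALPHA = 0.05                # assumed family-wise error rate


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect combination of per-study effects."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, two_sided_p(beta / se)


threshold = bonferroni_threshold(ALPHA, N_IMPUTED_SNPS)
print(f"Bonferroni threshold: {threshold:.1e}")

# Hypothetical log odds ratios and standard errors for one SNP in two studies.
study_betas = [0.90, 0.95]
study_ses = [0.25, 0.18]
beta, se, p = fixed_effect_meta(study_betas, study_ses)
print(f"combined OR = {math.exp(beta):.2f}, SE(log OR) = {se:.3f}, P = {p:.2e}")
print("genome-wide significant:", p < threshold)
```

The combined standard error is always smaller than the smallest per-study standard error, which is why a signal that is only suggestive in each GWAS can cross the threshold after meta-analysis.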
This hypothesis was supported by a series of conditional logistic analyses showing that, after adjusting for any one of these SNPs, all other observed associations at this locus vanished (P>0.10). In contrast, genotyped SNPs at this locus exhibited low to moderate LD (Figure 2), with the median and 90th percentile of the pairwise r2 distribution being 0.10 and 0.52, respectively. We further investigated the haplotype structure derived from the genotyped SNPs using a previously described statistical methodology [13] based on the Stochastic-EM algorithm [14]. For this, an Akaike Information Criterion (AIC) based strategy was applied to our largest GWAS [2] to identify the most informative and parsimonious haplotypic model of 1 to 4 genotyped SNPs, mapping between positions 47,300,000 and 48,100,000, with respect to disease risk prediction. The best combination included three SNPs, rs2856650, rs3740689 and rs10769258, which generated six haplotypes whose global distribution differed strongly between cases and controls, consistently in both GWAS (Table 2). The haplotypic association appeared to be mainly attributable to the uncommon AGT haplotype, which was more frequent in cases than in controls. The OR for VT associated with this AGT haplotype was 2.37 [1.36–4.15] (P = 2.39×10−3) and 2.99 [2.02–4.44] (P = 4.23×10−8) in the first and second GWAS, respectively (Table 2). When haplotype analyses were adjusted for the imputed dose at any of the twelve SNPs, these ORs were reduced to 1.03 [0.41–2.58] (P = 0.950) and 0.96 [0.55–1.66] (P = 0.879), suggesting that the AGT haplotype actually tagged the rare alleles of the long-range LD block.
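As a rough consistency check on the odds ratios quoted above, the following Python sketch recovers an approximate standard error and Wald p-value from each OR and its interval. It assumes the bracketed intervals are 95% confidence intervals that are symmetric on the log-odds scale; the published P-values come from the haplotype model itself, so only approximate agreement should be expected. The function name and dictionary labels are illustrative.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_or_ci(odds_ratio, lower, upper):
    """Approximate log-OR, its standard error and a two-sided Wald p-value
    from an odds ratio and its (assumed 95%) confidence interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


# AGT haplotype odds ratios and intervals as reported in the text.
reported = {
    "GWAS 1, unadjusted": (2.37, 1.36, 4.15),
    "GWAS 2, unadjusted": (2.99, 2.02, 4.44),
    "GWAS 1, adjusted for imputed dose": (1.03, 0.41, 2.58),
    "GWAS 2, adjusted for imputed dose": (0.96, 0.55, 1.66),
}

results = {label: wald_from_or_ci(*values) for label, values in reported.items()}

for label, (beta, se, p) in results.items():
    print(f"{label}: log OR = {beta:+.3f}, SE = {se:.3f}, Wald P = {p:.2e}")
```

After adjustment the log odds ratios sit close to zero, which is the numerical form of the statement that the haplotype carries no signal beyond the imputed SNPs.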

