Challenges in conducting genome-wide association studies in highly admixed multi-ethnic populations: the Generation R Study.

Medina-Gomez C, Felix JF, Estrada K, Peters MJ, Herrera L, Kruithof CJ, Duijts L, Hofman A, van Duijn CM, Uitterlinden AG, Jaddoe VW, Rivadeneira F - Eur. J. Epidemiol. (2015)

Bottom Line: Genome-wide association studies (GWAS) have been successful in identifying loci associated with a wide range of complex human traits and diseases. However, the inclusion of other ethnic groups as well as admixed populations in GWAS is rapidly rising following the pressing need to extrapolate findings to non-European populations and to increase statistical power. Furthermore, we highlight a number of practical considerations and alternatives pertinent to the quality control and analysis of admixed GWAS data.


Affiliation: The Generation R Study Group, Erasmus University Medical Center, Rotterdam, The Netherlands.

ABSTRACT
Genome-wide association studies (GWAS) have been successful in identifying loci associated with a wide range of complex human traits and diseases. To date, the majority of GWAS have focused on European populations. However, the inclusion of other ethnic groups as well as admixed populations in GWAS is rapidly rising following the pressing need to extrapolate findings to non-European populations and to increase statistical power. In this paper, we describe the methodological steps surrounding genetic data generation, quality control, study design and analytical procedures needed to run GWAS in the multiethnic and highly admixed Generation R Study, a large prospective birth cohort in Rotterdam, the Netherlands. Furthermore, we highlight a number of practical considerations and alternatives pertinent to the quality control and analysis of admixed GWAS data.

Fig. 4: Imputation quality metrics for the 1KG imputation. (a) Boxplots of the MACH Rsq as a function of the MAF of the imputed SNPs. (b) Distribution of imputation quality per MAF category. Blue and green denote poorly and well imputed SNPs, respectively, using a quality score of 0.3 as the threshold. 8,263,752 of 30,072,738 SNPs (27.4 %) are poorly imputed (Rsq < 0.3).

Mentions: We imputed 30,072,738 autosomal variants using the 1KG reference panel, of which 28,681,763 are SNPs and 1,390,975 are insertions/deletions. The mean Rsq for all variants was 0.574 (median 0.622, IQR = 0.636); when markers with MAF < 0.01 were excluded (18,804,120 SNPs, or 62.52 % of the markers), the mean Rsq increased to 0.815 (median 0.929, IQR = 0.244). Figure 4 shows an assessment of imputation accuracy by MAF. Although imputation quality was poor in the lower spectrum of allele frequencies (MAF < 0.05), 15,164,960 markers had an Rsq ≥ 0.3 and were suitable for analysis. Moreover, far fewer markers fall in the common-frequency bins (6,894,397 markers with MAF > 0.05) than in the low-frequency bins (23,178,341 markers with MAF < 0.05), and low-frequency markers usually have low imputation quality. Online Resource 6 summarizes the performance of the imputation per chromosome. The number of SNPs imputed on chromosome X was 1,264,877, of which 903,868 (71.5 %) were rare (MAF < 0.005). As expected, quality was lower than for the autosomes, because men contribute fewer haplotypes on this chromosome. Considering only markers of sufficient imputation quality (Rsq ≥ 0.3) on the autosomes, the 1KG imputation yielded 18,874,123 more markers than the HapMap imputation, including 7,892,440 markers with MAF > 0.01. Imputation quality differed minimally for the 2,972,940 SNPs shared by the two datasets [mean Rsq 0.886 (median = 0.972, IQR = 0.123) for the HapMap-imputed dataset versus 0.903 (median = 0.978, IQR = 0.097) for the 1KG-imputed dataset].
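The quality summary above (mean and median Rsq before and after a MAF cut-off, and the share of markers below the Rsq threshold of 0.3) can be illustrated with a minimal Python sketch. The function name `summarize_rsq`, its parameters, and the toy marker list are assumptions made for illustration only; they are not the authors' pipeline and the values are not study data.

```python
from statistics import mean, median


def summarize_rsq(markers, maf_min=None, rsq_min=0.3):
    """Summarize imputation quality for (maf, rsq) pairs.

    markers: iterable of (maf, rsq) tuples, one per imputed marker.
    maf_min: if given, markers with MAF below this value are excluded first.
    rsq_min: markers with Rsq below this value count as poorly imputed.
    """
    rsqs = [rsq for maf, rsq in markers if maf_min is None or maf >= maf_min]
    if not rsqs:
        raise ValueError("no markers left after MAF filtering")
    n_poor = sum(1 for rsq in rsqs if rsq < rsq_min)
    return {
        "n": len(rsqs),
        "mean_rsq": mean(rsqs),
        "median_rsq": median(rsqs),
        "n_poor": n_poor,
        "frac_poor": n_poor / len(rsqs),
    }


# Hypothetical toy markers (maf, rsq); not taken from the study.
toy_markers = [
    (0.002, 0.10),
    (0.004, 0.25),
    (0.008, 0.40),
    (0.03, 0.70),
    (0.20, 0.95),
    (0.45, 0.99),
]

all_stats = summarize_rsq(toy_markers)                   # every marker
common_stats = summarize_rsq(toy_markers, maf_min=0.01)  # MAF < 0.01 excluded
print(all_stats)
print(common_stats)
```

In this toy input, excluding the rare markers raises the mean Rsq, which mirrors the direction of the effect reported in the text.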
When the SNPs shared by both datasets were further filtered for MAF > 0.01 and Rsq ≥ 0.3 (leaving 2,671,742 SNPs), the concordance rate between the HapMap and the 1KG imputed datasets, based on best-guess genotypes, was 0.983 as calculated by PLINK (using the --merge-mode 7 option).
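To make the concordance figure concrete, the sketch below computes a concordance rate as the fraction of non-missing best-guess genotype calls that agree between two datasets. It is an illustration of the quantity, not PLINK's implementation: per the PLINK documentation, --merge-mode 7 reports mismatching non-missing calls, from which a rate like this can be derived. The function name, the dictionary layout, the "0 0" missing code, and the toy genotypes are all assumptions.

```python
def concordance_rate(calls_a, calls_b, missing="0 0"):
    """Fraction of shared, non-missing genotype calls that agree.

    calls_a, calls_b: dicts mapping (sample_id, snp_id) -> genotype string
    such as "A G". Allele order is ignored, so "A G" matches "G A".
    Calls absent from either dataset or coded as missing are skipped.
    """
    compared = 0
    matching = 0
    for key, geno_a in calls_a.items():
        geno_b = calls_b.get(key)
        if geno_b is None or geno_a == missing or geno_b == missing:
            continue
        compared += 1
        if sorted(geno_a.split()) == sorted(geno_b.split()):
            matching += 1
    if compared == 0:
        raise ValueError("no overlapping non-missing calls to compare")
    return matching / compared


# Hypothetical best-guess genotypes from two imputations; not study data.
hapmap_calls = {
    ("s1", "rs1"): "A G",
    ("s1", "rs2"): "C C",
    ("s2", "rs1"): "A A",
    ("s2", "rs2"): "0 0",  # missing in this dataset, so not compared
}
kg_calls = {
    ("s1", "rs1"): "G A",  # same genotype, different allele order
    ("s1", "rs2"): "C T",  # discordant call
    ("s2", "rs1"): "A A",
    ("s2", "rs2"): "C C",
}

rate = concordance_rate(hapmap_calls, kg_calls)
print(f"concordance = {rate:.3f}")
```

Restricting the comparison to non-missing calls means a marker that is uncalled in one dataset neither lowers nor raises the rate.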

