Challenges in conducting genome-wide association studies in highly admixed multi-ethnic populations: the Generation R Study.

Medina-Gomez C, Felix JF, Estrada K, Peters MJ, Herrera L, Kruithof CJ, Duijts L, Hofman A, van Duijn CM, Uitterlinden AG, Jaddoe VW, Rivadeneira F - Eur. J. Epidemiol. (2015)

Bottom Line: Genome-wide association studies (GWAS) have been successful in identifying loci associated with a wide range of complex human traits and diseases. However, the inclusion of other ethnic groups as well as admixed populations in GWAS studies is rapidly rising following the pressing need to extrapolate findings to non-European populations and to increase statistical power. Furthermore, we highlight a number of practical considerations and alternatives pertinent to the quality control and analysis of admixed GWAS data.


Affiliation: The Generation R Study Group, Erasmus University Medical Center, Rotterdam, The Netherlands.

ABSTRACT
Genome-wide association studies (GWAS) have been successful in identifying loci associated with a wide range of complex human traits and diseases. Up to now, the majority of GWAS have focused on European populations. However, the inclusion of other ethnic groups as well as admixed populations in GWAS studies is rapidly rising following the pressing need to extrapolate findings to non-European populations and to increase statistical power. In this paper, we describe the methodological steps surrounding genetic data generation, quality control, study design and analytical procedures needed to run GWAS in the multiethnic and highly admixed Generation R Study, a large prospective birth cohort in Rotterdam, the Netherlands. Furthermore, we highlight a number of practical considerations and alternatives pertinent to the quality control and analysis of admixed GWAS data.


Fig. 1: Flowchart overview of the entire GWAS QC process. Quality control of all samples from Generation R-1 and Generation R-2 after merging of the projects. Red font denotes exclusion of either SNPs or samples from the dataset in the different QC steps. (Color figure online)

Marker QC included filters for: (1) marker call rate (--geno option), checked in two rounds, first with a call-rate threshold of 80 % (missingness < 0.2) and then, after inspection of sample quality, with a more stringent threshold of 95 % (missingness < 0.05); (2) minor allele frequency (MAF ≤ 0.001, --maf option); (3) differential missingness between the two projects (P < 1 × 10⁻⁷, --test-missing option); and (4) deviation from Hardy–Weinberg equilibrium proportions (P < 10⁻⁷, --hwe option). Sample QC included: (1) duplicate detection (PLINK option IBS = 1); (2) sex discordance (--check-sex option), comparing the reported sex of each participant with the sex predicted from the genetic data (expected chromosome X heterozygosity); when results were inconclusive, the Genome Studio plots (log R ratios and B-allele frequencies) for both the X and Y chromosomes were inspected; (3) genotype call rate (--mind option), checked in two rounds, first with a threshold of 95 % (missingness < 0.05) and then, after inspection of marker quality, with a more stringent threshold of 97.5 % (missingness < 0.025); and (4) high heterozygosity rate, more than 4 SD above the mean heterozygosity of all samples (--het option). The step-by-step summary of the applied QC pipeline is presented in Fig. 1 and Online Resources 1 and 2.
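To make the interleaved call-rate rounds and the heterozygosity cut-off concrete, the following is a minimal Python sketch, not the authors' pipeline (which used PLINK). It assumes genotypes held in memory as a list of per-sample rows with `None` for a missing call, and it assumes the rounds run in the order markers (80 %), samples (95 %), markers (95 %), samples (97.5 %); that ordering is read from the text, not confirmed by it. All function names and the toy data are hypothetical.

```python
from statistics import mean, stdev


def call_rate(calls):
    """Fraction of non-missing calls; None marks a missing genotype."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_markers(genotypes, marker_ids, min_call_rate):
    """Keep markers (columns) with call rate >= min_call_rate."""
    keep = [j for j in range(len(marker_ids))
            if call_rate([row[j] for row in genotypes]) >= min_call_rate]
    return ([[row[j] for j in keep] for row in genotypes],
            [marker_ids[j] for j in keep])


def filter_samples(genotypes, sample_ids, min_call_rate):
    """Keep samples (rows) with call rate >= min_call_rate."""
    keep = [i for i, row in enumerate(genotypes)
            if call_rate(row) >= min_call_rate]
    return [genotypes[i] for i in keep], [sample_ids[i] for i in keep]


def two_round_call_rate_qc(genotypes, sample_ids, marker_ids):
    """Assumed order: lenient marker filter, lenient sample filter,
    then the stringent marker and sample filters."""
    g, m = filter_markers(genotypes, marker_ids, 0.80)
    g, s = filter_samples(g, sample_ids, 0.95)
    g, m = filter_markers(g, m, 0.95)
    g, s = filter_samples(g, s, 0.975)
    return g, s, m


def high_heterozygosity(het_rates, n_sd=4.0):
    """IDs of samples whose heterozygosity exceeds mean + n_sd * SD."""
    values = list(het_rates.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return [sid for sid, h in het_rates.items() if h > cutoff]


# Toy data (hypothetical): 5 samples x 4 markers, genotypes coded 0/1/2.
toy_genotypes = [
    [0, 1, 2, None],
    [1, 1, 0, None],
    [2, 0, 1, 0],
    [0, None, 1, 1],
    [1, 2, 0, 2],
]
toy_samples = ["s1", "s2", "s3", "s4", "s5"]
toy_markers = ["m1", "m2", "m3", "m4"]
_, kept_samples, kept_markers = two_round_call_rate_qc(
    toy_genotypes, toy_samples, toy_markers)

# 29 samples with heterozygosity 0.30 and one outlier at 0.60.
toy_het = {f"id{i}": 0.30 for i in range(29)}
toy_het["outlier"] = 0.60
flagged = high_heterozygosity(toy_het)
```

In the toy data, marker `m4` fails the lenient marker round, which in turn exposes sample `s4` as poorly called; once `s4` is dropped, marker `m2` passes the stringent round. This is one reason to alternate lenient and stringent thresholds rather than apply the strict ones at once.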

