Correction of population stratification in large multi-ethnic association studies.

Serre D, Montpetit A, Paré G, Engert JC, Yusuf S, Keavney B, Hudson TJ, Anand S - PLoS ONE (2008)

Bottom Line: The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Our results highlight the importance of carefully addressing population stratification and of carefully "cleaning" the sample prior to analyses to obtain stronger signals of association and to avoid spurious results.


Affiliation: Genome Quebec Innovation Centre, McGill University, Montreal, Quebec, Canada.

ABSTRACT

Background: The vast majority of genetic risk factors for complex diseases have, taken individually, a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment in numerous centers, possibly from different geographic regions. Unfortunately such a recruitment strategy is likely to complicate the study design and to generate concerns regarding population stratification.

Methodology/principal findings: We analyzed 9,751 individuals representing three main ethnic groups - Europeans, Arabs and South Asians - that had been enrolled from 154 centers involving 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification and we present additional methods to identify and correct for stratification.

Conclusions/significance: Our results highlight the importance of carefully addressing population stratification and of carefully "cleaning" the sample prior to analyses to obtain stronger signals of association and to avoid spurious results.


Figure 1 (pone-0001382-g001): Distribution of pair-wise allele sharing among the INTERHEART European individuals. The graph shows the QQ plot of the distribution of all pair-wise measures of allele sharing against a normal distribution (the red line displays the expectation). The green line shows the empirical cut-off used to identify related individuals (corresponding to an allele sharing larger than 83%). The deviation on the left-hand side of the graph (i.e. low allele sharing) corresponds to pairs of individuals originating from different sub-populations.

To estimate whether the datasets made up of individuals of the same self-reported ethnicity were roughly genetically homogeneous, we calculated in each population sample the proportion of shared alleles between every pair of individuals. If individuals are sampled randomly from a homogeneous random-mating population, we expect every individual to be, on average, equally distant genetically from everybody else (since information from many unlinked loci is summarized). We thus plotted the distribution of allele sharing for all pair-wise comparisons within each population sample against a normal distribution (see Figure 1 for the European individuals and Supplemental Figure S3 for the other two datasets). Overall, the distributions appear roughly normal (i.e., we obtain a straight line on the QQ-plot for most of the range) but with significant deviations at both extremes. We observed a dramatic deviation on the right-hand side of the graph for the pairs of individuals with a proportion of allele sharing larger than 0.83, which could indicate sampling of related individuals. The most extreme case in the European sample consists of identical or nearly identical (>99%) genotypes obtained from 16 pairs of supposedly different individuals. The great majority of the pairs with a high proportion of shared alleles (i.e. larger than 0.83) are composed of individuals recruited in the same center. Overall we identified 71 likely related individuals (39 pairs) in the European population sample, 75 (41 pairs) in the South Asian sample and 97 (60 pairs) in the Arab sample. To test whether these individuals were actually related, we randomly selected 87 individuals from pairs with a very high proportion of allele sharing (>0.83), after exclusion of identical or nearly identical DNAs (>0.99), and genotyped them at 99 microsatellite loci. Kinship analyses using the Bayesian approach implemented in ML-relate [10] identified the same pairs of related individuals, with different degrees of relatedness: 71 pairs of parent/offspring, 28 full-siblings and 12 half-siblings.
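
The pair-wise screening described above can be illustrated with a minimal sketch. It is not the authors' code: the excerpt does not give the exact formula for the "proportion of shared alleles", so the sketch assumes an identity-by-state measure on biallelic SNP genotypes coded as 0, 1 or 2 copies of one allele (None for a missing call). The function names `allele_sharing` and `flag_pairs` are hypothetical; only the 0.83 and 0.99 cut-offs come from the text.

```python
from itertools import combinations


def allele_sharing(g1, g2):
    """Proportion of alleles shared identical-by-state by two individuals.

    Assumption (not stated in the source): genotypes are coded as 0, 1 or 2
    copies of one allele per SNP, with None marking a missing call.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci not genotyped in both individuals
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no locus genotyped in both individuals")
    return shared / compared


def flag_pairs(genotypes, related_cutoff=0.83, duplicate_cutoff=0.99):
    """Classify every pair of individuals by its allele-sharing proportion.

    `genotypes` maps an individual ID to its genotype list. Pairs above
    `duplicate_cutoff` are reported as identical or nearly identical DNAs;
    pairs above `related_cutoff` (but not duplicates) as likely relatives.
    """
    flags = {"duplicate": [], "related": []}
    for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2):
        s = allele_sharing(g1, g2)
        if s > duplicate_cutoff:
            flags["duplicate"].append((id1, id2, s))
        elif s > related_cutoff:
            flags["related"].append((id1, id2, s))
    return flags
```

With roughly 1,500 SNPs per individual this loop is quadratic in the number of individuals, so a vectorized implementation would probably be preferable for samples of several thousand people; the plain-Python version is kept here only to make the per-locus arithmetic explicit.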

