Analyses of a set of 128 ancestry informative single-nucleotide polymorphisms in a global set of 119 population samples.

Kidd JR, Friedlaender FR, Speed WC, Pakstis AJ, De La Vega FM, Kidd KK - Investig Genet (2011)

Bottom Line: However, to estimate ancestry, including possible admixture within an individual, as well as heterogeneity within a group of individuals, allele frequencies are necessary for what are believed to be the contributing populations. At the same time, the limitations of the panel for distinguishing ancestry and quantifying admixture among Eurasian populations are noted. We demonstrate the simultaneous importance of the specific set of population samples and their relative sample sizes in the use of the structure program to determine which groups cluster together and consequently influence the ability of a marker panel to infer ancestry.


Affiliation: Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA. Judith.Kidd@yale.edu.

ABSTRACT

Background: Using DNA to determine an individual's ancestry from among human populations is generally interesting and useful for many purposes, including admixture mapping, controlling for population structure in disease or trait association studies and forensic ancestry inference. However, to estimate ancestry, including possible admixture within an individual, as well as heterogeneity within a group of individuals, allele frequencies are necessary for what are believed to be the contributing populations. For this purpose, panels of ancestry informative markers (AIMs) have been developed.

Results: We are presenting our work on one such panel, composed of 128 ancestry informative single-nucleotide polymorphisms (AISNPs) already proposed in the literature. Compared to previous studies of these AISNPs, we have studied three times the number of individuals (4,871) in three times as many population samples (119). We have validated this panel for many ancestry assignment and admixture studies, especially those that were the rationale for the original selection of the 128 SNPs: African Americans and Mexican Americans. At the same time, the limitations of the panel for distinguishing ancestry and quantifying admixture among Eurasian populations are noted.

Conclusion: We demonstrate the simultaneous importance of the specific set of population samples and their relative sample sizes in the use of the structure program to determine which groups cluster together and consequently influence the ability of a marker panel to infer ancestry. We demonstrate the strengths and weaknesses of this particular panel of AISNPs in a global context.


Figure 1: Comparisons of Fst distributions for the 128 ancestry informative single-nucleotide polymorphisms (AISNPs) and for a reference set of 2327 SNPs.

Mentions: The Fst distribution of the 128 AISNPs was compared to the distribution of 2,327 non-AISNPs typed in our lab (Figure 1 and Additional file 3). Although Kosoy et al. [6] selected their 128 AISNPs not on the basis of Fst, but rather on the Informativeness statistic (In) of Rosenberg et al. [38,39], Fst clearly separates the two distributions by 1.25 standard deviations. The hypothesis that the two distributions are the same is rejected with a probability considerably less than 0.001. Outliers in the two distributions are given in Additional file 4. At the high-Fst end of the distributions, there are nine sites with Fst greater than 0.48: seven are in the reference distribution, and two are in the AISNP distribution. Of the seven in the reference distribution, five are located in or near genes of known phenotypic effect (SLC24A5, OCA2 (two SNPs), HERC2 and EDAR), and each of these genes is well known to have SNPs with marked global variation in allele frequency; but the best "known" SNPs are not part of this 128 AISNP set (Additional file 4). Though not associated with a phenotype, the remaining two "outliers" in the reference distribution have comparably high Fst values (Additional file 4). The two outliers at the high end of the AISNP distribution are sites in or near EDAR (rs260690, Fst = 0.5205) and RTTN (rs4891825, Fst = 0.5176). There are 10 outliers at the low end of the reference Fst distribution with Fst <0.04. Only one of the AISNPs falls below the mode of the reference distribution: TWGS1 (rs4798812, Fst = 0.08753).
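The comparison described above can be illustrated with a minimal sketch. It assumes Wright's variance-based definition of Fst for a biallelic SNP (variance of the allele frequency across populations divided by p(1 - p) at the mean frequency, unweighted by sample size) and measures the separation of two Fst distributions as the difference of means in pooled standard deviation units. The excerpt does not state which estimator or which separation measure the authors used, so both choices, along with the function names `wright_fst` and `separation_in_sd`, are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, pvariance, variance


def wright_fst(freqs):
    """Fst for one biallelic SNP from per-population allele frequencies.

    Assumed definition: population variance of the frequencies divided by
    p_bar * (1 - p_bar), with every population weighted equally.
    """
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        # A SNP fixed in every population carries no differentiation signal.
        return 0.0
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def separation_in_sd(a, b):
    """Difference of the means of a and b in units of their pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical allele frequencies, not data from the study.
    snp_high = [0.05, 0.10, 0.90, 0.95]   # strongly differentiated
    snp_low = [0.48, 0.50, 0.52, 0.50]    # nearly uniform across populations
    print(wright_fst(snp_high), wright_fst(snp_low))
```

Applied to real data, `wright_fst` would be called once per SNP across all population samples, and `separation_in_sd` would then take the list of Fst values for the AISNPs and the list for the reference SNPs.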

