AdmixKJump: identifying population structure in recently diverged groups.

O'Connor TD - Source Code Biol Med (2015)

Bottom Line: I also show that AdmixKJump is more accurate with fewer samples per population. Furthermore, in contrast to the cross-validation approach, AdmixKJump is able to detect the population split between the Finnish and Tuscan populations of the 1000 Genomes Project. AdmixKJump has more power to detect the number of populations in a cohort of samples with smaller sample sizes and shorter divergence times.


Affiliation: Institute for Genome Sciences, Program in Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, 801 W Baltimore St, Baltimore, MD 21201, USA.

ABSTRACT

Motivation: Correctly modeling population structure is important for understanding recent evolution and for association studies in humans. While pre-existing knowledge of population history can be used to specify expected levels of subdivision, objective metrics to detect population structure are important and may even be preferable for identifying groups in some situations. One such metric for genomic-scale data is implemented in the cross-validation procedure of the program ADMIXTURE, but it has not been evaluated on recently diverged and potentially cryptic levels of population structure. Here, I develop a new method, AdmixKJump, and test both metrics under this scenario.

Findings: Using both realistic simulations and 1000 Genomes Project European genomic data, I show that AdmixKJump is more sensitive to recent population divisions than the cross-validation metric. With two populations of 50 individuals each, AdmixKJump detects two populations with 100% accuracy when they split at least 10KYA, whereas cross-validation reaches this 100% level only at 14KYA. I also show that AdmixKJump is more accurate with fewer samples per population. Furthermore, in contrast to the cross-validation approach, AdmixKJump is able to detect the population split between the Finnish and Tuscan populations of the 1000 Genomes Project.

Conclusion: AdmixKJump has more power to detect the number of populations in a cohort of samples with smaller sample sizes and shorter divergence times.

Availability: A Java implementation can be found at https://sites.google.com/site/igsevolgenomicslab/home/downloads.



Fig1: Split time vs metric accuracy. The x-axis is the split-time parameter added to the human demographic model, indicating the point at which the two populations start diverging. The y-axis has two labels. The first, Ancestry Accuracy, indicates how accurately the model parameters cluster the two populations, where 50% accuracy corresponds to random assignment. The second indicates how often (in %) AdmixKJump or cross-validation correctly identifies K∗=2, i.e. two clusters. I report population sample sizes of 10 (blue), 30 (red), and 50 (purple).

Mentions: I also find that the new measure has more power with smaller sample sizes; for instance, with N=30 AdmixKJump reaches 100% accuracy at 12KYA (see Figure 1).
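
The "Ancestry Accuracy" axis in the Figure 1 caption can be illustrated with a small sketch. The text here does not say how the accuracy was computed, so the following is only one plausible reading, not the author's procedure. It assumes each individual has a known source population (0 or 1) and an estimated ancestry fraction for the first cluster at K=2. The function name `ancestry_accuracy` and its inputs are hypothetical. Because cluster labels are arbitrary, the sketch scores both labellings and keeps the better one, so the lowest possible score is 50%, in line with the caption's description of 50% as random assignment.

```python
from typing import Sequence


def ancestry_accuracy(true_pop: Sequence[int], q_first: Sequence[float]) -> float:
    """Fraction of individuals placed in the correct one of two populations.

    true_pop : known population label (0 or 1) per individual (assumed input).
    q_first  : estimated ancestry fraction in the first cluster per individual,
               e.g. one column of a K=2 ancestry matrix (assumed input).
    """
    if len(true_pop) != len(q_first) or len(true_pop) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Hard-assign each individual to the cluster holding most of its ancestry.
    assigned = [0 if q >= 0.5 else 1 for q in q_first]

    # Proportion of individuals whose assigned cluster matches the true label.
    matches = sum(a == t for a, t in zip(assigned, true_pop))
    direct = matches / len(true_pop)

    # Cluster labels are arbitrary, so also score the swapped labelling.
    return max(direct, 1.0 - direct)
```

With this definition, a perfectly separated pair of populations scores 1.0 whichever cluster is called "first", and an assignment that is right for exactly half of the individuals scores 0.5.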

