Limits...
Population structure and eigenanalysis.

Patterson N, Price AL, Reich D - PLoS Genet. (2006)

Bottom Line: Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation.We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests.This means that we can predict the dataset size needed to detect structure.

View Article: PubMed Central - PubMed

Affiliation: Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.

ABSTRACT
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

Show MeSH
Simulation of an Admixed PopulationWe show a simple demography generating an admixed population. Populations A,B,D trifurcated 100 generations ago, while population C is a recent admixture of A and B. Admixture weights for the proportion of population A in population C are Beta-distributed with parameters (3.5,1.5). Effective population sizes are 10,000.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC1713260&req=5

pgen-0020190-g007: Simulation of an Admixed PopulationWe show a simple demography generating an admixed population. Populations A,B,D trifurcated 100 generations ago, while population C is a recent admixture of A and B. Admixture weights for the proportion of population A in population C are Beta-distributed with parameters (3.5,1.5). Effective population sizes are 10,000.

Mentions: We generated simulated data, by taking a trifurcation between populations (A,B,D) 100 generations ago. Population C is a recent admixture of A and B. The mixing proportion of A in an individual from C is Beta-distributed B(3.5,1.5) so that the average contribution of population A in an individual of population C is .7 (see Figure 7). Effective population sizes are 10,000 for each population. We then simulated data for 10,000 unlinked markers (more details are in the Methods section). FST between any pair of A,B,D is .005. We are attempting to mimic the data of Figure 5, and chose to run our analysis on simulated samples from populations B,C,D, not using samples from A. We expect two significant eigenvalues corresponding to the splits of populations B,C, and D. If population A is included in the analysis, we also get just two significant eigenvalues, as predicted by theory. This is what is observed (unpublished data), with, as predicted, the admixed population not adding to the number of axes of variation (the third eigenvalue is not significant). In Figure 8 we show a plot of the first two eigenvectors. Note the dispersion of population C along a line. This is diagnostic of admixture. The resemblance of Figures 5 and 8 is striking.


Population structure and eigenanalysis.

Patterson N, Price AL, Reich D - PLoS Genet. (2006)

Simulation of an Admixed PopulationWe show a simple demography generating an admixed population. Populations A,B,D trifurcated 100 generations ago, while population C is a recent admixture of A and B. Admixture weights for the proportion of population A in population C are Beta-distributed with parameters (3.5,1.5). Effective population sizes are 10,000.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC1713260&req=5

pgen-0020190-g007: Simulation of an Admixed PopulationWe show a simple demography generating an admixed population. Populations A,B,D trifurcated 100 generations ago, while population C is a recent admixture of A and B. Admixture weights for the proportion of population A in population C are Beta-distributed with parameters (3.5,1.5). Effective population sizes are 10,000.
Mentions: We generated simulated data, by taking a trifurcation between populations (A,B,D) 100 generations ago. Population C is a recent admixture of A and B. The mixing proportion of A in an individual from C is Beta-distributed B(3.5,1.5) so that the average contribution of population A in an individual of population C is .7 (see Figure 7). Effective population sizes are 10,000 for each population. We then simulated data for 10,000 unlinked markers (more details are in the Methods section). FST between any pair of A,B,D is .005. We are attempting to mimic the data of Figure 5, and chose to run our analysis on simulated samples from populations B,C,D, not using samples from A. We expect two significant eigenvalues corresponding to the splits of populations B,C, and D. If population A is included in the analysis, we also get just two significant eigenvalues, as predicted by theory. This is what is observed (unpublished data), with, as predicted, the admixed population not adding to the number of axes of variation (the third eigenvalue is not significant). In Figure 8 we show a plot of the first two eigenvectors. Note the dispersion of population C along a line. This is diagnostic of admixture. The resemblance of Figures 5 and 8 is striking.

Bottom Line: Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation.We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests.This means that we can predict the dataset size needed to detect structure.

View Article: PubMed Central - PubMed

Affiliation: Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.

ABSTRACT
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general "phase change" phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

Show MeSH