Visualizing spatial population structure with estimated effective migration surfaces.

Petkova D, Novembre J, Stephens M - Nat. Genet. (2015)

Bottom Line: We use the concept of 'effective migration' to model the relationship between genetics and geography. Our approach uses a population genetic model to relate effective migration rates to expected genetic dissimilarities. We illustrate its potential and limitations using simulations and data from elephant, human and Arabidopsis thaliana populations.


Affiliation: Department of Statistics, The University of Chicago, Chicago, Illinois, USA.

ABSTRACT
Genetic data often exhibit patterns broadly consistent with 'isolation by distance', a phenomenon in which genetic similarity decays with geographic distance. In a heterogeneous habitat, this decay may occur more quickly in some regions than in others: for example, barriers to gene flow can accelerate differentiation between neighboring groups. We use the concept of 'effective migration' to model the relationship between genetics and geography. In this paradigm, effective migration is low in regions where genetic similarity decays quickly. We present a method to visualize variation in effective migration across a habitat from geographically indexed genetic data. Our approach uses a population genetic model to relate effective migration rates to expected genetic dissimilarities. We illustrate its potential and limitations using simulations and data from elephant, human and Arabidopsis thaliana populations. The resulting visualizations highlight important spatial features of population structure that are difficult to discern using existing methods for summarizing genetic variation.


Figure 2: Simulation comparing EEMS and PCA analysis. For each method, we show results for two migration scenarios, representing “uniform” migration and a “barrier” to migration, and three different sampling schemes. (a,b) The true underlying migration rates under the two scenarios; colors represent relative migration rates. (c) The three sampling schemes used; the size of the circle at each node is proportional to the number of individuals sampled at that location, and locations are color-coded to facilitate cross-referencing the EEMS and PCA results. (d) PCA results. (e) EEMS results. In contrast to PCA, EEMS is robust to the sampling scheme and shows clear qualitative differences between the estimated effective migration rates under the two scenarios, which reflect the underlying simulation truth.

Mentions: We illustrate the benefits and limitations of EEMS with several simulations. We used the program ms [29] to simulate data under two migration scenarios: in the “uniform” scenario, which represents pure isolation by distance, migration rates do not vary throughout the habitat (Fig. 2a); in the “barrier” scenario, a central region with lower migration rates separates the east and the west of the habitat (Fig. 2b). We applied both EEMS and PCA to data generated under these scenarios and under three different sampling schemes (Fig. 2c). The results illustrate two key points. First, whatever the sampling scheme, the migration scenario is easier to discern from the EEMS contour plots (Fig. 2e) than from the PCA projections (Fig. 2d). For the isolation-by-distance scenario, the surfaces are approximately uniform under all three sampling schemes, and for the barrier scenario, the surfaces highlight the barrier as an area of lower effective migration. In contrast, the simple nature of the underlying structure is not obvious from the PCA projections in either setting, and indeed, the PCA results for the two scenarios do not differ in an easily identifiable, systematic way. Second, EEMS is less sensitive to the underlying sampling scheme than PCA. Indeed, the inferred surfaces are qualitatively unaffected by the sampling scheme, except in the extreme case where no samples are taken on one side of the migration barrier. In that case the migration rates on that side of the barrier cannot be estimated from the data, so the estimates in that region are driven by the prior, which assumes no heterogeneity in migration rates. In contrast, PCA is heavily influenced by irregular sampling [19, 20, 21]. For example, biased sampling and the presence of a barrier can both produce clusters in the PCA results (top row in Fig. 2d).
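The contrast between the "uniform" and "barrier" scenarios can be illustrated numerically with a small sketch. The text above does not give the formulas of the population genetic model, so the code below rests on labeled assumptions: it uses the resistance distance on a grid of demes as a stand-in for expected genetic dissimilarity, it sets each edge weight to the mean of its two node rates, and the grid size (5 x 13), the barrier width (three columns) and the tenfold rate reduction are hypothetical values chosen for illustration. It is not the EEMS implementation, and it involves neither the ms simulations nor any inference step.

```python
import numpy as np

def grid_laplacian(node_rates):
    """Weighted Laplacian of a 4-neighbour grid of demes.

    Assumption for illustration: the migration rate on an edge is the
    mean of the rates assigned to its two end nodes.
    """
    n_rows, n_cols = node_rates.shape
    n = n_rows * n_cols
    lap = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            # Link each node to its right and lower neighbour (if any).
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    w = 0.5 * (node_rates[r, c] + node_rates[rr, cc])
                    i, j = r * n_cols + c, rr * n_cols + cc
                    lap[i, j] -= w
                    lap[j, i] -= w
                    lap[i, i] += w
                    lap[j, j] += w
    return lap

def resistance_distances(lap):
    """Pairwise resistance distances R_ij = L+_ii + L+_jj - 2 L+_ij."""
    pinv = np.linalg.pinv(lap)
    d = np.diag(pinv)
    return d[:, None] + d[None, :] - 2.0 * pinv

n_rows, n_cols = 5, 13                 # hypothetical habitat size
uniform = np.ones((n_rows, n_cols))    # "uniform" scenario
barrier = uniform.copy()
barrier[:, 5:8] = 0.1                  # "barrier": central columns, 10x lower rate

R_uniform = resistance_distances(grid_laplacian(uniform))
R_barrier = resistance_distances(grid_laplacian(barrier))

def node(r, c):
    return r * n_cols + c

# Two pairs of demes at the same geographic distance (4 grid steps):
within = (node(2, 0), node(2, 4))      # both west of the barrier
across = (node(2, 4), node(2, 8))      # on opposite sides of the barrier

ratio_within = R_barrier[within] / R_uniform[within]
ratio_across = R_barrier[across] / R_uniform[across]
print("within-side pair, barrier/uniform:", ratio_within)
print("across-barrier pair, barrier/uniform:", ratio_across)
```

Under these assumptions the barrier inflates the dissimilarity proxy for the across-barrier pair far more than for the within-side pair, even though both pairs are equally far apart geographically. This mirrors the idea that effective migration is low where genetic similarity decays quickly.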

