Visualizing spatial population structure with estimated effective migration surfaces.

Petkova D, Novembre J, Stephens M - Nat. Genet. (2015)

Bottom Line: We use the concept of 'effective migration' to model the relationship between genetics and geography. Our approach uses a population genetic model to relate effective migration rates to expected genetic dissimilarities. We illustrate its potential and limitations using simulations and data from elephant, human and Arabidopsis thaliana populations.


Affiliation: Department of Statistics, The University of Chicago, Chicago, Illinois, USA.

ABSTRACT
Genetic data often exhibit patterns broadly consistent with 'isolation by distance', a phenomenon where genetic similarity decays with geographic distance. In a heterogeneous habitat, this may occur more quickly in some regions than in others: for example, barriers to gene flow can accelerate differentiation between neighboring groups. We use the concept of 'effective migration' to model the relationship between genetics and geography. In this paradigm, effective migration is low in regions where genetic similarity decays quickly. We present a method to visualize variation in effective migration across a habitat from geographically indexed genetic data. Our approach uses a population genetic model to relate effective migration rates to expected genetic dissimilarities. We illustrate its potential and limitations using simulations and data from elephant, human and Arabidopsis thaliana populations. The resulting visualizations highlight important spatial features of population structure that are difficult to discern using existing methods for summarizing genetic variation.



Figure 1: A schematic overview of EEMS, using African elephant data for illustration. (a–c) Setting up the population grid: (a) Samples are collected at known locations across a two-dimensional habitat; green and orange colors represent two subspecies – forest and savanna elephants. (b) A dense triangular grid is chosen to span the habitat. (c) Each sample is assigned to the closest deme on the grid. (d–f) Estimated Effective Migration Surface (EEMS) analysis: (d) Migration rates vary according to a Voronoi tessellation which partitions the habitat into “cells” with constant migration rate; colors represent relative rates of migration, ranging from low (orange) to high (blue). (e) Each edge has the same migration rate as the cell it falls into. The cell locations and migration rates are adjusted, using Bayesian inference, so that the expected genetic dissimilarities under the EEMS model match the observed genetic dissimilarities. (f) The EEMS is a color contour plot produced by averaging draws from the posterior distribution of the migration rates, interpolating between grid points. Here, and in all other figures, log(m) denotes the effective migration rate on the log10 scale, relative to the overall migration rate across the habitat. (Thus log(m) = 1 corresponds to effective migration that is 10-fold faster than the average.) The main feature of the elephant EEMS is a “barrier” of low effective migration that separates the habitats of the two subspecies: forest elephants to the west, and savanna elephants to the north, south and east.
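
Steps (c) to (e) of the caption can be illustrated with a small sketch. It is not the EEMS implementation: the rectangular deme layout (the caption describes a triangular grid), the sample coordinates, the Voronoi seeds and their rates are all hypothetical, and assigning an edge to the cell that contains its midpoint is an assumption made here, since the caption does not say how "falls into" is decided.

```python
import math

def nearest(point, sites):
    """Index of the site closest to `point` (Euclidean distance)."""
    return min(range(len(sites)), key=lambda k: math.dist(point, sites[k]))

# Hypothetical demes on a 4 x 3 rectangular layout (a stand-in for the
# triangular grid in the caption) and hypothetical sample coordinates.
demes = [(x, y) for y in range(3) for x in range(4)]
samples = [(0.2, 0.1), (2.9, 1.8), (1.4, 1.1)]

# Step (c): each sample is assigned to the closest deme.
sample_deme = [demes[nearest(s, demes)] for s in samples]

# Step (d): a Voronoi tessellation is defined by seed points; every location
# belongs to the cell of its nearest seed. Rates are stored as log10 values
# relative to the habitat-wide average (hypothetical numbers).
seeds = [(0.5, 1.0), (2.5, 1.0)]
log_rates = [-1.0, 0.5]

def edge_log_rate(u, v):
    """Step (e): an edge takes the rate of the cell it falls into.
    Assumption: the edge is located by its midpoint."""
    mid = ((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)
    return log_rates[nearest(mid, seeds)]

print(sample_deme)
print(edge_log_rate((0, 0), (1, 0)), edge_log_rate((2, 1), (3, 1)))
```

On the log10 scale used in the figures, a value of -1.0 reads as migration 10-fold slower than the average, and 1.0 as 10-fold faster.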

Mentions: Figure 1 provides a schematic overview of our approach. EEMS is based on the stepping stone model [28], in which individuals migrate locally between subpopulations (demes) and migration rates can vary by location. To capture continuous population structure, we cover the habitat with a dense regular grid; each deme exchanges migrants only with its neighbors. Under the stepping stone model, expected genetic dissimilarities depend on the sample locations and the migration rates. The expected genetic dissimilarity between two individuals can be computed by integrating over all possible migration histories in their genetic ancestry, and we approximate it using resistance distance, a distance metric from circuit theory that integrates all possible migration routes between two demes [26]. The estimation procedure adjusts the migration rates of all edges in the graph so that the genetic differences expected under the model closely match the genetic differences observed in the data; it also encourages nearby edges to have similar migration rates. The estimates are then interpolated across the habitat to produce an “estimated effective migration surface” – hence EEMS – which provides a visual summary of the observed genetic dissimilarities and how they relate to geographic location. For example, if genetic similarities tend to decay faster in some regions, those areas will have lower effective migration. If, on the other hand, the relationship between genetic similarity and geographic distance is the same throughout the habitat, the estimated surface will be relatively constant. We use the term “effective” because the model makes assumptions (most importantly, equilibrium in time) that may preclude interpreting effective migration as representing historical rates of gene flow. Nonetheless, we illustrate that the method provides an intuitive and informative way to visualize patterns of population structure in geo-referenced samples.
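
The role of resistance distance can be made concrete with a minimal sketch. It treats the deme grid as an electrical network and computes the effective resistance between two demes, first with uniform migration and then with a band of low-migration edges. The assumptions are ours, not the paper's: edge conductance is set equal to the migration rate, the grid is rectangular, the 10-fold barrier is hypothetical, and the exact mapping from resistance distance to expected genetic dissimilarity used by EEMS is not reproduced.

```python
from itertools import product

def grid_edges(rows, cols):
    """Edges of a rectangular 4-neighbor grid; nodes are (row, col) tuples."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))
    return edges

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def resistance_distance(nodes, weighted_edges, src, dst):
    """Effective resistance between src and dst.
    Each edge weight is a conductance (assumed equal to the migration rate)."""
    # Ground dst: drop it from the graph Laplacian, inject unit current at
    # src, and read off the voltage at src.
    idx = {v: k for k, v in enumerate(n for n in nodes if n != dst)}
    size = len(idx)
    lap = [[0.0] * size for _ in range(size)]
    for (u, v), w in weighted_edges:
        for a, b in ((u, v), (v, u)):
            if a in idx:
                lap[idx[a]][idx[a]] += w
                if b in idx:
                    lap[idx[a]][idx[b]] -= w
    rhs = [0.0] * size
    rhs[idx[src]] = 1.0
    return solve(lap, rhs)[idx[src]]

rows, cols = 4, 6
nodes = [(r, c) for r in range(rows) for c in range(cols)]
edges = grid_edges(rows, cols)
uniform = [(e, 1.0) for e in edges]
# Hypothetical barrier: edges crossing between columns 2 and 3 get a
# 10-fold lower migration rate.
barrier = [(e, 0.1 if {e[0][1], e[1][1]} == {2, 3} else 1.0) for e in edges]

r_uniform = resistance_distance(nodes, uniform, (1, 0), (1, 5))
r_barrier = resistance_distance(nodes, barrier, (1, 0), (1, 5))
print(f"uniform: {r_uniform:.3f}  with barrier: {r_barrier:.3f}")
```

Demes on opposite sides of the low-migration band end up farther apart in resistance distance than under uniform migration, which is the pattern the estimation procedure would try to explain with a region of low effective migration.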

