Sampling strategies for frequency spectrum-based population genomic inference.

Robinson JD, Coffman AJ, Hickerson MJ, Gutenkunst RN - BMC Evol. Biol. (2014)

Bottom Line: We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.


Affiliation: Department of Biology, City College of New York, New York, NY, 10031, USA. RobinsonJ@dnr.sc.gov.

ABSTRACT

Background: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module δaδi, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models.
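The definition above can be illustrated with a minimal sketch. It assumes each SNP has already been reduced to a count of derived alleles per population. The function names are hypothetical and are not part of the δaδi API.

```python
def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Count SNPs at each derived-allele count (unfolded, one population).

    derived_counts: one integer per SNP, the number of sampled chromosomes
    carrying the derived variant. Entry k of the result is the number of
    SNPs with exactly k derived copies.
    """
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"derived count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


def joint_allele_frequency_spectrum(count_pairs, n1, n2):
    """Joint AFS for two populations as an (n1 + 1) x (n2 + 1) nested list.

    count_pairs: one (k1, k2) tuple per SNP, the derived-allele counts in
    population 1 and population 2.
    """
    jafs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for k1, k2 in count_pairs:
        if not (0 <= k1 <= n1 and 0 <= k2 <= n2):
            raise ValueError(f"counts ({k1}, {k2}) outside the sample sizes")
        jafs[k1][k2] += 1
    return jafs


# Illustrative (made-up) data: 7 SNPs in a sample of 6 chromosomes.
example_counts = [1, 1, 2, 5, 1, 3, 2]
example_afs = allele_frequency_spectrum(example_counts, 6)
# example_afs is [0, 3, 2, 1, 0, 1, 0]

# Illustrative (made-up) data: 4 SNPs, 4 chromosomes per population.
example_pairs = [(1, 1), (2, 2), (0, 3), (1, 1)]
example_jafs = joint_allele_frequency_spectrum(example_pairs, 4, 4)
```

Because each diploid individual contributes two chromosomes, a sample of 10 diploid individuals per population gives a joint spectrum with 21 entries along each axis.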

Results: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 - 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models.
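The two measures named above can be sketched as follows. The abstract does not say which information-theoretic tools were used, so the sketch assumes AIC and Akaike weights as one common choice. The function names and the numbers in the example are illustrative, not taken from the study.

```python
import math


def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around a known
    simulated value. It reflects both bias (accuracy) and spread (precision).
    """
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def akaike_weights(log_likelihoods, n_params):
    """Akaike weights for a set of candidate models (assumed approach).

    AIC = 2k - 2 ln L. Each weight is the relative support for a model
    within the candidate set; the weights sum to 1.
    """
    aics = [2 * k - 2 * ll for ll, k in zip(log_likelihoods, n_params)]
    best = min(aics)
    relative = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical replicate estimates of a parameter whose true value is 2.0.
example_rmse = rmse([1.0, 3.0], 2.0)

# Hypothetical maximized log-likelihoods for a 3-parameter model and a
# 5-parameter model fitted to the same spectrum.
example_weights = akaike_weights([-1234.5, -1230.1], [3, 5])
```

A weight close to 1 for a single model indicates little uncertainty in model selection. Weights spread across models indicate that the data do not discriminate well among them.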

Conclusions: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.


Figure 1: Information in the allele frequency spectrum. A comparison between two spectra of similar size (n = 10 diploid individuals sampled from each of two populations) that differ in the rate of migration between populations. Migration between populations increases the correlation in allele frequencies, thus increasing the density of SNPs falling along the diagonal of the AFS.

Mentions: For datasets composed of biallelic, unlinked SNPs, the AFS is a complete summary of the data [4], and many commonly used statistics, such as the number of segregating sites, FST, and Tajima’s D [8], can be calculated directly from the frequency spectrum. Additionally, patterns in the AFS can be indicative of demographic and/or selective events in the evolutionary history of the population or populations under consideration. For instance, gene flow between populations increases the correlation in allele frequencies, increasing the proportion of variable sites that fall along the diagonal of the AFS (Figure 1). The AFS is therefore well suited for the analysis of population genomic data, which are increasingly feasible to collect due to the rapid pace of development in sequencing technologies. Estimates of historical demography from the AFS can also be used to provide a baseline against which tests for the signatures of selection can be carried out [9-11]. However, the utility of parameter estimates obtained from analysis of the AFS will depend on their accuracy and precision, as well as the power of the analytical framework for model selection.
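As a rough illustration of how such statistics follow from the spectrum, the sketch below computes the number of segregating sites, nucleotide diversity, and Tajima's D from a one-population unfolded AFS, using the standard formulas for Tajima's D. The function name and the example spectrum are assumptions made for illustration.

```python
import math


def summary_stats_from_afs(afs):
    """Segregating sites S, nucleotide diversity pi, and Tajima's D.

    afs: unfolded one-population spectrum with afs[k] = number of SNPs
    carrying k derived copies among n = len(afs) - 1 chromosomes. The
    monomorphic entries afs[0] and afs[n] are ignored.
    """
    n = len(afs) - 1
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    seg_sites = sum(afs[1:n])
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # Mean pairwise differences: a SNP with k derived copies differs in
    # k * (n - k) of the n * (n - 1) / 2 possible chromosome pairs.
    n_pairs = n * (n - 1) / 2
    pi = sum(afs[k] * k * (n - k) for k in range(1, n)) / n_pairs

    # Constants from the standard definition of Tajima's D.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajimas_d = (pi - theta_w) / math.sqrt(variance)
    return seg_sites, pi, tajimas_d


# Made-up spectrum for n = 4 chromosomes with entries proportional to 1/k,
# the shape expected under the standard neutral model.
example_s, example_pi, example_d = summary_stats_from_afs([0, 6, 3, 2, 0])
```

For this spectrum the two diversity estimates agree, so Tajima's D is zero up to floating-point rounding. An excess of rare variants relative to the 1/k shape would push D below zero.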