Sampling strategies for frequency spectrum-based population genomic inference.

Robinson JD, Coffman AJ, Hickerson MJ, Gutenkunst RN - BMC Evol. Biol. (2014)

Bottom Line: We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models. Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.


Affiliation: Department of Biology, City College of New York, New York, NY, 10031, USA. RobinsonJ@dnr.sc.gov.

ABSTRACT

Background: The allele frequency spectrum (AFS) consists of counts of the number of single nucleotide polymorphism (SNP) loci with derived variants present at each given frequency in a sample. Multiple approaches have recently been developed for parameter estimation and calculation of model likelihoods based on the joint AFS from two or more populations. We conducted a simulation study of one of these approaches, implemented in the Python module δaδi, to compare parameter estimation and model selection accuracy given different sample sizes under one- and two-population models.
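The definition of the AFS above can be made concrete with a minimal sketch. It assumes each SNP has already been reduced to a count of derived alleles among the sampled chromosomes. The function names and toy counts are hypothetical, chosen for illustration; this is not the δaδi API.

```python
def allele_frequency_spectrum(derived_counts, n_chromosomes):
    """Return the unfolded AFS as a list of length n_chromosomes + 1.

    Entry k is the number of SNPs whose derived allele was observed on
    exactly k of the sampled chromosomes.
    """
    afs = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        if not 0 <= k <= n_chromosomes:
            raise ValueError(f"derived count {k} outside 0..{n_chromosomes}")
        afs[k] += 1
    return afs


def joint_afs(counts_pop1, counts_pop2, n1, n2):
    """Return the two-population joint AFS as an (n1 + 1) x (n2 + 1) table.

    Entry [i][j] is the number of SNPs with the derived allele on i
    chromosomes from population 1 and j chromosomes from population 2.
    """
    if len(counts_pop1) != len(counts_pop2):
        raise ValueError("both populations must be scored at the same SNPs")
    table = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts_pop1, counts_pop2):
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError(f"derived counts ({i}, {j}) out of range")
        table[i][j] += 1
    return table


# Toy data (made up): six SNPs scored in 5 diploid individuals = 10 chromosomes.
toy_counts = [1, 1, 2, 5, 3, 1]
print(allele_frequency_spectrum(toy_counts, n_chromosomes=10))
```

Because the spectrum has one entry per possible derived-allele count, sampling more individuals lengthens it, which is the sense in which larger samples give a "larger AFS".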

Results: Our simulations included a variety of demographic models and two parameterizations that differed in the timing of events (divergence or size change). Using a number of SNPs reasonably obtained through next-generation sequencing approaches (10,000 to 50,000), accurate parameter estimates and model selection were possible for models with more ancient demographic events, even given relatively small numbers of sampled individuals. However, for recent events, larger numbers of individuals were required to achieve accuracy and precision in parameter estimates similar to that seen for models with older divergence or population size changes. We quantify i) the uncertainty in model selection, using tools from information theory, and ii) the accuracy and precision of parameter estimates, using the root mean squared error, as a function of the timing of demographic events, sample sizes used in the analysis, and complexity of the simulated models.
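The abstract does not say which information-theoretic tools were used. As one common possibility, the sketch below assumes the Akaike information criterion (AIC) and Akaike weights, which express model-selection uncertainty as relative support across candidate models. The model labels, log-likelihoods and parameter counts are invented for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1 across the models."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical fits: (label, maximized log-likelihood, number of parameters).
fits = [("model_a", -1250.4, 3), ("model_b", -1248.9, 5), ("model_c", -1262.0, 2)]
scores = [aic(lnl, k) for _, lnl, k in fits]
for (label, _, _), weight in zip(fits, akaike_weights(scores)):
    print(f"{label}: weight = {weight:.3f}")
```

A weight near 1 for a single model indicates little selection uncertainty; weights spread across several models indicate that the data do not discriminate among them well.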

Conclusions: Here, we illustrate the utility of the genome-wide AFS for estimating demographic history and provide recommendations to guide sampling in population genomics studies that seek to draw inference from the AFS. Our results indicate that larger samples of individuals (and thus larger AFS) provide greater power for model selection and parameter estimation for more recent demographic events.


Figure 7: Accuracy of parameter estimates for two-population models. Plots show RMSE versus sample size for each of the three two-population models. Both ancient (A, filled circles to the left) and recent (B, open circles to the right) parameterizations are shown.

Mentions: As in the single-population simulations, the accuracy and precision of parameter estimates improved with larger sample sizes, for both recent and ancient population divergence models (Figure 7). Two-population models also converged more slowly to their maximum likelihood estimates when fit to data simulated under alternative models. For instance, fitting the IM model to data simulated under the ISO model with ancient divergence required, on average, more than 14 iterations for samples of 10 diploid individuals per population. Parameter estimates were largely unbiased for the parameters and models considered and, in most cases, they converged to their simulated values in the larger sample sizes (Additional file 1: Table S2).
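RMSE combines accuracy (bias) and precision (spread) in a single number, since RMSE squared equals bias squared plus the variance of the estimates. The sketch below illustrates that relationship across sample sizes. The replicate estimates and the simulated value are made up, and the function names are hypothetical; this is not the authors' analysis code.

```python
import math
from statistics import fmean, pvariance


def rmse(estimates, true_value):
    """Root mean squared error of replicate estimates around the true value."""
    return math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))


def bias(estimates, true_value):
    """Mean estimate minus the true value (zero for an unbiased estimator)."""
    return fmean(estimates) - true_value


# Hypothetical replicate estimates of one parameter whose simulated value is 1.0,
# keyed by the number of diploid individuals sampled per population.
true_value = 1.0
replicates = {
    10: [0.62, 1.45, 1.10, 0.71, 1.32],
    20: [0.85, 1.18, 1.04, 0.90, 1.11],
    50: [0.97, 1.05, 1.01, 0.96, 1.02],
}

for n_individuals, estimates in replicates.items():
    r = rmse(estimates, true_value)
    b = bias(estimates, true_value)
    spread = pvariance(estimates)
    # Decomposition check: RMSE^2 = bias^2 + population variance.
    assert math.isclose(r ** 2, b ** 2 + spread)
    print(f"n = {n_individuals}: RMSE = {r:.3f}, bias = {b:+.3f}")
```

When estimates are largely unbiased, as reported above, most of the RMSE comes from the variance term, so a falling RMSE mainly reflects improved precision.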

