Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography.

Mathew LA, Jensen JD - Front Genet (2015)

Bottom Line: Although it has been well described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.


Affiliation: School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

ABSTRACT
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while estimating accurate demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.



Figure 4: Jaatha results of the simulation study under the complete sweep scenario for the Constant (Δ) and the SizeChange (o) model. Data sets are color-coded according to their recombination rate per site: 10⁻⁴ (red), 10⁻³ (blue), and 10⁻² (green). Results are plotted over the parameter ranges evaluated in Jaatha. For data sets simulated without migration, the true value was set manually to the lowest value of the parameter range. The demographic parameters are estimated accurately; α, however, is not. The divergence time of the two populations (which also coincides with the timing of the selected allele) and the migration rates can be accurately inferred. Estimates of θ, q, and α lose precision compared to the incomplete sweep scenario. Incorrectly assuming neutrality (shown as Neutral for the SizeChange model) causes severe biases: divergence times are overestimated (always greater than 0.2), migration rates are in most cases underestimated, and the population size of P2 (the population in which the selected allele arose) is correspondingly mis-inferred. The recombination rate did not affect the accuracy of the estimates.

Mentions: When we conditioned on the selected allele being fixed (i.e., representing a complete sweep), all demographic parameters except the migration rate were estimated less accurately (Figure 4). In particular, the Constant model overestimated θ. Migration rates, by contrast, were estimated with greater accuracy (cf. Figures 3 and 4). Unlike in the incomplete sweep case, incorrectly assuming neutrality produced severely biased estimates, consistent with the results of Crisci et al. (2013): divergence times were always estimated to be larger than 0.2, and migration rates were generally underestimated. Analyzing the complete sweep data sets with an incomplete sweep model gave similarly poor results (Supplementary Figure S4).
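Biases like those described above are typically quantified by comparing estimates against the known simulated values across replicate data sets. The sketch below assumes one true value and one estimate per replicate. It reports the mean error (bias), the root-mean-square error, and, optionally, the fraction of estimates above a threshold such as 0.2. The helper `summarize_estimates` is hypothetical and is not taken from Jaatha or the paper.

```python
import numpy as np


def summarize_estimates(true_values, estimates, threshold=None):
    """Hypothetical accuracy summary for one parameter across replicates.

    bias: mean of (estimate - true value); positive means overestimation.
    rmse: root-mean-square error, combining bias and spread.
    frac_above: share of estimates exceeding `threshold` (if given).
    """
    t = np.asarray(true_values, dtype=float)
    e = np.asarray(estimates, dtype=float)
    if t.shape != e.shape or t.size == 0:
        raise ValueError("need one true value per estimate")
    err = e - t
    summary = {
        "bias": float(err.mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }
    if threshold is not None:
        summary["frac_above"] = float(np.mean(e > threshold))
    return summary
```

With made-up divergence-time values (not data from the study), `summarize_estimates([0.1, 0.1, 0.1], [0.25, 0.3, 0.22], threshold=0.2)` yields a positive bias of about 0.157 and `frac_above` of 1.0. This is the pattern a systematic overestimation would produce.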

