Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography.

Mathew LA, Jensen JD - Front Genet (2015)

Bottom Line: Although it has been well-described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next generation sequencing data.


Affiliation: School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

ABSTRACT
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well-described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while estimating accurate demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next generation sequencing data.



Figure 3: Jaatha results of the simulation study under the incomplete sweep scenario for the Constant (Δ) and the SizeChange (o) models. For the SizeChange model, neutral estimates are also given (x, labeled Neutral). The average frequency f of the selected allele in the data set is indicated by color. To distinguish which estimates came from the same data set, three neutral estimates are color-coded and increased in size; they show the covariance of the present-day population size q with the migration rate m. These three colored ‘X’s represent the neutral estimates for the corresponding colored SizeChange (o) estimates. The results are plotted in the parameter ranges that were evaluated in Jaatha. The true values of the data sets produced with no migration were set manually to the lowest value of the parameter range. The demographic parameters are estimated accurately; α, however, is not. The estimation of m improves with increasing frequency of the selected allele in the data sets. Except for a few cases, if one incorrectly assumes neutrality, the demographic parameters are still correctly recovered, as shown by the fact that they often lie on or near the diagonal.

Mentions: Under the SizeChange model we performed simulations with msms (Ewing and Hermisson, 2010) and visualized the chosen SS. The SS of the neutral and selected cases appear distinguishable, both in frequency distribution and number (Figure 2, Supplementary Figure S3). However, changing the selection strength did not significantly alter the observed patterns owing to the region size; this suggests the ability only to reject neutrality, rather than to estimate precise selective parameters. With increasing growth rates of P2, the population in which the selected allele arose, the SS and the number of polymorphisms produced with differing strengths of selection become increasingly difficult to discriminate. However, increasing the size of the locus improves the ability to distinguish between differing strengths of selection (compare Supplementary Figure S3 and Figure 2), as expected, owing to the ability to characterize the size of the hitchhiked region (see Jensen et al., 2008). Under the incomplete sweep scenario we find that the demographic parameters can be accurately estimated, but not the selection strength α (Figure 3), though low and high α values appear to be distinguishable. The average frequency f of the selected allele is important for the accuracy of the estimation of the migration rate m: the lower f is, the more difficult the estimation of the migration rate becomes. Although we simulated data under selection, we found no obvious impact of incorrectly assuming a neutral model under the incomplete sweep scenario (except for an underestimation of θ).
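
To make the central data summary concrete, the following is a minimal sketch of how a pairwise joint site frequency spectrum could be tabulated from two samples. It is an illustration only, not the Jaatha implementation or the summary statistics used by the authors: the function name `joint_sfs` and the input layout (one list of 0/1 values per sampled haplotype, with 1 marking the derived allele) are assumptions made for this example.

```python
def joint_sfs(pop1, pop2):
    """Tabulate a joint site frequency spectrum for two samples.

    pop1, pop2: lists of haplotypes (hypothetical input layout); each
    haplotype is a list of 0/1 values, one per segregating site, where
    1 marks the derived allele. Both samples must cover the same sites.

    Returns a (n1 + 1) x (n2 + 1) table whose entry [i][j] counts the
    sites at which the derived allele occurs i times in pop1 and
    j times in pop2.
    """
    n1, n2 = len(pop1), len(pop2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples need at least one haplotype")
    n_sites = len(pop1[0])
    if any(len(h) != n_sites for h in list(pop1) + list(pop2)):
        raise ValueError("all haplotypes must have the same number of sites")

    jsfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for s in range(n_sites):
        i = sum(h[s] for h in pop1)  # derived-allele count in sample 1
        j = sum(h[s] for h in pop2)  # derived-allele count in sample 2
        jsfs[i][j] += 1
    return jsfs


# Toy data: 2 haplotypes from population 1, 3 from population 2, 4 sites.
example_pop1 = [[1, 0, 1, 0],
                [1, 0, 0, 0]]
example_pop2 = [[1, 1, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 1]]
example_jsfs = joint_sfs(example_pop1, example_pop2)
```

In this toy input, the first site is fixed for the derived allele in both samples and therefore lands in the corner entry `example_jsfs[2][3]`; sites private to one sample fall in the first row or first column of the table.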

