Evaluating the ability of the pairwise joint site frequency spectrum to co-estimate selection and demography.

Mathew LA, Jensen JD - Front Genet (2015)

Bottom Line: Although it has been well described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.


Affiliation: School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

ABSTRACT
The ability to infer the parameters of positive selection from genomic data has many important implications, from identifying drug-resistance mutations in viruses to increasing crop yield by genetically integrating favorable alleles. Although it has been well described that selection and demography may result in similar patterns of diversity, the ability to jointly estimate these two processes has remained elusive. Here, we use simulation to explore the utility of the joint site frequency spectrum to estimate selection and demography simultaneously, including developing an extension of the previously proposed Jaatha program (Mathew et al., 2013). We evaluate both complete and incomplete selective sweeps under an isolation-with-migration model with and without population size change (both population growth and bottlenecks). Results suggest that while it may not be possible to precisely estimate the strength of selection, it is possible to infer the presence of selection while accurately estimating demographic parameters. We further demonstrate that the common assumption of selective neutrality when estimating demographic models may lead to severe biases. Finally, we apply the approach we have developed to better characterize the within-host demographic and selective history of human cytomegalovirus (HCMV) infection using published next-generation sequencing data.



Figure 2: We visualized the average values of the 23 summary statistics (SS) while varying three parameters under the SizeChange model in a complete sweep scenario: the selection strength α = 2N_e s, the exponential growth rate g of the second population (the population in which the selected allele arises), and the position pos of the selected allele. For each plot we ran 100 replicates of 100 loci, each 1 kb in length, with 10 samples from each population. Each field of the joint site frequency spectrum (JSFS) matrix counts SNPs by their frequency in the two samples; for example, the field [2,3] represents the number of SNPs found in exactly two samples in population 1 and three samples in population 2. The other simulation parameters were fixed to the following values: τ = 0.05, N_e = 1000, θ_site = 0.004, m = 0.2, and recombination rate per site = 1.64 × 10⁻⁴. When comparing, note that the scale (placed above each JSFS plot) is different for each subfigure. The distinction between the neutral and selected cases is clearly visible, owing to the decrease in polymorphisms in the selected case. The higher the growth rate g becomes, however, the closer in number and the more similar the SS values become. The SS show only minute differences between differing locations of the selected allele.
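The meaning of a JSFS field such as [2,3] can be made concrete with a short sketch. The function name `joint_sfs` and the toy haplotype matrices below are hypothetical illustrations, not part of Jaatha or msms; the sketch assumes biallelic sites coded 0 (ancestral) and 1 (derived), with one row per sampled sequence and one column per site.

```python
import numpy as np

def joint_sfs(pop1, pop2):
    """Count sites by derived-allele count in each of two population samples.

    pop1, pop2: 0/1 arrays of shape (n_samples, n_sites) covering the same sites.
    Returns an (n1 + 1) x (n2 + 1) integer matrix whose field [i, j] is the
    number of sites where the derived allele occurs in exactly i samples of
    population 1 and exactly j samples of population 2.
    """
    pop1 = np.asarray(pop1)
    pop2 = np.asarray(pop2)
    if pop1.shape[1] != pop2.shape[1]:
        raise ValueError("both samples must cover the same sites")
    n1, n2 = pop1.shape[0], pop2.shape[0]
    counts1 = pop1.sum(axis=0)  # derived-allele count per site, population 1
    counts2 = pop2.sum(axis=0)  # derived-allele count per site, population 2
    jsfs = np.zeros((n1 + 1, n2 + 1), dtype=int)
    # np.add.at accumulates correctly when several sites share the same field.
    np.add.at(jsfs, (counts1, counts2), 1)
    return jsfs

# Toy data (hypothetical): 3 samples per population, 4 sites.
toy_pop1 = [[1, 0, 1, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0]]
toy_pop2 = [[1, 0, 0, 1],
            [1, 0, 0, 1],
            [1, 1, 0, 0]]
toy_jsfs = joint_sfs(toy_pop1, toy_pop2)
print(toy_jsfs)
```

In the toy data the first site is derived in two samples of population 1 and three samples of population 2, so it contributes to field [2,3]. With 10 samples per population, as in the figure, the matrix would be 11 × 11.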

Mentions: Under the SizeChange model we performed simulations with msms (Ewing and Hermisson, 2010) and visualized the chosen SS. The SS of the neutral and selected cases appear distinguishable, both in frequency distribution and in number (Figure 2, Supplementary Figure S3). However, changing the selection strength did not significantly alter the observed patterns, owing to the region size; this suggests that it is possible only to reject neutrality, rather than to estimate precise selective parameters. With increasing growth rates of P2, the population in which the selected allele arose, the SS and the number of polymorphisms produced under differing strengths of selection become increasingly difficult to discriminate. However, increasing the size of the locus improves the ability to distinguish between differing strengths of selection (cf. Supplementary Figure S3 and Figure 2), as expected, owing to the ability to characterize the size of the hitchhiked region (see Jensen et al., 2008). Under the incomplete sweep scenario we find that the demographic parameters can be accurately estimated, but not the selection strength α (Figure 3), though low and high α values appear to be distinguishable. The average frequency f of the selected allele is important for the accuracy of the estimate of the migration rate m: the lower f is, the more difficult the estimation of the migration rate becomes. Although we simulated data under selection, we found no obvious impact of incorrectly assuming a neutral model under the incomplete sweep scenario (except for an underestimation of θ).
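One way to see why fewer polymorphisms would translate into a lower θ under an assumed neutral model is Watterson's estimator, which is proportional to the number of segregating sites. This is a standard textbook estimator used here only as an illustration; it is not the Jaatha procedure applied in the paper, and the function name and the SNP counts below are hypothetical.

```python
def watterson_theta_per_site(num_segregating, num_samples, locus_length):
    """Watterson's estimator of theta per site for one population sample.

    theta_W = S / (a_n * L), where a_n = sum_{i=1}^{n-1} 1/i,
    S is the number of segregating sites, n the sample size, and
    L the locus length in base pairs.
    """
    if num_samples < 2:
        raise ValueError("need at least two samples")
    a_n = sum(1.0 / i for i in range(1, num_samples))
    return num_segregating / (a_n * locus_length)

# Hypothetical SNP counts for a 1 kb locus with 10 samples:
# a sweep that removes half of the segregating sites halves the estimate.
theta_neutral = watterson_theta_per_site(12, 10, 1000)
theta_swept = watterson_theta_per_site(6, 10, 1000)
print(theta_neutral, theta_swept)
```

Because the estimator is linear in the number of segregating sites, any process that removes polymorphism near the selected site lowers the neutral estimate of θ by the same proportion.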

