An approximate Bayesian estimator suggests strong, recurrent selective sweeps in Drosophila.

Jensen JD, Thornton KR, Andolfatto P - PLoS Genet. (2008)

Bottom Line: Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa.


Affiliation: Section of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, California, United States of America. jjensen@ucsd.edu

ABSTRACT
The recurrent fixation of newly arising, beneficial mutations in a species reduces levels of linked neutral variability. Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. We propose an approximate Bayesian (ABC) polymorphism-based estimator that can be used to distinguish between these models, and apply it to multi-locus data from Drosophila melanogaster. We investigate the extent to which inference about the strength of selection is sensitive to assumptions about the underlying distributions of the rates of substitution and recombination, the strength of selection, heterogeneity in mutation rate, as well as the population's demographic history. We show that assuming fixed values of selection parameters in estimation leads to overestimates of the strength of selection and underestimates of the rate. We estimate parameters for an African population of D. melanogaster (ŝ approximately 2E-03, ) and compare these to previous estimates. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa.
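The general idea behind an approximate Bayesian estimator can be sketched with a plain rejection sampler: draw a parameter from the prior, simulate a summary statistic, and keep the draw only if the simulated summary falls close to the observed one. The sketch below is a toy under stated assumptions. The simulator, the uniform prior, the tolerance, and all function names are hypothetical stand-ins, and the excerpt does not describe the authors' actual algorithm, priors, or summary statistics.

```python
import random
import statistics


def simulate_summary(param, n_loci, rng):
    """Toy stand-in simulator (hypothetical): average of per-locus values
    drawn around `param`. A real analysis would instead simulate
    polymorphism data under a recurrent sweep model."""
    draws = [rng.gauss(param, 1.0) for _ in range(n_loci)]
    return statistics.mean(draws)


def abc_rejection(observed, prior_low, prior_high, n_draws,
                  tolerance, n_loci, seed=0):
    """Plain ABC rejection: keep prior draws whose simulated summary
    lies within `tolerance` of the observed summary."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = rng.uniform(prior_low, prior_high)  # draw from the prior
        simulated = simulate_summary(candidate, n_loci, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(candidate)
    return accepted


# Accepted draws approximate the posterior of the toy parameter.
posterior = abc_rejection(observed=2.0, prior_low=0.0, prior_high=5.0,
                          n_draws=20000, tolerance=0.1, n_loci=50)
posterior_mean = statistics.mean(posterior)
```

Tightening `tolerance` improves the approximation at the cost of fewer accepted draws, which is the usual trade-off in rejection-based ABC.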

Figure 3: Distributions of Fay and Wu's H-statistic [5] and Tajima's D-statistic [45] under common weak and rare strong selection models. (A) The distribution of Fay and Wu's H for 500 bp regions. (B) The distribution of Fay and Wu's H for 100 kb regions. (C) The distribution of Tajima's D for 500 bp regions. (D) The distribution of Tajima's D for 100 kb regions. 1000 replicates were generated under each model and the following parameters were fixed: ρ = 0.1/site, θ = 0.01/site (thus, ρ/θ = 10), and n = 25. The selection coefficient, s, and rate, 2Nλ, differ among models, though their product is the same (2Nλs = 5.0E−07). As shown in [9], the mean H is positive under a recurrent sweep model. However, while we confirm that the means are positive and nearly identical for 2Nλs = constant, we find that previous attempts to differentiate these models have likely been hampered by the scale of the regions considered. Specifically, while the distributions for both statistics appear similar for 500 bp regions, they are quite distinct at larger physical scales (i.e., 100 kb).
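For readers who want to see how the two statistics in this caption respond to the site frequency spectrum (SFS), the sketch below computes both from an unfolded SFS. It uses the standard Tajima (1989) formula for D and the unnormalized form H = π − θ_H from Fay and Wu (2000). Whether the authors used this exact form of H is an assumption, and the function name and input layout are illustrative only.

```python
import math


def sfs_statistics(sfs):
    """Return (Tajima's D, Fay and Wu's H) for an unfolded SFS.

    sfs[i-1] is the number of segregating sites at which the derived
    allele appears in i of the n sampled chromosomes, for i = 1..n-1.
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    pairs = n * (n - 1)
    # Pairwise diversity (pi) and Fay and Wu's theta_H from the SFS.
    pi = sum(c * 2 * i * (n - i) for i, c in enumerate(sfs, start=1)) / pairs
    theta_h = sum(c * 2 * i * i for i, c in enumerate(sfs, start=1)) / pairs
    # Constants for the variance term in Tajima's D.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajima_d = (pi - seg_sites / a1) / math.sqrt(
        e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    fay_wu_h = pi - theta_h
    return tajima_d, fay_wu_h


n = 25  # sample size used in the caption
# Expected neutral SFS (theta / i per class): both statistics should be ~0.
neutral_d, neutral_h = sfs_statistics([10.0 / i for i in range(1, n)])
# Excess of rare alleles (singletons only): D is negative.
rare_d, rare_h = sfs_statistics([10] + [0] * (n - 2))
# Excess of high-frequency derived alleles: H is negative.
high_d, high_h = sfs_statistics([0] * (n - 2) + [10])
```

The three example spectra illustrate the interpretation used in the text: an SFS skewed towards rare alleles drives D negative, while one skewed towards high-frequency derived alleles drives H negative.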

Mentions: It is noteworthy that for large surveyed regions, more strongly negative values of Fay and Wu's H-statistic (i.e., SFS skewed towards high-frequency derived alleles) and Tajima's D-statistic (i.e., SFS skewed towards rare alleles) are observed under strong selection models (Figure 3), suggesting that differences in the polymorphism site frequency spectrum may also be used to distinguish between models if large enough regions are surveyed. Though this differs qualitatively from the conclusions of Przeworski (2002), simulations demonstrate that this is attributable to a modeling difference (results not shown), as we here allow sweeps within the sampled region (following [24]). This discrepancy between modeling approaches will thus only become greater as region sizes increase.

