Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Identifying them may therefore prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
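The abstract does not spell out how a HAF score is computed. As a minimal sketch, assuming the definition used in the full paper (a haplotype's ℓ-HAF score is the sum, over the sites where it carries the derived allele, of that site's derived-allele count in the sample raised to the power ℓ; the 1-HAF score uses ℓ = 1), the scores for a 0/1 haplotype matrix can be computed as below. The function name `haf_scores` and the toy sample are illustrative assumptions, not code from the authors.

```python
from typing import List


def haf_scores(haplotypes: List[List[int]], ell: int = 1) -> List[int]:
    """Compute an ell-HAF score for every haplotype in a 0/1 sample matrix.

    Rows are haplotypes, columns are segregating sites, and 1 marks the derived allele.
    Assumed definition: ell-HAF(h) = sum over sites i with h[i] == 1 of c_i ** ell,
    where c_i is the number of haplotypes in the sample carrying the derived allele at i.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Derived-allele count c_i at each site (column sums of the matrix).
    counts = [sum(h[i] for h in haplotypes) for i in range(n_sites)]
    # Each haplotype accumulates c_i ** ell over the sites where it carries the derived allele.
    return [sum(c ** ell for c, allele in zip(counts, h) if allele) for h in haplotypes]


# Toy sample of 4 haplotypes at 4 sites (hypothetical data).
sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
print(haf_scores(sample))  # → [5, 5, 4, 1]
```

In this toy sample the first two haplotypes, which share the most common derived alleles, receive the highest 1-HAF scores; passing `ell=2` weights high-frequency sites more heavily.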


Fig 6 (pgen.1005527.g006). Balanced accuracy of PreCIOSS on a model of European demography. Populations were simulated for a popular model of human demography (S11 Fig and Gravel et al. (2011) [50]). The onset times of selection were separated into (A) pre-bottleneck (51 kya–23 kya) and (B) post-bottleneck (23 kya–present) epochs, with 10,000 start times in each bin. All samples were simulated with n = 200, θ = 48, and ρ = 25, using a selection coefficient of s = 0.005 in the pre-bottleneck epoch and s = 0.02 in the post-bottleneck epoch.

Finally, we tested PreCIOSS on a popular model of European demography [50]. The model (S11 Fig) posits an Out-of-Africa migration 51 kya (51 thousand years ago), followed by a split between the European and East Asian populations 23 kya. It also includes bottlenecks that reduced the effective population sizes of the European (N_Eu0 = 1032) and East Asian (N_As0 = 550) populations, each followed by exponential growth. We simulated populations under this model, together with hard-sweep selection events at different times after the Out-of-Africa migration, and partitioned the samples into two categories according to whether the selection event occurred before or after the bottleneck. These scenarios are challenging for most tests of adaptation (see, e.g., [23]). Nevertheless, the 1-HAF scores of carriers and non-carriers still differ significantly. The balanced accuracy of PreCIOSS is shown in Fig 6A for ancient (pre-bottleneck) selection and in Fig 6B for recent (post-bottleneck) selection. Performance is quite robust, although somewhat worse in the early stages of the sweep: once the favored allele reaches a frequency of 60%, the median balanced accuracy is 0.9. Accuracy is higher for recent adaptation than for ancient adaptation; even for very recent sweeps, where the carrier frequency is only 30–40%, the median balanced accuracy is close to 0.8. We used a lower selection coefficient for ancient selection than for recent selection to ensure a sufficient number of incomplete sweeps. Not surprisingly, PreCIOSS performs worse for ancient selection than for recent selection.
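Balanced accuracy, the metric reported in Fig 6, is the mean of sensitivity (the fraction of true carriers called carriers) and specificity (the fraction of true non-carriers called non-carriers), so it is not inflated when carriers are a minority of the sample, as at the 30–40% carrier frequencies discussed above. The sketch below computes it for carrier/non-carrier predictions. The `threshold_predict` rule, which calls a haplotype a carrier when its 1-HAF score reaches a fixed cutoff, is a hypothetical stand-in for illustration only and is not the PreCIOSS procedure; the labels and scores are toy values, not results from the paper.

```python
from typing import List, Sequence


def balanced_accuracy(truth: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Mean of sensitivity (carriers recovered) and specificity (non-carriers recovered)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # carriers correctly called
    fn = sum(1 for t, p in pairs if t and not p)      # carriers missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-carriers correctly called
    fp = sum(1 for t, p in pairs if not t and p)      # non-carriers called carriers
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def threshold_predict(scores: Sequence[float], threshold: float) -> List[bool]:
    """Hypothetical naive rule: call a haplotype a carrier if its 1-HAF score >= threshold."""
    return [s >= threshold for s in scores]


truth = [True, True, True, False, False]  # toy carrier labels
scores = [10, 9, 3, 4, 2]                 # toy 1-HAF scores
predicted = threshold_predict(scores, threshold=5)
# Sensitivity 2/3 and specificity 1, so balanced accuracy is about 0.833.
print(balanced_accuracy(truth, predicted))
```

Plain accuracy on the same toy data would be 0.8, which hides that one of three carriers was missed; balanced accuracy weights both classes equally regardless of carrier frequency.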

