Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Bottom Line: The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory--for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Fig 4 (pgen.1005527.g004): HAF score dynamics in ongoing selective sweeps. HAF scores were computed from 250 simulated population samples (n = 200) undergoing a hard sweep (θ = 48, ρ = 25, s = 0.01), using the simulation software msms [47]. (A) Each violin shows the Gaussian kernel density estimation (KDE) of 1-HAF scores in carriers (blue) and non-carriers (red) of the favored allele, as the sweep progresses in frequency. A standard box plot is overlaid on each violin to mark the 25th, 50th, and 75th percentiles, with means indicated by asterisks. The horizontal dashed line represents the expected 1-HAF scores under neutrality (Eq (4)). (B) Corresponding violins showing the in-sample percentile rank of 1-HAF scores. (C) −log2(P) values for Wilcoxon rank sum tests rejecting the hypothesis of identically distributed 1-HAF scores among carriers and non-carriers within each population sample. The number above each bin indicates the fraction of significant tests (where P < 0.05, shown by the dashed line).

Mentions: In Fig 4A, we show the distributions of haplotype 1-HAF scores aggregated from 500 simulated populations undergoing a hard selective sweep (see ‘Simulations’ in Methods for detailed parameter choices). Scores were computed for random samples of n = 200 haplotypes taken at regular time intervals. They are stratified by the frequency of the favored allele at the time of sampling. Further, scores are stratified into carrier and non-carrier classes (of the favored allele). As with a single population, HAF scores of carriers and non-carriers diverge as the sweep progresses in frequency. We note, however, that even close to fixation (frequencies 80–100%) the distributions of HAF scores between carriers and non-carriers maintain considerable overlap. The high variance in HAF scores makes them only weakly informative of sweep carrier status when comparing across population samples (or genomic regions within a single population). Within a single population sample, however, the HAF scores are highly informative of the carrier status. This is illustrated in Fig 4B, showing the distributions of HAF score percentile rank within their respective samples. We observe that the rank distributions have minimal overlap for carriers and non-carriers of the favored allele. Any remaining overlap in the percentile rank distributions in the final stages of a sweep (favored allele frequency ≥ 70%) stems mostly from recombination, which allows the favored allele to recombine onto haplotypes outside the selected clade (creating low HAF score carriers) and vice-versa (creating high HAF score non-carriers). The overall strong separation between carriers and non-carriers is further illustrated by the highly significant P-values of Wilcoxon rank sum tests rejecting the hypothesis of identically distributed HAF scores among carriers and non-carriers within each population sample (Fig 4C).
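The within-sample comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. It assumes that the 1-HAF score of a haplotype is the sum, over the derived alleles it carries, of each allele's count in the sample (the definition is not restated in this excerpt). It also assumes haplotypes are coded as 0/1 lists, and it swaps in a normal approximation without tie correction for the Wilcoxon rank sum test. The function names, the toy sample, and the choice of site index 1 as the favored allele are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def haf_scores(haplotypes: Sequence[Sequence[int]], power: int = 1) -> List[int]:
    """Assumed l-HAF definition: for each haplotype, sum (derived-allele count
    in the sample) ** power over the sites where it carries the derived allele."""
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    return [sum(counts[j] ** power for j in range(n_sites) if h[j])
            for h in haplotypes]


def percentile_ranks(scores: Sequence[float]) -> List[float]:
    """In-sample percentile rank (0 to 100); tied scores share a mid-rank."""
    n = len(scores)
    out = []
    for s in scores:
        below = sum(1 for t in scores if t < s)
        ties = sum(1 for t in scores if t == s)
        out.append(100.0 * (below + 0.5 * ties) / n)
    return out


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank sum P-value via the normal approximation
    (no tie or continuity correction; a simplification for illustration)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = sorted(list(x) + list(y))
    mid_rank: Dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        mid_rank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    w = sum(mid_rank[v] for v in x)               # rank sum of the first group
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy sample: 4 haplotypes x 4 sites, 1 = derived allele (hypothetical data).
sample = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
is_carrier = [bool(h[1]) for h in sample]  # favored allele assumed at site index 1

scores = haf_scores(sample)        # [6, 5, 3, 1]
ranks = percentile_ranks(scores)   # [87.5, 62.5, 37.5, 12.5]
p = rank_sum_p([s for s, c in zip(scores, is_carrier) if c],
               [s for s, c in zip(scores, is_carrier) if not c])
print(scores, ranks, -math.log2(p))
```

Ranking scores within each sample removes the between-sample variation in absolute HAF values, which is the reason the text gives for Fig 4B separating carriers from non-carriers more cleanly than Fig 4A. With realistic sample sizes such as n = 200, a tie-corrected or exact test from a statistics library would be the safer choice.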

