Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Bottom Line: The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory--for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
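The abstract does not spell out how a HAF score is computed. The following is a minimal sketch, assuming the 1-HAF score of a haplotype is the sum, over the derived alleles it carries, of each allele's count in the sample. The function name `haf_scores`, the 0/1 matrix encoding, and the `power` parameter are illustrative assumptions and are not taken from the text above.

```python
def haf_scores(haplotypes, power=1):
    """Return one HAF-style score per haplotype.

    haplotypes: list of equal-length lists of 0/1, where 1 marks a derived
    allele at that segregating site (an assumed encoding).
    power: exponent applied to each allele count (1 gives the 1-HAF score
    under the assumed definition).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Number of haplotypes in the sample carrying the derived allele at each site.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    # A haplotype's score adds up the counts of the derived alleles it carries.
    return [
        sum(counts[j] ** power for j in range(n_sites) if h[j])
        for h in haplotypes
    ]


# Toy sample: 4 haplotypes, 4 segregating sites (hypothetical data).
sample = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
print(haf_scores(sample))  # [5, 6, 3, 1]
```

In this toy sample, the haplotypes that share common derived alleles receive the higher scores, while the haplotype carrying only a singleton scores 1. This is the kind of contrast between carriers and non-carriers that the statistic is described as capturing.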


Fig 5: Predicting carriers of hard and soft sweeps. Balanced accuracy (Eq (16)) of PreCIOSS in populations undergoing hard and soft sweeps. For each frequency bin, (A) 200 samples were simulated (n = 200, θ = 48, ρ = 25) undergoing a hard sweep (s = 0.01, ν0 = 1/20000), and (B) 200 samples were simulated undergoing a soft sweep (s = 0.01, ν0 = 0.02). We split each sweep into intervals as ν progresses ([0.0, 0.1] through [0.9, 1.0]). For each ν interval, we show the distribution of balanced accuracy using standard violin plots (blue). For comparison, we also plotted the balanced accuracy of iHS adapted to predicting carrier haplotypes (red).

Mentions: While there are no tools currently available that directly predict the carrier state of a haplotype, some approaches are relevant. For example, Grossman et al. [49] developed a ‘composite of multiple signals’ (CMS) statistic to reduce the number of candidates for the favored mutation, but CMS cannot directly be used to identify carriers of the favored mutation. Similarly, the iHS statistic uses the dominant haplotype frequency decay in a window centered around each locus, as a test for recent positive selection [30]. As a comparison, we used iHS to distinguish carriers from non-carriers based on segregating alleles at the locus with peak iHS score. The balanced accuracy of PreCIOSS on hard sweeps is shown in Fig 5A for a specific choice of parameters (200 samples with n = 200, θ = 48, ρ = 25, s = 0.01). Once the sweep reaches frequencies above 30%, the balanced accuracy increases (median ∼70%) and remains high (median ∼90%) for the remainder of the sweep. At the beginning of the sweep, the balanced accuracy, despite being asymptotically unbiased, suffers from high variance due to the severe class imbalance (few carriers in the beginning, few non-carriers at the end). The accuracy is reduced for soft sweeps (Fig 5B, run with similar parameters), as increasing the carrier haplotype frequency leads to higher variance in 1-HAF scores.
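To illustrate why class imbalance matters here, the sketch below computes balanced accuracy, assuming Eq (16) is the standard definition: the mean of the true-positive rate (carriers called carriers) and the true-negative rate (non-carriers called non-carriers). The function name and the toy labels are hypothetical and are not the paper's data.

```python
def balanced_accuracy(is_carrier, predicted_carrier):
    """Mean of sensitivity and specificity for binary carrier calls.

    Assumes the standard definition of balanced accuracy; both arguments
    are equal-length sequences of booleans.
    """
    tp = fn = tn = fp = 0
    for truth, call in zip(is_carrier, predicted_carrier):
        if truth and call:
            tp += 1
        elif truth and not call:
            fn += 1
        elif not truth and not call:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Early-sweep toy case: 10 carriers among n = 200 haplotypes.
truth = [True] * 10 + [False] * 190

# Calling every haplotype a non-carrier is 95% accurate in the plain sense,
# yet its balanced accuracy is only 0.5.
all_negative = [False] * 200
print(balanced_accuracy(truth, all_negative))  # 0.5

# With only 10 carriers, each missed carrier moves sensitivity by 0.1.
one_missed = [True] * 9 + [False] * 191
print(f"{balanced_accuracy(truth, one_missed):.2f}")
```

With so few haplotypes in the minority class, a single misclassified carrier shifts the score by 0.05, which is consistent with the high variance reported at the extremes of the sweep.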

