Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Bottom Line: The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory--for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
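The abstract names the HAF score but does not give its formula. As a minimal sketch, assuming the usual definition (the 1-HAF score of a haplotype is the sum, over the sites where that haplotype carries a derived allele, of the number of sampled haplotypes carrying the derived allele at that site, with the ℓ-HAF score summing the ℓ-th powers of those counts), the scores for a binary haplotype matrix might be computed as follows. The function name `haf_scores` and the toy matrix `sample` are illustrative and do not come from the article.

```python
import numpy as np

def haf_scores(haplotypes, ell=1):
    """Return the ell-HAF score of each haplotype (row) of a 0/1 matrix.

    Assumed definition: rows are haplotypes, columns are segregating sites,
    and 1 marks a derived allele.
    """
    H = np.asarray(haplotypes, dtype=np.int64)
    # Number of sampled haplotypes carrying the derived allele at each site.
    derived_counts = H.sum(axis=0)
    # For each haplotype, sum the (powered) counts over the sites it carries.
    return H @ (derived_counts ** ell)

# Toy sample: 4 haplotypes, 4 sites (hypothetical data).
sample = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])
scores = haf_scores(sample)
print(scores.tolist())  # [5, 6, 3, 1]
```

In this toy sample the first three haplotypes share common derived alleles and score higher than the fourth, which carries only a singleton. This is the kind of contrast between carriers and non-carriers that the abstract describes.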


Fig 3 (pgen.1005527.g003): HAF scores in a selective sweep ‘peak’ and ‘trough’. (A) Observed values (red) of the mean ‘trough’ 1-HAF scores in simulated selective sweeps with coefficients s ∈ [0.005, 0.040]. Theoretical values (blue) of expected 1-HAF scores under exponential population growth with population-scaled rates α ∈ [100, 600] given by Eq (12). Simulated 1-HAF scores (red) represent the mean of 2000 simulated population samples for each value of s, with θ = 48, n = 200. (B) Observed mean 1-HAF peak, trough, and difference (peak minus trough) for selective sweeps with coefficients s ∈ [0.005, 0.040]. The dashed line represents the approximate value of the peak 1-HAF score given by Eq (15). (C) Dynamics of the expected value of 1-HAFcar (1-HAF score of haplotypes carrying the favored allele) plotted as a function of the fraction of carriers (ν) in the sample during a selective sweep. For each (θ, n, ν) with θ ∈ {24, 48}, n ∈ {20, 50, 100, 200}, , s = 0.01, and N = 20000, we plotted the mean value of (1-HAFcar)/(θn) over 1000 trials, and compared against the theoretical values (Eq (13)).

Mentions: The HAF-trough of a sweep is the value of 1-HAF at fixation. We took the mean of the HAF-trough values over 200 populations simulated under selective sweeps with coefficients s ∈ [0.005, 0.040] (see ‘Simulations’ in Methods), and compared it to 1-HAF values in simulated neutral populations growing exponentially at rates α ∈ [100, 600]. Fig 3A shows a close similarity between the 1-HAF values under exponential growth (blue) and the selective sweep trough (red).

