Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)


Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Fig 2. Schematic of HAF score dynamics. We consider HAF scores in 50 kb segments, examining n = 200 haplotypes sampled from a constant-sized (N = 20000 haploids) population, evolving with population-scaled mutation rate θ = 48 and selection coefficient s = 0.05. We do forward simulations, with time t = 0 at the onset of selection and t increasing towards the present time. Snapshots of generations are shown at specific times indicated at tick marks on the x-axis. Note that these times are increasing but neither consecutive nor regularly spaced. Each selected generation is depicted as a tall thin rectangle. The number in each rectangle is the frequency of the favored allele (carriers). A few rectangles are shown for each phase of a simulated population undergoing a selective sweep. Each point within a rectangle represents the 1-HAF score of a randomly chosen haplotype. Red points represent carriers of the favored allele and blue points represent non-carriers. Points are scattered randomly on the x-axis within each rectangle, but all points within the same rectangle represent the same generation at the time indicated by the tick mark on the x-axis, regardless of their horizontal position within the rectangle. Darker shades of red or blue indicate a higher density of points at that level. The dotted line represents the expected 1-HAF score in the neutral population. (A) Simulation of a non-recombining segment. (B) Simulation with population-scaled recombination rate ρ = 25 (see Methods).
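
The dotted neutral reference line described in the caption can be sanity-checked with a short derivation. This is only a sketch, and it rests on assumptions not stated in this excerpt: that the 1-HAF score of a haplotype is the sum, over the sites where it carries the derived allele, of the derived-allele count w_j in the sample, and that the sample follows the standard neutral infinite-sites site frequency spectrum, E[ξ_i] = θ/i. The symbol ξ_i is our notation for the number of sites whose derived allele appears in exactly i of the n sampled haplotypes. A site with derived count w_j contributes w_j to each of its w_j carriers, which gives the first equality.

```latex
\begin{align*}
\sum_{h=1}^{n} \text{1-HAF}(h) &= \sum_{j} w_j^{2} = \sum_{i=1}^{n-1} i^{2}\,\xi_i, \\
\mathbb{E}\bigl[\text{1-HAF}\bigr]
  &= \frac{1}{n}\sum_{i=1}^{n-1} i^{2}\,\frac{\theta}{i}
   = \frac{\theta}{n}\cdot\frac{n(n-1)}{2}
   = \frac{\theta\,(n-1)}{2}.
\end{align*}
```

Under these assumptions, the caption's values n = 200 and θ = 48 would give 48 × 199 / 2 = 4776 for the expected score of a randomly chosen haplotype. The exact expression used to draw the line in the figure may differ.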

We now consider the dynamics of HAF scores in a population undergoing a selective sweep. To do this, we use data simulated under several scenarios. Fig 2 illustrates the HAF score dynamics in a single simulated population undergoing a hard sweep, with selection coefficient s = 0.05. See ‘Simulations’ in Methods for a detailed description of the simulation parameters. Initially (leftmost, time 0), the HAF scores of carriers and non-carriers of the favored allele are similar. As the sweep progresses (times 100–450), carrier HAF scores increase to a peak value (HAF-peak). Soon after fixation (time ∼450), we observe a sharp decline in HAF scores (HAF-trough), followed by a slow and steady recovery due to new mutation and drift (times 500–50000). We observe similar HAF score dynamics in an exponentially growing population and in soft sweep scenarios (S5 and S6 Figs). Though soft sweeps can arise under different circumstances, we restrict our attention to soft sweeps arising from standing variation. Although the overall behavior is similar, the decline in HAF scores during a soft sweep is not as sharp as in the hard sweep scenarios.
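
To make the quantity tracked in these dynamics concrete, the following minimal sketch computes HAF scores from a binary haplotype matrix. It assumes the definition that, as far as we understand it, the paper uses: the ℓ-HAF score of a haplotype is the sum of w_j^ℓ over the sites j where that haplotype carries the derived allele, with w_j the number of sampled haplotypes carrying the derived allele at site j. The function name `haf_scores`, the parameter `ell`, and the toy `sample` matrix are illustrative and do not come from the paper or from any PreCIOSS code.

```python
def haf_scores(haplotypes, ell=1):
    """Return the ell-HAF score of every haplotype in the sample.

    haplotypes: list of equal-length lists of 0/1, one row per haplotype,
    one column per site (1 = derived allele). Assumed input format.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    # w[j] = number of sampled haplotypes carrying the derived allele at site j
    w = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    # Each haplotype sums w[j]**ell over the sites where it is derived
    return [
        sum(w[j] ** ell for j in range(n_sites) if h[j])
        for h in haplotypes
    ]


# Toy sample (hypothetical data): 4 haplotypes, 4 sites
sample = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
print(haf_scores(sample))  # 1-HAF scores: [6, 5, 4, 0]
```

In this toy sample the haplotypes that share the common derived alleles receive the highest scores. That matches the intuition behind the HAF-peak: during a sweep, carriers share many alleles that have risen to high frequency.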

