Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Bottom Line: The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.

Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory, for example in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


pgen.1005527.g001: The HAF score. Genealogies of three samples (n = 6) progressing through a selective sweep, from left to right. Neutral mutations are shown as red circles, and are numbered in red; the favored allele is shown as a red star. The 1-HAF score of each haplotype is shown below its corresponding leaf, in black. For the rightmost haplotype in (A), the binary haplotype vector h is shown along with its HAF-vector c, and 1-HAF and 2-HAF scores. Vector wall lists the frequencies of all mutations. (A) The favored allele appears on a single haplotype. At this point in time, both the genealogy and the HAF score distribution are largely neutral. Coalescence times (T2, …, T6) are shown on the left, where Tk spans the epoch with exactly k lineages. (B) Carriers of the favored allele are distinguished by high HAF scores (in large part due to the long branch of high-frequency hitchhiking variation); non-carriers have low HAF scores. (C) After fixation, there is a sharp loss of diversity causing low HAF scores across the sample.

Mentions: Consider a sample of haplotypes in a genomic region. We assume that all sites are biallelic, and at each site, we denote ancestral alleles by 0 and derived alleles by 1. We also assume that all sites are polymorphic in the sample. The HAF vector of a haplotype h, denoted c, is obtained by taking the binary haplotype vector and replacing non-zero entries (derived alleles carried by the haplotype) with their respective frequencies in the sample (Fig 1A). For parameter ℓ, we define the ℓ-HAF score of c as:

$$\ell\text{-HAF}(c) = \sum_{j} c_j^{\ell} \tag{1}$$

where the sum proceeds over all segregating sites j in the genomic region. The 1-HAF score of a haplotype amounts to the sum of frequencies of all derived alleles carried by the haplotype. The ℓ-HAF score is equivalent to the ℓ-norm of c raised to the ℓth power, or $\|c\|_{\ell}^{\ell}$. We will show that during a selective sweep, the HAF score of a haplotype serves as a proxy for its relative fitness.
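
The definition above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name haf_scores and the toy sample are hypothetical, and the sketch assumes that the "frequency" of a derived allele means its count among the sampled haplotypes (dividing each count by the sample size would give proportions instead).

```python
def haf_scores(haplotypes, ell=1):
    """Return the ell-HAF score of each haplotype (row) in a 0/1 matrix.

    Assumption (not from the source text): the "frequency" of a derived
    allele is its count in the sample, not its proportion.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site (column sums of the 0/1 matrix).
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    scores = []
    for h in haplotypes:
        # Sum count**ell over the sites where this haplotype carries the
        # derived allele; these are the non-zero entries of its HAF vector c.
        scores.append(sum(counts[j] ** ell for j in range(n_sites) if h[j]))
    return scores


# Hypothetical toy sample: 4 haplotypes (rows), 3 segregating sites (columns).
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]
print(haf_scores(sample, ell=1))  # [4, 3, 5, 2]
print(haf_scores(sample, ell=2))  # [10, 9, 13, 4]
```

In this toy sample the derived-allele counts are 3, 1 and 2, so the third haplotype, which carries the two most common derived alleles, receives the highest 1-HAF score (3 + 2 = 5).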

