Predicting Carriers of Ongoing Selective Sweeps without Knowledge of the Favored Allele.

Ronen R, Tesler G, Akbari A, Zakov S, Rosenberg NA, Bafna V - PLoS Genet. (2015)

Bottom Line: The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.


Affiliation: Bioinformatics Graduate Program, University of California, San Diego, La Jolla, California, United States of America.

ABSTRACT
Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory--for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
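The abstract names the HAF score without defining it. As a minimal sketch, assume the definition used in the paper: the ℓ-HAF score of a haplotype is the sum, over the sites where it carries the derived allele, of the ℓ-th power of that allele's count in the sample (ℓ = 1 gives the 1-HAF score). The function name `haf_scores`, the 0/1 matrix encoding, and the toy sample are illustrative assumptions, not part of the original text.

```python
from typing import List


def haf_scores(haplotypes: List[List[int]], power: int = 1) -> List[int]:
    """Return the power-HAF score of every haplotype in the sample.

    `haplotypes` is an n x m 0/1 matrix: one row per haplotype, one column
    per site, with 1 marking the derived allele (assumed encoding).
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Number of haplotypes in the sample carrying the derived allele at each site.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    # For each haplotype, add up count**power over the derived alleles it carries.
    return [
        sum(counts[j] ** power for j in range(n_sites) if h[j])
        for h in haplotypes
    ]


# Toy sample (hypothetical): three haplotypes share a common derived allele.
sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
print(haf_scores(sample))  # 1-HAF score of each haplotype
```

Haplotypes that carry many high-frequency derived alleles receive high scores, which is the property the abstract relies on to separate carriers of a favored allele from non-carriers.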


Figure 7 (pgen.1005527.g007): Predicting carriers of well-known selective sweeps. (Left) Haplotype 1-HAF scores in a 50 kb window centered at known favored sites, each indicated by gene name and SNP identifier: (A) LCT/rs4988235, (C) TRPV6/rs4987682, (E) PSCA/rs2294008, (G) ADH1B/rs1229984, and (I) EDAR/rs3827760. Points represent haplotype 1-HAF scores, with red indicating a carrier of the favored allele and blue a non-carrier. At the top of each panel, the number of haplotypes, n, is shown, with the number of carriers in parentheses. Areas shaded in gray indicate haplotypes designated as ‘carrier’ by PreCIOSS. (Right) Classification balanced accuracy (black) and −log2(P) values (blue) as a function of window size around the favored allele in (B) LCT, (D) TRPV6, (F) PSCA, (H) ADH1B, and (J) EDAR. P-values are for Wilcoxon rank sum tests rejecting the hypothesis of identically distributed 1-HAF scores among carriers and non-carriers. Red circles indicate the 50 kb windows shown on the left.
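The two quantities plotted in the right-hand panels can be illustrated with a short sketch. It assumes balanced accuracy is the mean of the true-positive and true-negative rates, and it approximates the two-sided Wilcoxon rank sum P-value with a normal approximation (midranks for ties, no tie correction to the variance). That approximation, the function names, and the toy scores are assumptions made for illustration; the excerpt does not say which variant of the test was used.

```python
import math
from typing import Sequence


def balanced_accuracy(is_carrier: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Mean of sensitivity and specificity for carrier / non-carrier calls."""
    pos = sum(1 for t in is_carrier if t)
    neg = len(is_carrier) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    tp = sum(1 for t, p in zip(is_carrier, predicted) if t and p)
    tn = sum(1 for t, p in zip(is_carrier, predicted) if not t and not p)
    return 0.5 * (tp / pos + tn / neg)


def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided rank sum P-value via the normal approximation (assumed variant)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the ranks i+1 .. j shared by ties
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


# Toy 1-HAF scores (hypothetical): carriers score higher than non-carriers.
carrier_scores = [float(v) for v in range(10, 20)]
noncarrier_scores = [float(v) for v in range(0, 10)]
p = rank_sum_p(carrier_scores, noncarrier_scores)
print(-math.log2(p))  # the quantity plotted in blue in the right-hand panels

truth = [True, True, True, False]
calls = [True, True, False, False]
print(balanced_accuracy(truth, calls))
```

Balanced accuracy is a natural choice here because the number of carriers can differ greatly from the number of non-carriers in a sample, and plain accuracy would then be dominated by the larger class.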

Mentions: Our results suggest that for cases of recent adaptation (e.g., lactase adaptation, shown in Fig 7A, which occurred between 2 kya and 20 kya and spread rapidly to high frequency in the European population), PreCIOSS would perform well in separating carriers from non-carriers.

