The impact of equilibrium assumptions on tests of selection.

Crisci JL, Poh YP, Mahajan S, Jensen JD - Front Genet (2013)

Bottom Line: Moreover, the performance of these statistics is often not evaluated for different biologically relevant scenarios (e.g., population size change, population structure), nor for the effect of differing data sizes (i.e., genomic vs. sub-genomic). We demonstrate that the LD-based OmegaPlus performs best in terms of power to reject the neutral model under both equilibrium and non-equilibrium conditions, an important result regarding the effectiveness of linkage disequilibrium based statistics relative to site frequency spectrum based statistics. The results presented here ought to serve as a useful guide for future empirical studies, and provide a guide for statistical choice depending on the history of the population under consideration.


Affiliation: Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA; Swiss Institute of Bioinformatics, Lausanne, Switzerland.

ABSTRACT
With the increasing availability and quality of whole genome population data, various methodologies of population genetic inference are being utilized in order to identify and quantify recent population-level selective events. Though there has been a great proliferation of such methodology, the type-I and type-II error rates of many proposed statistics have not been well described. Moreover, the performance of these statistics is often not evaluated for different biologically relevant scenarios (e.g., population size change, population structure), nor for the effect of differing data sizes (i.e., genomic vs. sub-genomic). The absence of this information makes it difficult to evaluate newly available statistics relative to one another, and thus difficult to choose the proper toolset for a given empirical analysis. We therefore describe and compare the performance of four widely used tests of selection: SweepFinder, SweeD, OmegaPlus, and iHS. To address the above questions, we utilize simulated data spanning a variety of selection coefficients and beneficial mutation rates. We demonstrate that the LD-based OmegaPlus performs best in terms of power to reject the neutral model under both equilibrium and non-equilibrium conditions, an important result regarding the effectiveness of linkage disequilibrium based statistics relative to site frequency spectrum based statistics. The results presented here ought to serve as a useful guide for future empirical studies, and provide a guide for statistical choice depending on the history of the population under consideration. Moreover, the parameter space investigated and the type-I and type-II error rates calculated represent a natural benchmark by which future statistics may be assessed.




Figure 2: Percentage of Sequences that Contain Selective Sweeps. Selective sweep detection using iHS for five model categories: bottlenecks (BN), recurrent hitchhiking (RHH), single hitchhiking (SHH), and the joint RHH-bottleneck and SHH-bottleneck models. RHH models were simulated with sfcode, and SHH models were simulated with msms. These are the same models that were presented in Tables 1–6, and sequences with various selection and/or bottleneck parameters were pooled under each category. Percentages represent the proportion of replicates in each category that iHS identified as under selection. The plot appears to show that iHS identifies SHH events more effectively, but many RHH replicates were eliminated due to low SNP density (see text).

Mentions: For this reason we used a significance value derived from the entire dataset, following Voight et al. (2006). As they point out, this method can be useful for identifying regions of interest but does not serve as a formal significance test. To define a selective sweep signal, we took the top and bottom 1% of all iHS values as cutoffs and then searched for instances where iHS scores at least as extreme as these cutoffs occurred consecutively at two or more neighboring SNPs. To determine whether the iHS test statistic can distinguish between a selective event and a bottleneck, we compared the fraction of sequences that contained a sweep in each model type (Figure 2).
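
The outlier rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the cutoffs are taken by rank from the pooled scores (one plausible reading of "top and bottom 1% of all iHS values"), and a run is allowed to mix SNPs from the upper and lower tails, which the text does not specify.

```python
def empirical_cutoffs(all_scores, tail=0.01):
    """Return (lower, upper) cutoffs bounding the bottom and top `tail` share.

    Assumption: cutoffs are chosen by rank in the pooled scores, so this is
    an empirical outlier threshold, not a formal significance test.
    """
    ordered = sorted(all_scores)
    k = max(1, int(len(ordered) * tail))  # number of scores in each tail
    return ordered[k - 1], ordered[-k]


def sweep_runs(scores, lower, upper, min_run=2):
    """Find runs of at least `min_run` neighboring SNPs with extreme scores.

    `scores` must be ordered by SNP position. Returns (start, end) index
    pairs, inclusive. Assumption: either tail counts toward the same run.
    """
    runs = []
    start = None
    for i, score in enumerate(scores):
        extreme = score <= lower or score >= upper
        if extreme and start is None:
            start = i  # a new run begins here
        elif not extreme and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(scores) - start >= min_run:
        runs.append((start, len(scores) - 1))
    return runs


def fraction_with_sweep(replicates, tail=0.01, min_run=2):
    """Fraction of replicates that contain at least one sweep signal.

    Cutoffs come from all replicates pooled together, mirroring the
    "derived from the entire dataset" wording in the text.
    """
    pooled = [score for rep in replicates for score in rep]
    lower, upper = empirical_cutoffs(pooled, tail)
    hits = sum(1 for rep in replicates if sweep_runs(rep, lower, upper, min_run))
    return hits / len(replicates)
```

Calling `fraction_with_sweep` once per model category (each category being a list of per-replicate iHS score lists) would give the kind of per-category percentage that Figure 2 reports. Because the cutoffs are rank based, heavily tied scores can flag more than 1% of SNPs in a tail.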

