Estimating genome-wide significance for whole-genome sequencing studies.
Bottom Line: Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window-based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
Affiliation: Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada.
Mentions: Figure 3 displays the estimated significance thresholds for window-based tests derived from simulations, for three different MAF thresholds and three different test statistics, for genomic sections increasing in length up to the entire length of chromosome 3. As in Figure 2, there is a clear linear relationship between the average estimated significance threshold and the Bonferroni threshold (on the −log10 scale) as the size of the genomic region analyzed increases. However, for a genomic region of a given size, the estimated significance thresholds can vary considerably, especially for smaller regions. Figure 3 also shows that the significance thresholds for the burden and SKAT-O statistics are closer to the Bonferroni correction than those for the SKAT statistic, indicating that the burden and SKAT-O statistics are less correlated between windows, which confirms the result seen visually in Figure 1 for the burden test. The MAF threshold has a smaller effect, but for the SKAT statistic, higher MAF thresholds correspond to more inter-window correlation and hence to larger (less stringent) significance thresholds. This makes sense because there should be more linkage disequilibrium between the variants when a higher MAF threshold is used.
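The extrapolation idea described above can be illustrated with a minimal sketch. It assumes that the empirical threshold is linear in the Bonferroni threshold on the −log10 scale, fits that line on a few small regions, and projects it to a larger number of tests. The function names, the significance level of 0.05, the test counts, and the synthetic slope of 0.9 are all hypothetical choices made for illustration; they are not the authors' code or their estimates.

```python
import numpy as np

def fit_threshold_line(n_tests, empirical_thresholds, alpha=0.05):
    """Fit -log10(empirical) = intercept + slope * -log10(alpha / n_tests)."""
    x = -np.log10(alpha / np.asarray(n_tests, dtype=float))
    y = -np.log10(np.asarray(empirical_thresholds, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return slope, intercept

def extrapolate_threshold(slope, intercept, n_tests_target, alpha=0.05):
    """Project the fitted line to a region with n_tests_target tests."""
    x = -np.log10(alpha / float(n_tests_target))
    return 10.0 ** -(intercept + slope * x)

# Synthetic illustration only: pretend small regions gave empirical
# thresholds that follow an exact line with slope 0.9 (an assumed value).
alpha = 0.05
small_region_tests = np.array([1e3, 1e4, 1e5, 1e6])
bonferroni_log = -np.log10(alpha / small_region_tests)
synthetic_empirical = 10.0 ** -(0.9 * bonferroni_log)

slope, intercept = fit_threshold_line(small_region_tests, synthetic_empirical, alpha)

# Hypothetical genome-wide number of window tests.
n_genome = 1e7
genome_threshold = extrapolate_threshold(slope, intercept, n_genome, alpha)
bonferroni_genome = alpha / n_genome
```

A slope below 1 corresponds to correlated tests, so the extrapolated threshold is less stringent than the Bonferroni threshold for the same number of tests. With real data, the thresholds for each region size would be averaged over several regions first, since individual small regions can give quite variable estimates.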