Estimating genome-wide significance for whole-genome sequencing studies.
Bottom Line: Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window-based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.
Affiliation: Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada.
Mentions: The simulation approach also enables us to study the significance thresholds for a combination of window-based tests of rare genetic variation and single-marker tests of common variation. In Figure 4, the necessary significance thresholds for controlling FWER at 5% are shown for genomic sections of varying size for this combined strategy. The variability across chromosomal sections of the same size is shown, as well as the linear relationship. All estimates here are well below the line of equality, y = x, demonstrating the well-known effect of linkage disequilibrium between common variants.
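The extrapolation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions that the text above does not spell out: that the regional threshold is taken as the empirical 5% quantile of the minimum p-value across null simulations, that an "effective number of tests" is defined as alpha divided by that threshold, and that this quantity grows linearly with region size through the origin. The function names, the region sizes, and the numbers in the usage note are all hypothetical.

```python
def fwer_threshold(min_pvalues, alpha=0.05):
    """Empirical per-test threshold controlling FWER at `alpha`.

    `min_pvalues` holds one minimum p-value per null simulation of a region.
    Rejecting when p <= threshold lets about alpha of the null replicates
    produce at least one false positive.
    """
    ordered = sorted(min_pvalues)
    k = int(alpha * len(ordered) + 1e-9)  # replicates allowed to reject
    if k < 1:
        raise ValueError("too few simulations for this alpha")
    return ordered[k - 1]


def extrapolate_threshold(region_sizes, thresholds, genome_size, alpha=0.05):
    """Predict a genome-wide threshold from regional thresholds.

    Assumption (hypothetical): effective tests = alpha / threshold, fitted
    as a least-squares line through the origin against region size.
    """
    effective = [alpha / t for t in thresholds]
    slope = (sum(s * m for s, m in zip(region_sizes, effective))
             / sum(s * s for s in region_sizes))
    return alpha / (slope * genome_size)
```

For example, with made-up regional thresholds of 5 × 10^-4 for a 1 Mb section and 2.5 × 10^-4 for a 2 Mb section, `extrapolate_threshold([1e6, 2e6], [5e-4, 2.5e-4], 3e9)` predicts a threshold of about 1.7 × 10^-7 for a 3 Gb genome. These inputs are purely illustrative and are unrelated to the UK10K estimates reported above.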