Estimating genome-wide significance for whole-genome sequencing studies.

Xu C, Tachmazidou I, Walter K, Ciampi A, Zeggini E, Greenwood CM, UK10K Consorti - Genet. Epidemiol. (2014)

Bottom Line: Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window-based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single-SNP tests together with rare variants analyzed with our sliding-window test strategy.


Affiliation: Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada.


Figure 3: Estimates of genome-wide significance thresholds for window-based tests of rare variants, derived from simulations, for three MAF thresholds and three test statistics. The horizontal axis is −log10(0.05/m), the Bonferroni threshold for m tests on chromosome 3. Each point is the mean of −log10 of the estimated significance threshold controlling the FWER at 5%, taken across disjoint sections of chromosome 3 of the same size, with ±1.96 × SD shown at each point. A linear regression was fitted to the points in each panel, and the gray line is the line of equality, y = x.
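The caption does not spell out how each plotted point is obtained. The following is a minimal Python sketch of one common way to do it, not necessarily the authors' procedure. It assumes that, for each genomic section, the smallest p-value across all tests is recorded in every null simulation replicate, and that the 5% quantile of those minima is taken as the threshold. The function names, the use of independent uniform p-values, and all sizes are hypothetical placeholders for illustration.

```python
import numpy as np

def empirical_threshold(min_pvalues, fwer=0.05):
    """P-value cutoff that controls the family-wise error rate at `fwer`.

    `min_pvalues` holds one value per null replicate: the smallest p-value
    observed across all tests in the region for that replicate (assumption).
    """
    return float(np.quantile(np.asarray(min_pvalues, dtype=float), fwer))

def summarize_sections(thresholds):
    """Mean and +/-1.96*SD band of -log10(threshold) across sections."""
    y = -np.log10(np.asarray(thresholds, dtype=float))
    mean, sd = y.mean(), y.std(ddof=1)
    return mean, mean - 1.96 * sd, mean + 1.96 * sd

# Synthetic illustration only: independent uniform p-values stand in for
# real null simulations, so the result should sit close to Bonferroni.
rng = np.random.default_rng(0)
m_tests, n_reps, n_sections = 200, 2000, 5
thresholds = [
    empirical_threshold(rng.uniform(size=(n_reps, m_tests)).min(axis=1))
    for _ in range(n_sections)
]
mean, lo, hi = summarize_sections(thresholds)
print(f"Bonferroni x: {-np.log10(0.05 / m_tests):.2f}")
print(f"empirical y:  {mean:.2f} (band {lo:.2f} to {hi:.2f})")
```

With correlated tests, as in real sliding-window analyses, the empirical value would be expected to fall below the Bonferroni value on the −log10 scale, which is the gap between the fitted lines and y = x described in the caption.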
© Copyright Policy - open-access


Mentions: Figure 3 displays the estimated significance thresholds for window-based tests, derived from simulations, for three MAF thresholds and three test statistics, for genomic sections of increasing length up to the entire length of chromosome 3. As in Figure 2, the average estimated significance threshold is linearly related to the Bonferroni threshold (on the −log10 scale) as the size of the analyzed genomic region increases. For a genomic region of a given size, however, the estimated thresholds can vary considerably, especially for smaller regions. Figure 3 also shows that the significance thresholds for the burden and SKAT-O statistics are closer to the Bonferroni correction than those for the SKAT statistic, which indicates that burden and SKAT-O tests are less correlated across windows and confirms the result seen visually in Figure 1 for the burden test. The MAF threshold has a smaller effect, but for the SKAT statistic, higher MAF thresholds correspond to more interwindow correlation and hence to larger (less stringent) significance thresholds. This is expected, because linkage disequilibrium between the variants should be greater when a higher MAF threshold is used.
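The linear relationship described above is what makes extrapolation from a small region possible. As a rough sketch under stated assumptions, the Python below fits a straight line to (Bonferroni, empirical) pairs on the −log10 scale and evaluates it at a genome-wide number of tests. The data points, the slope and intercept used to generate them, and the value of `m_genome` are all made-up placeholders, not results from the paper.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

def extrapolate_threshold(slope, intercept, m_genome, fwer=0.05):
    """Predicted p-value threshold for `m_genome` tests (hypothetical count)."""
    x_genome = -np.log10(fwer / m_genome)  # Bonferroni on the -log10 scale
    return float(10.0 ** -(intercept + slope * x_genome))

# Placeholder points: x is -log10(0.05/m) for regions of increasing size,
# y is the mean -log10 empirical threshold. Synthetic, exactly linear.
x = np.array([3.0, 3.5, 4.0, 4.5, 5.0])
y = 0.2 + 0.9 * x

slope, intercept = fit_line(x, y)
m_genome = 1_000_000  # hypothetical genome-wide number of tests
threshold = extrapolate_threshold(slope, intercept, m_genome)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"extrapolated threshold={threshold:.3g}, Bonferroni={0.05 / m_genome:.3g}")
```

A fitted line lying below y = x, as in this synthetic example, yields an extrapolated threshold that is less stringent than Bonferroni, consistent with positively correlated tests across windows.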

