Estimation of significance thresholds for genomewide association scans.

Dudbridge F, Gusnanto A - Genet. Epidemiol. (2008)

Bottom Line: To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.

Affiliation: MRC Biostatistics Unit, Institute for Public Health, Cambridge, United Kingdom. frank.dudbridge@mrc-bsu.cam.ac.uk

ABSTRACT
The question of what significance threshold is appropriate for genomewide association studies is somewhat unresolved. Previous theoretical suggestions have yet to be validated in practice, whereas permutation testing does not resolve a discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case-Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P-value for 5% family-wise error. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10−8. To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.
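The two quantities named in the abstract, the nominal P-value for 5% family-wise error and the effective number of tests from a Beta fit, can be illustrated with a minimal sketch. This is not the authors' code. It assumes the minimum P-value follows Beta(1, M), with the first shape parameter fixed at 1, and it uses simulated independent uniform P-values as a stand-in for real permutation replicates. All function names are hypothetical.

```python
import math
import random


def simulate_min_pvalues(n_tests, n_reps, seed=0):
    """Stand-in for permutation replicates: the minimum of n_tests
    independent uniform P-values, repeated n_reps times (toy data only)."""
    rng = random.Random(seed)
    return [min(rng.random() for _ in range(n_tests)) for _ in range(n_reps)]


def empirical_threshold(min_pvalues, alpha=0.05):
    """Empirical alpha-quantile of the minimum P-values: the nominal
    P-value threshold giving family-wise error of about alpha."""
    ordered = sorted(min_pvalues)
    index = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[index]


def effective_tests_beta(min_pvalues):
    """Maximum-likelihood M under the assumption min-P ~ Beta(1, M),
    whose log-likelihood is n*log(M) + (M - 1) * sum(log(1 - p))."""
    return -len(min_pvalues) / sum(math.log1p(-p) for p in min_pvalues)


def threshold_from_effective_tests(m_eff, alpha=0.05):
    """alpha-quantile of Beta(1, m_eff): 1 - (1 - alpha)**(1 / m_eff)."""
    return -math.expm1(math.log1p(-alpha) / m_eff)


# Toy run: 200 independent tests, so the fitted M should be close to 200.
min_p = simulate_min_pvalues(n_tests=200, n_reps=2000)
m_hat = effective_tests_beta(min_p)
print("empirical 5% threshold:", empirical_threshold(min_p))
print("estimated effective number of tests:", m_hat)
print("threshold implied by Beta(1, M) fit:", threshold_from_effective_tests(m_hat))
```

With correlated markers, as in real genotype data, the fitted M would be expected to fall below the number of markers tested, which is what makes it an "effective" number. In this toy run the tests are independent, so the two stay close.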

Figure 1: (a) Significance threshold as a function of marker density in combined NBS and 58BC sample from permutation procedure. At current density (359K single nucleotide polymorphisms typed) the significance threshold is about 2.2 × 10−7. The dotted line shows the estimated asymptote of 7.2 × 10−8. (b) Fitted Monod function to the effective number of tests associated with the significance threshold. At infinite density the number of tests is estimated at 693,138 giving the asymptote in (a).

Mentions: For the permutation procedure, Table I gives the estimated asymptote, half-saturation parameter and genomewide significance threshold for the NBS and 58BC samples separately and combined. The estimates are similar for the separate cohorts, so they may be combined to give greater precision. Figure 1(a) shows the threshold for 5% family-wise error plotted as a function of marker density for the combined samples. Figure 1(b) shows the corresponding effective numbers of tests together with the fitted Monod function. The curve is a good fit, but it has clearly not reached its asymptote at the current density, although some curvature is apparent when compared with the linear regression line. The estimated asymptote was 651,550, which increases to 693,138 assuming that the autosomes comprise 94% of the total genome length. This gives our estimated genomewide significance threshold as

0.05 / 693,138 ≈ 7.2 × 10−8 (11)

with 95% bootstrap confidence interval (6.3–8.9) × 10−8.
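The arithmetic behind these figures can be checked with a short sketch. It assumes the standard Monod form, asymptote × density / (half-saturation + density), and that the threshold is the 5% error rate divided by the effective number of tests, which reproduces the reported value. The half-saturation value used below is hypothetical and serves only to show the shape of the curve; the fitted value is in Table I and is not reproduced here.

```python
import math


def monod(density, asymptote, half_saturation):
    """Monod (saturation) curve: rises with density and levels off at
    `asymptote`; equals asymptote / 2 when density == half_saturation."""
    return asymptote * density / (half_saturation + density)


AUTOSOMAL_ASYMPTOTE = 651_550   # fitted asymptote reported in the text
AUTOSOME_FRACTION = 0.94        # autosomes as a share of total genome length
ALPHA = 0.05                    # family-wise error rate

# Scale the autosomal asymptote up to the whole genome, then divide the
# error rate by the resulting effective number of tests.
genomewide_tests = AUTOSOMAL_ASYMPTOTE / AUTOSOME_FRACTION
threshold = ALPHA / genomewide_tests

print(f"genomewide effective tests: {genomewide_tests:,.0f}")
print(f"significance threshold: {threshold:.2e}")

# Hypothetical half-saturation value, chosen only to illustrate the curve.
EXAMPLE_HALF_SATURATION = 500_000
half_way = monod(EXAMPLE_HALF_SATURATION, AUTOSOMAL_ASYMPTOTE, EXAMPLE_HALF_SATURATION)
print(f"effective tests at the half-saturation density: {half_way:,.0f}")
```

The first two printed values correspond to the 693,138 tests and the threshold of about 7.2 × 10−8 quoted above.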

