Estimation of significance thresholds for genomewide association scans.

Dudbridge F, Gusnanto A - Genet. Epidemiol. (2008)

Bottom Line: To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.


Affiliation: MRC Biostatistics Unit, Institute for Public Health, Cambridge, United Kingdom. frank.dudbridge@mrc-bsu.cam.ac.uk

ABSTRACT
The question of what significance threshold is appropriate for genomewide association studies is somewhat unresolved. Previous theoretical suggestions have yet to be validated in practice, whereas permutation testing does not resolve a discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case-Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P-value for 5% family-wise error. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10⁻⁸. To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.
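The permutation and Beta-fitting steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the genotypes are simulated and independent, the association test is a correlation-based trend statistic (N times the squared correlation, referred to a 1-df chi-square), and the Beta fit fixes the first shape parameter at 1, so that the second parameter plays the role of the effective number of tests. The function names are hypothetical.

```python
import math
import numpy as np


def min_p_per_permutation(genotypes, status, n_perm, rng):
    """Minimum p-value across markers for each permutation of the labels.

    genotypes: (n_subjects, n_markers) allele counts 0/1/2 (no monomorphic markers)
    status:    (n_subjects,) case-control labels coded 0/1
    """
    n = genotypes.shape[0]
    # Standardise markers once; a permutation only reshuffles the labels.
    g = genotypes - genotypes.mean(axis=0)
    g = g / np.sqrt((g ** 2).sum(axis=0))
    y = status - status.mean()
    y = y / np.sqrt((y ** 2).sum())
    min_p = np.empty(n_perm)
    for b in range(n_perm):
        r = rng.permutation(y) @ g        # correlation with every marker
        chi2_max = n * np.max(r ** 2)     # largest 1-df trend statistic
        # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
        min_p[b] = math.erfc(math.sqrt(chi2_max / 2.0))
    return min_p


def effective_number(min_p):
    """Maximum-likelihood M under min_p ~ Beta(1, M).

    Beta(1, M) is the null distribution of the smallest of M independent
    uniform p-values, so M acts as an effective number of tests.
    """
    return -len(min_p) / np.sum(np.log1p(-min_p))


def fwer_threshold(m_eff, alpha=0.05):
    """Per-test threshold with family-wise error alpha for m_eff independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


# Simulated data: 400 subjects, 200 independent markers, no true association.
rng = np.random.default_rng(0)
n_subjects, n_markers = 400, 200
genotypes = rng.binomial(2, 0.3, size=(n_subjects, n_markers)).astype(float)
status = np.repeat([0.0, 1.0], n_subjects // 2)

min_p = min_p_per_permutation(genotypes, status, n_perm=2000, rng=rng)
empirical = np.quantile(min_p, 0.05)   # direct permutation threshold
m_eff = effective_number(min_p)        # Beta-based effective number of tests
print(empirical, m_eff, fwer_threshold(m_eff))
```

Because the simulated markers are independent, the fitted effective number should land near the actual marker count; with correlated markers, as in real genotype data, it would be expected to fall below it.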

Figure 2: (a) Significance thresholds from permutation procedure and Patterson's estimate of the effective number of tests. At current marker density, the estimates differ by an order of magnitude. (b) The effective numbers of tests based on the permutation procedure and Patterson's estimator. At current marker density, Patterson's estimate is too low (33,279) compared to that of the permutation procedure (227,838).

Mentions: For Patterson's estimator, Figure 2 shows the 5% family-wise error threshold and effective number of tests compared to the permutation procedure, over a uniform grid of 20 marker densities. There is clearly a wide discrepancy, and at the current marker density the estimate is an order of magnitude too low: 33,279 compared to 227,838 for the permutation scheme, even though the latter allowed for correlation between chromosomes. The use of P-value thresholds based on this estimator will therefore inflate the false-positive rate. This result is not entirely surprising, as we have previously noted that the effective number of tests, if it exists, is a function of both the significance threshold and also of the type of analysis [Dudbridge and Koeleman, 2004]. Thus, it is not unexpected that an estimator that works well for analysis of population structure is not accurate for Bonferroni corrections.
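The size of this inflation can be illustrated with a short calculation. It is a sketch under two assumptions not stated in the text: that each threshold is a plain Bonferroni correction (0.05 divided by the effective number of tests), and that the experiment behaves like 227,838 independent tests, which is only a heuristic reading of the permutation-based figure.

```python
alpha = 0.05
m_patterson = 33_279      # effective number from Patterson's estimator
m_permutation = 227_838   # effective number from the permutation procedure

# Bonferroni per-test thresholds implied by each effective number.
thr_patterson = alpha / m_patterson
thr_permutation = alpha / m_permutation

# Family-wise error if the more lenient threshold is applied while the
# multiplicity really behaves like m_permutation independent tests.
fwer = 1.0 - (1.0 - thr_patterson) ** m_permutation

print(f"threshold (Patterson):   {thr_patterson:.2e}")
print(f"threshold (permutation): {thr_permutation:.2e}")
print(f"implied family-wise error: {fwer:.2f}")
```

Under these assumptions the two thresholds are roughly 1.5 × 10⁻⁶ and 2.2 × 10⁻⁷, and the implied family-wise error is close to 0.29 rather than the nominal 0.05.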

