Estimation of significance thresholds for genomewide association scans.

Dudbridge F, Gusnanto A - Genet. Epidemiol. (2008)

Affiliation: MRC Biostatistics Unit, Institute for Public Health, Cambridge, United Kingdom. frank.dudbridge@mrc-bsu.cam.ac.uk

ABSTRACT
The question of what significance threshold is appropriate for genomewide association studies is somewhat unresolved. Previous theoretical suggestions have yet to be validated in practice, whereas permutation testing does not resolve a discrepancy between the genomewide multiplicity of the experiment and the subset of markers actually tested. We used genotypes from the Wellcome Trust Case-Control Consortium to estimate a genomewide significance threshold for the UK Caucasian population. We subsampled the genotypes at increasing densities, using permutation to estimate the nominal P-value for 5% family-wise error. By extrapolating to infinite density, we estimated the genomewide significance threshold to be about 7.2 × 10^-8. To reduce the computation time, we considered Patterson's eigenvalue estimator of the effective number of tests, but found it to be an order of magnitude too low for multiplicity correction. However, by fitting a Beta distribution to the minimum P-value from permutation replicates, we showed that the effective number is a useful heuristic and suggest that its estimation in this context is an open problem. We conclude that permutation is still needed to obtain genomewide significance thresholds, but with subsampling, extrapolation and estimation of an effective number of tests, the threshold can be standardized for all studies of the same population.
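The link between a permutation threshold and an effective number of tests can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: a real permutation replicate would shuffle case-control labels and recompute every marker test, whereas the hypothetical helper `simulated_min_p` draws independent uniform P-values instead, so the correlation between markers (the reason the effective number is smaller than the marker count) is absent. The nominal threshold for 5% family-wise error is taken as the empirical 5% quantile of the replicate minimum P-values, and `effective_tests` inverts the Beta(1, n_E) relation 1 − (1 − α)^n_E = 0.05. The function names and the quantile convention are assumptions of this sketch; subsampling and extrapolation to infinite density are not shown.

```python
import math
import random


def simulated_min_p(n_tests, n_reps, seed=0):
    """Hypothetical stand-in for permutation replicates.

    A real replicate would permute case/control labels, recompute every
    marker's association test and keep the smallest P-value. Here the
    n_tests P-values are independent uniforms (no linkage disequilibrium).
    """
    rng = random.Random(seed)
    return [min(rng.random() for _ in range(n_tests)) for _ in range(n_reps)]


def nominal_threshold(min_ps, fwer=0.05):
    """Per-marker P-value threshold controlling family-wise error at `fwer`.

    Taken as the empirical `fwer` quantile of the replicate minimum P-values.
    """
    ordered = sorted(min_ps)
    k = max(0, math.ceil(fwer * len(ordered)) - 1)
    return ordered[k]


def effective_tests(alpha, fwer=0.05):
    """Solve 1 - (1 - alpha) ** n_e == fwer for n_e (Beta(1, n_e) model)."""
    return math.log1p(-fwer) / math.log1p(-alpha)


if __name__ == "__main__":
    mins = simulated_min_p(n_tests=1000, n_reps=1000)
    alpha = nominal_threshold(mins)
    print(f"nominal threshold {alpha:.3g}, effective tests {effective_tests(alpha):.0f}")
    print(f"7.2e-8 corresponds to {effective_tests(7.2e-8):.3g} effective tests")
```

With independent simulated tests the estimated effective number should land roughly near `n_tests`. Plugging the abstract's 7.2 × 10^-8 into `effective_tests` gives about 7.1 × 10^5; that figure is only arithmetic under the Beta(1, n_E) model.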

Figure 3: Quantile–quantile plot comparing fitted Beta distributions with minimum P-values from permutation replicates.

However, the fitted Beta distributions do suggest that an effective number of tests exists and could be useful. Figure 3 compares the empirical distribution of the minimum P-value for the combined samples to the fitted Beta(1, n_E) and Beta(a, b) distributions, where n_E is the effective number of tests. Both Beta distributions clearly fit the observed data well. The maximum likelihood estimate â = 0.97 is close to 1; the hypothesis a = 1 was formally rejected (P = 0.01), but this is not surprising given our high power to reject strict equality, and the test was not significant in the separate NBS and 58BC samples. This is in line with our results on an early version of HapMap [Dudbridge and Koeleman, 2004], in which the test of equality was extremely significant, suggesting that the effective number of tests is a worse fit at higher marker densities. The effective numbers of tests were similar to those estimated from the permutation procedure for both cohorts (Table II).
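The model comparison in this paragraph can be sketched as follows, assuming the minimum P-values from the permutation replicates are already available as a list. Under Beta(1, n_E) the minimum P-value behaves like the smallest of n_E independent uniform P-values, and the maximum likelihood estimate has the closed form n_E = −n / Σ log(1 − p_i). The general Beta(a, b) has no closed-form estimate, so the sketch profiles out b for each a and maximizes with golden-section search. The paragraph does not say which test gave P = 0.01; a likelihood-ratio test of a = 1 on one degree of freedom is an assumption here, as are the search ranges and every function name.

```python
import math
import random

LOG_A_RANGE = (math.log(0.05), math.log(20.0))  # assumed search range for a
LOG_B_RANGE = (math.log(1e-2), math.log(1e10))  # assumed search range for b


def beta_loglik(a, b, n, s_log_p, s_log_1mp):
    """Beta(a, b) log-likelihood from the sufficient statistics of n values."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * s_log_p + (b - 1.0) * s_log_1mp - n * log_beta_fn


def golden_max(f, lo, hi, iters=100):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:  # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    return (lo + hi) / 2.0


def sufficient_stats(min_ps):
    """Return n, sum(log p) and sum(log(1 - p))."""
    return (len(min_ps),
            sum(math.log(p) for p in min_ps),
            sum(math.log1p(-p) for p in min_ps))


def fit_beta_1_ne(min_ps):
    """Closed-form MLE of n_E in Beta(1, n_E): n_E = -n / sum(log(1 - p))."""
    n, _, s_log_1mp = sufficient_stats(min_ps)
    return -n / s_log_1mp


def fit_beta_ab(min_ps):
    """MLE of (a, b): profile out b for each a, then maximise over a (log scale)."""
    n, s1, s2 = sufficient_stats(min_ps)

    def best_log_b(a):
        return golden_max(lambda lb: beta_loglik(a, math.exp(lb), n, s1, s2),
                          *LOG_B_RANGE)

    def profile(log_a):
        a = math.exp(log_a)
        return beta_loglik(a, math.exp(best_log_b(a)), n, s1, s2)

    a = math.exp(golden_max(profile, *LOG_A_RANGE))
    return a, math.exp(best_log_b(a))


def lrt_a_equals_one(min_ps):
    """Likelihood-ratio test of Beta(1, n_E) within Beta(a, b); chi-square, 1 df."""
    n, s1, s2 = sufficient_stats(min_ps)
    n_e = fit_beta_1_ne(min_ps)
    a, b = fit_beta_ab(min_ps)
    stat = 2.0 * (beta_loglik(a, b, n, s1, s2) - beta_loglik(1.0, n_e, n, s1, s2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # P(chi2_1 > stat)
    return n_e, a, b, p_value


if __name__ == "__main__":
    rng = random.Random(2)
    # Synthetic replicates from Beta(1, 10_000), i.e. 10,000 independent tests.
    sim = [rng.betavariate(1.0, 10_000.0) for _ in range(2000)]
    n_e, a, b, p = lrt_a_equals_one(sim)
    print(f"n_E={n_e:.0f}  a={a:.3f}  b={b:.0f}  P(a=1)={p:.3g}")
```

On data simulated from Beta(1, 10^4) the estimate of a should sit near 1. A single golden-section pass per parameter suffices because the Beta log-likelihood is concave in (a, b), so both the profile in a and the inner search in b are unimodal.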
