Rapid and accurate multiple testing correction and power estimation for millions of correlated markers.

Han B, Kang HM, Eskin E - PLoS Genet. (2009)

Bottom Line: Our method accounts for all correlation within a sliding window and corrects for the departure of the true distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP.


Affiliation: Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA.

ABSTRACT
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies--SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu.

Figure 2 (pgen-1000456-g002): Probability density function of a bivariate MVN at two markers. The area outside the rectangle is the critical region. (A) Under the null hypothesis, the MVN is centered at zero. The outside-rectangle probability is the corrected p-value (or the significance level). (B) Under the alternative hypothesis, the MVN is shifted by the non-centrality parameter. The outside-rectangle probability is power.
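
The two panels of the figure can be illustrated numerically. The following is a minimal sketch, not the SLIDE or SLIP implementation: it estimates the outside-rectangle probability for two correlated markers by plain Monte Carlo sampling. The function name `outside_rectangle_prob`, the correlation of 0.8, the pointwise threshold of 0.05, and the shift values are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def outside_rectangle_prob(p, r, shift=(0.0, 0.0), n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(|S1| > c or |S2| > c) for a bivariate normal.

    p     : pointwise (per-marker) two-sided p-value threshold
    r     : correlation between the two marker statistics (assumed known)
    shift : mean of the statistics; (0, 0) corresponds to the null hypothesis
    """
    c = -NormalDist().inv_cdf(p / 2.0)  # half-width of the rectangle
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)
    outside = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s1 = shift[0] + z1                   # statistic at marker 1
        s2 = shift[1] + r * z1 + s * z2      # statistic at marker 2, corr = r
        if abs(s1) > c or abs(s2) > c:
            outside += 1
    return outside / n_samples


# Panel (A): centered at zero, the probability is the corrected p-value.
corrected_p = outside_rectangle_prob(0.05, r=0.8)

# Panel (B): shifted mean, the probability is power. The shift values are
# hypothetical; the second is taken as r times the first for illustration.
power = outside_rectangle_prob(0.05, r=0.8, shift=(3.0, 2.4))

# With no correlation the estimate should be close to 1 - 0.95**2 = 0.0975.
independent_p = outside_rectangle_prob(0.05, r=0.0)
```

Because the two statistics are positively correlated, the estimate for panel (A) should fall between the pointwise value 0.05 and the independent-marker value 0.0975.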

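Mentions: Let $\Sigma$ be the $M \times M$ covariance matrix between the $M$ markers. By the multivariate central limit theorem [33], if the sample size $N$ is large, the vector of statistics $S = (S_1, \ldots, S_M)$ asymptotically follows an MVN with mean zero and variance $\Sigma$. Given a pointwise p-value $p$, let $B(p)$ be the rectangle with corners $\Phi^{-1}(p/2)\,\mathbf{1}$ and $-\Phi^{-1}(p/2)\,\mathbf{1}$, where $\Phi$ is the cumulative distribution function (c.d.f.) of the standard normal distribution and $\mathbf{1}$ is the vector of ones. The corrected p-value is approximated as the outside-rectangle probability,

$$p_{\text{corrected}} \approx 1 - \int_{B(p)} f(x; \mathbf{0}, \Sigma)\, dx \qquad (3)$$

as shown in Figure 2A, where $f(x; \mathbf{0}, \Sigma)$ is the density of the MVN. Similarly, given a significance threshold $\alpha$, the per-marker threshold is approximated by searching for a pointwise p-value $p$ whose corrected p-value is $\alpha$.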
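
The paragraph above describes two computations: the corrected p-value as an outside-rectangle probability, and a search for the per-marker threshold. A rough sketch of both follows, under stated assumptions. It is not the authors' SLIDE procedure: it samples the full MVN by naive Monte Carlo with no sliding window and no tail correction, and it uses bisection as one possible search strategy. The helper names and the example covariance matrix (correlation 0.9 raised to the marker distance, over 10 markers) are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    m = len(sigma)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            acc = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def sample_max_abs(sigma, n_samples=20_000, seed=0):
    """Draw null statistics S ~ MVN(0, sigma); keep max_i |S_i| for each draw."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    m = len(sigma)
    maxima = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        maxima.append(max(abs(sum(low[i][k] * z[k] for k in range(i + 1)))
                          for i in range(m)))
    return maxima


def corrected_p_value(p, maxima):
    """Outside-rectangle probability: share of draws with some |S_i| > c(p)."""
    c = -STD_NORMAL.inv_cdf(p / 2.0)  # rectangle half-width for pointwise p
    return sum(mx > c for mx in maxima) / len(maxima)


def per_marker_threshold(alpha, maxima, tol=1e-6):
    """Bisection for the pointwise p whose corrected p-value is about alpha."""
    lo, hi = 1e-12, alpha  # the corrected p-value is non-decreasing in p
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if corrected_p_value(mid, maxima) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


# Hypothetical covariance: correlation decays as 0.9 ** distance between markers.
M = 10
sigma = [[0.9 ** abs(i - j) for j in range(M)] for i in range(M)]
maxima = sample_max_abs(sigma)

p_corr = corrected_p_value(0.01, maxima)       # corrected value for p = 0.01
threshold = per_marker_threshold(0.05, maxima)  # per-marker threshold for 0.05
```

Storing only the per-draw maximum of the absolute statistics lets the same sample be reused for every candidate pointwise p-value during the search, so the bisection needs no additional sampling.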

