Statistical power of model selection strategies for genome-wide association studies.

Wu Z, Zhao H - PLoS Genet. (2009)

Bottom Line: Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy.

Affiliation: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, United States of America.

ABSTRACT
Genome-wide association studies (GWAS) aim to identify genetic variants related to diseases by examining the associations between phenotypes and hundreds of thousands of genotyped markers. Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. Although a number of model selection methods have been proposed in the literature, including marginal search, exhaustive search, and forward search, their relative performance has only been evaluated through limited simulations due to the lack of an analytical approach to calculating the power of these methods. This article develops a novel statistical approach for power calculation, derives accurate formulas for the power of different model selection strategies, and then uses the formulas to evaluate and compare these strategies in genetic model spaces. In contrast to previous studies, our theoretical framework allows for random genotypes, correlations among test statistics, and a false-positive control based on GWAS practice. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy. An example is provided for the application of our approach to empirical research. The statistical approach used in our derivations is general and can be employed to address the model selection problems in other random predictor settings. 
We have developed an R package markerSearchPower to implement our formulas, which can be downloaded from the Comprehensive R Archive Network (CRAN) or http://bioinformatics.med.yale.edu/group/.

Figure 4. Plots of model selection power with given observed marginal effects. Power comparisons of three model selection procedures over a range of values of the epistatic effect b3: marginal search (black solid curve), exhaustive search (red dashed curve), and forward search (green dotted curve). We assume the true SNPs to be rs11107116 and rs10906982, which influence adult height, with their marginal effects set equal to those observed in Weedon et al. 2008. Graphs A (R = 1) and C (R = 20) show the power of finding both SNPs; graphs B (R = 1) and D (R = 20) show the power of finding at least one of the two SNPs.
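
To make "marginal effects held fixed while b3 varies" concrete, here is a short derivation under assumptions that this summary does not state and that are ours: a quantitative trait y = b0 + b1 x1 + b2 x2 + b3 x1 x2 + ε, genotypes x1 and x2 coded 0/1/2 as allele counts with allele frequencies p1 and p2 (so E[x_j] = 2p_j), the two SNPs independent of each other (no linkage disequilibrium), and a mean-zero ε independent of the genotypes. The paper's own parametrization may differ.

```latex
\begin{aligned}
\operatorname{Cov}(y, x_1) &= b_1 \operatorname{Var}(x_1) + b_2 \operatorname{Cov}(x_2, x_1) + b_3 \operatorname{Cov}(x_1 x_2, x_1) \\
&= b_1 \operatorname{Var}(x_1) + 0 + b_3 \bigl( \mathrm{E}[x_1^2]\,\mathrm{E}[x_2] - \mathrm{E}[x_1]^2\,\mathrm{E}[x_2] \bigr) \\
&= \bigl( b_1 + b_3\,\mathrm{E}[x_2] \bigr) \operatorname{Var}(x_1), \\
\beta_1 &= \frac{\operatorname{Cov}(y, x_1)}{\operatorname{Var}(x_1)} = b_1 + 2 p_2 b_3,
\qquad \beta_2 = b_2 + 2 p_1 b_3 .
\end{aligned}
```

Under these assumptions, fixing the marginal slopes β1 and β2 at their observed values while b3 varies amounts to setting b1 = β1 − 2p2 b3 and b2 = β2 − 2p1 b3. If the genotypes were instead centered at their means (E[x_j] = 0), the interaction term would leave the marginal slopes unchanged and b1 = β1.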

Mentions: Figure 4 compares the power of the three model selection methods over a range of values of b3. For detecting both SNPs, graphs A (R = 1) and C (R = 20) show that when the epistatic effect b3 is large, exhaustive search (red dashed curve) has a significant advantage over forward search (green dotted curve), which in turn outperforms marginal search (black solid curve). When b3 is small, marginal search has higher power than the other two. For detecting at least one of the two SNPs, graphs B (R = 1) and D (R = 20) show that marginal search performs similarly to or better than forward search, and neither method is affected by changes in b3. The relative performance of exhaustive search, by contrast, depends strongly on the magnitude of epistasis. Comparing graphs B (R = 1) and D (R = 20) shows that marginal search is superior over a wider range of b3 values when a larger false discovery number R is tolerated.
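
The analytical power formulas are not reproduced here. As a rough illustration of the kind of comparison behind Figure 4, the Monte Carlo sketch below estimates, for each strategy, the probability of finding both SNPs. It rests on assumptions of ours: a quantitative trait y = b1 x1 + b2 x2 + b3 x1 x2 + N(0, 1) noise, independent SNPs sharing one allele frequency, OLS R² as the test statistic, a simple "keep the top R candidates" rule standing in for the paper's false-discovery control, and one hypothetical forward-search variant (keep the top R SNPs marginally, then rank only their pairs). All function names and parameter values are hypothetical; this is a brute-force simulation, not the paper's analytical method and not the markerSearchPower API.

```python
import random


def genotypes(n, m, maf, rng):
    # Assumption: each SNP is coded 0/1/2 as Binomial(2, maf) (Hardy-Weinberg),
    # and SNPs are simulated independently, i.e. no linkage disequilibrium.
    return [[int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
            for _ in range(m)]


def solve(a, b):
    # Gaussian elimination with partial pivoting; returns None if (near) singular.
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        x[i] = (aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))) / aug[i][i]
    return x


def r2(columns, y):
    # R^2 of an OLS fit of y on an intercept plus the given columns; for a fixed
    # number of columns, ranking by R^2 orders models the same way as the F statistic.
    rows = [(1.0,) + vals for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    tss = sum((t - mean_y) ** 2 for t in y)
    if beta is None or tss == 0:
        return 0.0
    rss = sum(t * t for t in y) - sum(bi * v for bi, v in zip(beta, xty))
    return 1.0 - rss / tss


def main_effects(beta1, beta2, b3, maf):
    # Hold the marginal slopes fixed while varying b3 (independent SNPs, common
    # allele frequency, 0/1/2 coding): beta_j = b_j + 2 * maf * b3.
    return beta1 - 2 * maf * b3, beta2 - 2 * maf * b3, b3


def one_trial(n, m, maf, effects, R, rng):
    b1, b2, b3 = effects
    g = genotypes(n, m, maf, rng)
    # SNPs 0 and 1 are the true loci; the remaining m - 2 SNPs are null.
    y = [b1 * u + b2 * v + b3 * u * v + rng.gauss(0.0, 1.0)
         for u, v in zip(g[0], g[1])]

    def top(scores):
        # Keys of the R largest statistics: the assumed top-R selection rule.
        return set(sorted(scores, key=scores.get, reverse=True)[:R])

    # Marginal search: one single-SNP model per SNP.
    marg = {j: r2([g[j]], y) for j in range(m)}
    # Exhaustive search: a main-effects-plus-interaction model for every pair.
    pairs = {(j, k): r2([g[j], g[k], [u * v for u, v in zip(g[j], g[k])]], y)
             for j in range(m) for k in range(j + 1, m)}
    # Forward search (hypothetical variant): pair each top-R marginal SNP
    # with every other SNP and rank only those pairs.
    fwd = {}
    for j in top(marg):
        for k in range(m):
            if k != j:
                key = (min(j, k), max(j, k))
                fwd[key] = pairs[key]
    return {"marginal": {0, 1} <= top(marg),
            "exhaustive": (0, 1) in top(pairs),
            "forward": (0, 1) in top(fwd)}


def power(n, m, maf, effects, R, trials, seed=0):
    # Fraction of simulated data sets in which each strategy recovers both true SNPs.
    rng = random.Random(seed)
    hits = {"marginal": 0, "exhaustive": 0, "forward": 0}
    for _ in range(trials):
        for name, hit in one_trial(n, m, maf, effects, R, rng).items():
            hits[name] += hit
    return {name: h / trials for name, h in hits.items()}
```

For instance, evaluating `power(200, 10, 0.3, main_effects(0.15, 0.15, b3, 0.3), R=1, trials=200)` over a grid of b3 values traces curves in the spirit of graph A. The effect sizes are arbitrary placeholders rather than the Weedon et al. 2008 estimates, and the pure-Python least squares keeps the number of SNPs m small; exhaustive search cost grows with the m(m − 1)/2 pairs it must fit.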

