Statistical power of model selection strategies for genome-wide association studies.

Wu Z, Zhao H - PLoS Genet. (2009)

Bottom Line: Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy.

Affiliation: Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, United States of America.

ABSTRACT
Genome-wide association studies (GWAS) aim to identify genetic variants related to diseases by examining the associations between phenotypes and hundreds of thousands of genotyped markers. Because many genes are potentially involved in common diseases and a large number of markers are analyzed, it is crucial to devise an effective strategy to identify truly associated variants that have individual and/or interactive effects, while controlling false positives at the desired level. Although a number of model selection methods have been proposed in the literature, including marginal search, exhaustive search, and forward search, their relative performance has only been evaluated through limited simulations due to the lack of an analytical approach to calculating the power of these methods. This article develops a novel statistical approach for power calculation, derives accurate formulas for the power of different model selection strategies, and then uses the formulas to evaluate and compare these strategies in genetic model spaces. In contrast to previous studies, our theoretical framework allows for random genotypes, correlations among test statistics, and a false-positive control based on GWAS practice. After the accuracy of our analytical results is validated through simulations, they are utilized to systematically evaluate and compare the performance of these strategies in a wide class of genetic models. For a specific genetic model, our results clearly reveal how different factors, such as effect size, allele frequency, and interaction, jointly affect the statistical power of each strategy. An example is provided for the application of our approach to empirical research. The statistical approach used in our derivations is general and can be employed to address the model selection problems in other random predictor settings. 
We have developed an R package markerSearchPower to implement our formulas, which can be downloaded from the Comprehensive R Archive Network (CRAN) or http://bioinformatics.med.yale.edu/group/.

3D plots of statistical power over genetic model space. Power is shown for the three model selection methods: marginal search in the left column, exhaustive search in the middle column, and forward search in the right column. Two definitions of power are considered: (A) detecting the true model (or, in marginal search, both true SNPs) in row 1, and (B) detecting either true SNP in row 2. The genetic models have main effects b1 = b2 varying from −1 to 1 and epistatic effect b3 varying from −1 to 1. The allele frequency is qj = 0.3, j = 1, …, p, and the number of false discoveries R is set to 10.

Mentions: Figure 1 gives 3D plots of statistical power over the genetic model space for the three model selection methods (in columns) under the two power definitions (A) and (B) (in rows), with the number of false discoveries controlled at R = 10. The plots show that, in a certain region of the model space, marginal search and forward search cannot detect the marginal association of the influential SNP 1 or SNP 2, while exhaustive search can. In this region the power of marginal search and of forward search is very close to 0, no matter how large the genetic effect is. According to formulas (8) and (16) in the Methods section, the marginally non-detectable region for SNP 1, where b1 + b3(p2 − q2) = 0, depends on the additive genetic effect b1, the epistatic effect b3, and the allele frequency p2 of SNP 2. The non-detectable region for SNP 2 is analogous by symmetry. For exhaustive search no such region exists, as indicated by formula (12). Exhaustive search can therefore better identify signals whose main and epistatic effects counterbalance each other.
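The counterbalancing condition b1 + b3(p2 − q2) = 0 can be checked numerically. The Python sketch below is an illustration only and does not use the authors' markerSearchPower package or their power formulas. It assumes a linear model Y = b1·X1 + b2·X2 + b3·X1·X2 + noise, two independent SNPs in Hardy-Weinberg equilibrium, genotypes coded −1, 0, 1 (so that E[X2] = p2 − q2), and unit-variance Gaussian noise. The function names, sample size, and parameter values are hypothetical choices made for the example.

```python
import numpy as np


def marginal_effect(b1, b3, p2):
    """Marginal slope of Y on X1 under the assumed coding: b1 + b3 * (p2 - q2)."""
    q2 = 1.0 - p2
    return b1 + b3 * (p2 - q2)


def simulate(b1, b2, b3, p1, p2, n=200_000, seed=0):
    """Simulate the assumed two-SNP model; return (marginal slope, joint coefficients)."""
    rng = np.random.default_rng(seed)
    # Genotypes coded -1, 0, 1 (assumption): allele count under HWE minus 1.
    x1 = rng.binomial(2, p1, size=n) - 1.0
    x2 = rng.binomial(2, p2, size=n) - 1.0
    y = b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.standard_normal(n)

    # Marginal view: least-squares slope of y on x1 alone.
    marginal_slope = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

    # Joint view: least-squares fit with both main effects and the interaction.
    design = np.column_stack([np.ones(n), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return marginal_slope, coef


# Hypothetical parameters chosen to satisfy b1 + b3 * (p2 - q2) = 0.
b1, b2, b3, p = 0.4, 0.4, 1.0, 0.3
slope, coef = simulate(b1, b2, b3, p, p)
print("theoretical marginal effect:", marginal_effect(b1, b3, p))
print("estimated marginal slope:   ", slope)
print("joint-fit b1 and b3:        ", coef[1], coef[3])
```

Under these assumptions the single-SNP slope for SNP 1 should be close to zero even though b1 and b3 are both nonzero, while the joint fit recovers both coefficients. This is the situation in which a search over single SNPs has little to detect and a search over pairs does.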

