Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests.

Frånberg M, Gertow K, Hamsten A, PROCARDIS consortium, Lagergren J, Sennblad B - PLoS Genet. (2015)

Bottom Line: This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Affiliation: Atherosclerosis Research Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden.

ABSTRACT
Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology generally has improved statistical power compared with seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PROCARDIS coronary artery disease case/control cohort and discover three distinct interactions. Although analyses of simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we also explore strategies for selecting subsets of variant pairs to test a priori. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.
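The core idea of the abstract, testing progressively more complex hypotheses against a saturated alternative and discarding a variant pair as soon as a simpler hypothesis cannot be rejected, can be illustrated with a minimal Python sketch. It assumes a 3×3 table of case and control counts per two-locus genotype, gives the saturated alternative one risk parameter per genotype cell, and uses only three simpler hypotheses of our own choosing (no association, variant 1 only, variant 2 only). The names `stagewise_pair_test` and `alphas`, and the per-stage thresholds, are hypothetical; the paper's final scale-invariant interaction stage and its multiple-testing correction are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def _binom_ll(cases: float, controls: float) -> float:
    """Maximised log-likelihood with a single risk p = cases / (cases + controls)."""
    n = cases + controls
    return sum(c * math.log(c / n) for c in (cases, controls) if c > 0)

def stagewise_pair_test(
    cases: Sequence[Sequence[int]],     # cases[i][j]: genotype i of variant 1, j of variant 2
    controls: Sequence[Sequence[int]],
    alphas: Dict[str, float],           # hypothetical per-stage significance thresholds
) -> Tuple[bool, Dict[str, float]]:
    """Test simpler hypotheses, in order of complexity, against the saturated
    alternative; stop as soon as one of them cannot be rejected."""
    idx = range(3)
    # Saturated alternative: one risk parameter per genotype cell (9 parameters).
    ll_sat = sum(_binom_ll(cases[i][j], controls[i][j]) for i in idx for j in idx)
    row = [(sum(cases[i]), sum(controls[i])) for i in idx]
    col = [(sum(cases[i][j] for i in idx), sum(controls[i][j] for i in idx)) for j in idx]
    total = (sum(r[0] for r in row), sum(r[1] for r in row))
    # (name, maximised log-likelihood, number of risk parameters)
    stages: List[Tuple[str, float, int]] = [
        ("null", _binom_ll(*total), 1),
        ("variant1_only", sum(_binom_ll(*r) for r in row), 3),
        ("variant2_only", sum(_binom_ll(*c) for c in col), 3),
    ]
    p_values: Dict[str, float] = {}
    for name, ll, k in stages:
        stat = max(2.0 * (ll_sat - ll), 0.0)
        p_values[name] = chi2_sf_even_df(stat, 9 - k)  # nested models: 9 - k df
        if p_values[name] > alphas[name]:
            return False, p_values  # a simpler hypothesis suffices: drop the pair
    return True, p_values  # candidate for the (omitted) final interaction stage
```

With sufficiently large counts and excess risk confined to, say, the double-homozygous cell, all three simpler hypotheses are rejected and the pair moves on; if the excess risk depends on variant 1 alone, the pair is dropped at the `variant1_only` stage, which is how the sequence prunes non-interacting pairs cheaply.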


Fig 3. The exceedance distribution of power over all possible interaction models with a specific heritability. For each plot, the x-axis shows a threshold, t, for power to detect an interaction among 10^12 variant pairs, and the corresponding y-axis shows the fraction of models that have a power greater than or equal to t. The rows correspond to the sample size of a balanced design; e.g., 2000 indicates 2000 cases and 2000 controls. The columns correspond to the heritability of the models. Six methods for inference of interactions (Logistic, Marginal+logistic, CSS+marginal, R2+marginal, LD-contrast, and Sixpac; see text for details) are compared with our static stage-wise scale-invariant method. The line colors used to denote the different methods are shown in the legend to the right.
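The exceedance distribution in the caption is, for each power threshold t, the fraction of models whose power is at least t. A minimal Python sketch, assuming per-model power estimates are already available as a list (the function name `exceedance_curve` and the example values are hypothetical):

```python
from typing import List, Sequence, Tuple

def exceedance_curve(
    powers: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (t, fraction of models with power >= t) for each threshold t."""
    if len(powers) == 0:
        raise ValueError("need power estimates for at least one model")
    n = len(powers)
    return [(t, sum(p >= t for p in powers) / n) for t in thresholds]

# Hypothetical power estimates for four interaction models.
curve = exceedance_curve([0.0, 0.2, 0.5, 0.9], [0.0, 0.25, 0.5, 1.0])
print(curve)  # [(0.0, 1.0), (0.25, 0.5), (0.5, 0.5), (1.0, 0.0)]
```

A curve that stays high further to the right means a larger share of models is detected with high power, which is how the methods in each panel are compared.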

Mentions: We further investigated the distribution of statistical power of seven methods using simulated data generated from the spectrum of all possible interaction models, following the ideas of [35] (see the Material and methods section "Generation of synthetic data for estimation of statistical power" for details). The first of these methods is our static method. The remaining methods comprise four methods based on a logit-link GLM with different screening strategies (Logistic without screening, Marginal+logistic [29], CSS+logistic [30] and R2+logistic [31]) and two methods based on the LD-contrast test with different screening strategies (LD-contrast without screening, and Sixpac [23], an LDcases+LD-contrast method); for details, see the Material and methods section "Comparison of statistical methods". It should be noted that none of the latter six methods is scale-invariant, a property one may expect to enhance power. For simplicity of simulation, we evaluated only the static method here; however, since the adaptive method is more powerful than the static one, these results can also be viewed as a conservative estimate of the power of the adaptive method. As can be seen in Fig 3, the static method consistently has greater power than the other approaches. Of the remaining methods, Marginal+logistic performs best, while LD-contrast performs worst. In S1–S4 Figs, we also report the results of a more computationally intensive power comparison that includes the above methods as well as our adaptive stage-wise scale-invariant method and the Model-based MDR (MB-MDR) method [26] (see S2 Text for details). These results corroborate those above: for most models, our stage-wise methods perform better than the other methods (see further discussion in S2 Text).
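Power estimates of this kind are typically obtained by Monte Carlo simulation: draw many balanced case/control samples from a two-locus penetrance table and count how often a test reaches significance. The sketch below is an illustration under our own assumptions, not the paper's simulation protocol: the two variants are unlinked and in Hardy-Weinberg equilibrium, the penetrance table and threshold `alpha` are hypothetical, and a generic 8-degree-of-freedom likelihood ratio test (saturated versus no association) stands in for the stage-wise interaction test. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence

def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square upper tail for even df: exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def _ll(cases: int, controls: int) -> float:
    """Maximised Bernoulli log-likelihood with a single risk parameter."""
    n = cases + controls
    return sum(c * math.log(c / n) for c in (cases, controls) if c > 0)

def lrt_pvalue(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Saturated model (9 genotype-specific risks) vs. no association (1 risk), 8 df."""
    ll_sat = sum(_ll(a, u) for a, u in zip(cases, controls))
    ll_null = _ll(sum(cases), sum(controls))
    return chi2_sf_even_df(max(2.0 * (ll_sat - ll_null), 0.0), 8)

def _draw_counts(rng: random.Random, weights: List[float], n: int) -> List[int]:
    """Multinomial draw of n individuals over the 9 two-locus genotype cells."""
    counts = [0] * 9
    for idx in rng.choices(range(9), weights=weights, k=n):
        counts[idx] += 1
    return counts

def _hwe(q: float) -> List[float]:
    """Hardy-Weinberg genotype frequencies for minor-allele frequency q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def simulate_power(penetrance, maf1, maf2, n_per_group, alpha, n_reps, seed=0):
    """Fraction of simulated balanced case/control studies with p <= alpha."""
    rng = random.Random(seed)
    g1, g2 = _hwe(maf1), _hwe(maf2)
    cells = [(i, j) for i in range(3) for j in range(3)]
    # P(genotype | case) is proportional to P(genotype) * penetrance;
    # controls use 1 - penetrance. Unlinked variants: frequencies multiply.
    case_w = [g1[i] * g2[j] * penetrance[i][j] for i, j in cells]
    ctrl_w = [g1[i] * g2[j] * (1.0 - penetrance[i][j]) for i, j in cells]
    hits = 0
    for _ in range(n_reps):
        cases = _draw_counts(rng, case_w, n_per_group)
        controls = _draw_counts(rng, ctrl_w, n_per_group)
        if lrt_pvalue(cases, controls) <= alpha:
            hits += 1
    return hits / n_reps
```

For instance, `simulate_power(pen, 0.3, 0.3, 2000, alpha, 100)` would estimate power for a 2000-case/2000-control design; repeating this over many penetrance tables yields the per-model power values that an exceedance curve summarises.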
