Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests.

Frånberg M, Gertow K, Hamsten A, PROCARDIS consortium, Lagergren J, Sennblad B - PLoS Genet. (2015)

Bottom Line: This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Affiliation: Atherosclerosis Research Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden.

ABSTRACT
Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has an improved statistical power in comparison to seven other methods, and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PROCARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Fig 4 (pgen.1005502.g004): The dependence of power on the estimated number of associated variants. The x-axis is the heritability and the y-axis is the estimated power. The colored dashed lines correspond to our stage-wise test using different numbers of associated variants, as indicated by the legend to the right. The black solid line corresponds to the logistic regression method using Bonferroni correction (which does not depend on the estimated number of associated variants). The power estimates are based on data simulated from the double-dominant interaction model.


Mentions: Intuitively, when more variants are associated with the phenotype, the multiple testing correction in the intermediate stages of our stage-wise methodology becomes larger, and statistical power is therefore reduced. For this reason, we investigated how the statistical power depends on the number of associated variants using data simulated from the double-dominant interaction model (see the Material and methods section "Generation of synthetic data for estimation of statistical power"). As shown in Fig 4, the power decreases as the number of associated variants increases. Because of the additional penalty of the weight, the static method can have lower power than directly testing interaction using a Bonferroni correction, precisely when M(M − 1) > w_4 N(N − 1) (where N is the total number of variants, M is the number of associated variants, and w_4 is the weight). For our biological data, M(M − 1) = 306 ≪ w_4 N(N − 1) = 346,035,421.8 (based on the N = 33,963 tested variants and the M = 18 robustly associated CAD variants present on the IBC-chip, cf. S1 Table).
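The comparison in the passage above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the weight value 0.3 is not stated in the excerpt but is inferred here from the reported product (346,035,421.8 divided by N(N − 1) for N = 33,963), so it should be treated as an assumption.

```python
def pair_term(n: int) -> int:
    """Return n(n - 1), the quantity compared on both sides of the inequality."""
    return n * (n - 1)


def static_has_lower_power(m: int, n: int, w4: float) -> bool:
    """True when M(M - 1) > w4 * N(N - 1), the condition quoted in the text
    under which the static method is penalised more than plain Bonferroni."""
    return pair_term(m) > w4 * pair_term(n)


# Values quoted in the text for the PROCARDIS / IBC-chip data.
N = 33_963   # tested variants
M = 18       # robustly associated CAD variants
W4 = 0.3     # ASSUMPTION: inferred from 346,035,421.8 / (N * (N - 1))

lhs = pair_term(M)        # M(M - 1)
rhs = W4 * pair_term(N)   # w4 * N(N - 1)

print(f"M(M-1)      = {lhs}")
print(f"w4 * N(N-1) = {rhs:,.1f}")
print("static method worse than Bonferroni:", static_has_lower_power(M, N, W4))
```

Under these assumptions the left-hand side is several orders of magnitude smaller than the right-hand side, which matches the statement that the condition is far from being met for the biological data.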

