Discovering Genetic Interactions in Large-Scale Association Studies by Stage-wise Likelihood Ratio Tests.

Frånberg M, Gertow K, Hamsten A, PROCARDIS consortium, Lagergren J, Sennblad B - PLoS Genet. (2015)

Bottom Line: This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Affiliation: Atherosclerosis Research Unit, Department of Medicine, Solna, Karolinska Institutet, Stockholm, Sweden; Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden; Science for Life Laboratory, Stockholm, Sweden.

ABSTRACT
Despite the success of genome-wide association studies in medical genetics, the underlying genetics of many complex diseases remains enigmatic. One plausible reason for this could be the failure to account for the presence of genetic interactions in current analyses. Exhaustive investigations of interactions are typically infeasible because the vast number of possible interactions imposes hard statistical and computational challenges. There is, therefore, a need for computationally efficient methods that build on models appropriately capturing interaction. We introduce a new methodology where we augment the interaction hypothesis with a set of simpler hypotheses that are tested, in order of their complexity, against a saturated alternative hypothesis representing interaction. This sequential testing provides an efficient way to reduce the number of non-interacting variant pairs before the final interaction test. We devise two different methods, one that relies on a priori estimated numbers of marginally associated variants to correct for multiple tests, and a second that does this adaptively. We show that our methodology in general has improved statistical power in comparison to seven other methods and, using the idea of closed testing, that it controls the family-wise error rate. We apply our methodology to genetic data from the PROCARDIS coronary artery disease case/control cohort and discover three distinct interactions. While analyses on simulated data suggest that the statistical power may suffice for an exhaustive search of all variant pairs in ideal cases, we explore strategies for a priori selecting subsets of variant pairs to test. Our new methodology facilitates identification of new disease-relevant interactions from existing and future genome-wide association data, which may involve genes with previously unknown association to the disease. Moreover, it enables construction of interaction networks that provide a systems biology view of complex diseases, serving as a basis for more comprehensive understanding of disease pathophysiology and its clinical consequences.


Fig 1 (pgen.1005502.g001). Illustration of the rejection procedure. a) We have a set of four hypotheses that are closed under intersection. We start at stage 1 by testing the simplest null hypothesis $H_1$ for each variant pair; the p-value threshold $\alpha$ for this test is corrected for the total number of pairs. In the figure, $H_1$ is accepted for the first and last pair, and these pairs will not be tested in the subsequent stages. We then continue through the null hypotheses from simple to complex, correcting $\alpha$ at each stage only for the number of pairs tested at that stage: the expected number in the static method, or the actual number in the adaptive method. Finally, if all null hypotheses could be rejected for a specific pair $i$ (e.g., snp2 and snp3 in the figure), we declare pair $i$ to be interacting. b) Pseudocode describing the static stage-wise testing method. Variant pairs are ordered from 1 to $n$. Null hypotheses are ordered from 1 to $m$ in any order that respects the partial order of how they are nested. Only pairs for which the null hypothesis was rejected in the previous step are considered in the current step. The p-value for testing null hypothesis $j$ for pair $i$ is $p_{ij}$. Rejected hypotheses in stage $j$ are contained in the set $R_j$, and $\alpha$ is the significance threshold. The hypothesis of interaction for pair $i$ is accepted only if the null hypotheses for all $m$ tests could be rejected. c) Pseudocode describing the adaptive method. The overall algorithm is the same as for the static method. However, the significance threshold is now determined by the total number of rejections in the previous stage, $|R_{j-1}|$.
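The adaptive method of panel c can be illustrated with a minimal Python sketch. It is not the authors' pseudocode: the function name `adaptive_stagewise`, the per-stage `weights`, and the exact threshold form (weight times α divided by the number of pairs that survived the previous stage, with all pairs counted before stage 1) are assumptions made for illustration. The caption only states that the threshold is determined by the number of rejections in the previous stage.

```python
from typing import List, Sequence


def adaptive_stagewise(
    pvalues: Sequence[Sequence[float]],
    weights: Sequence[float],
    alpha: float = 0.05,
) -> List[int]:
    """Return indices of pairs whose null hypotheses are rejected at every stage.

    pvalues[i][j] is the p-value of pair i at stage j.
    Assumption: the stage-j threshold is weights[j] * alpha / |R_{j-1}|,
    where R_0 is taken to be the full set of pairs.
    """
    survivors = list(range(len(pvalues)))  # R_0: every pair enters stage 1
    for stage, weight in enumerate(weights):
        if not survivors:
            break  # nothing left to test
        # Correct only for the pairs actually tested at this stage.
        threshold = weight * alpha / len(survivors)
        survivors = [i for i in survivors if pvalues[i][stage] < threshold]
    return survivors


if __name__ == "__main__":
    # Toy p-values for three pairs over four stages (made-up numbers).
    toy = [
        [0.001, 0.001, 0.002, 0.01],
        [0.2, 0.0001, 0.0001, 0.0001],
        [0.001, 0.001, 0.05, 0.001],
    ]
    print(adaptive_stagewise(toy, weights=[0.25, 0.25, 0.25, 0.25]))
```

Because the divisor shrinks as pairs drop out, later stages are tested against a less stringent threshold than a correction for all pairs would give.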


Mentions: In the first method, the static method, we assume that the exact number of variant pairs belonging to each stage is known. Intuitively, a variant pair belongs to a stage if the model at this stage is the simplest model that is correct for the pair. To preserve the FWER, we introduce weights $\{w_s, s \in [4] : \sum_{s \in [4]} w_s = 1\}$, one for each stage, that adjust the p-value thresholds for the four stages of tests. Let $K_s$ be the number of variant pairs belonging to a stage $t \geq s$, and $p_{is}$ be the p-value of stage $s$ for pair $i$. If $p_{is} < w_s \alpha / K_s$, the pair $i$ is tested for stage $s + 1$. The idea is illustrated in Fig 1a and the algorithm is outlined in Fig 1b. A generalized version of the closed testing principle [32] can be used to show that this method controls the FWER; a proof is provided in S1 Text. The adjusted p-value is defined as in [33] and can be computed by

$$\tilde{p}_i = \max_s \frac{K_s \, p_{is}}{w_s}.$$

The following is an example of how to estimate the number of hypotheses in each stage. Let $N$ be the number of genotyped variants and $M$ be the number of marginally associated variants (which, e.g., can be taken from a meta-analysis). Then estimates of the static multiple testing correction, $K_s$, for each stage are, in order, […], $N \cdot M$, $N \cdot M$ and […].
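The static rule and the adjusted p-value above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names `static_stagewise` and `adjusted_pvalue` are hypothetical, the stage counts $K_s$ and weights are supplied by the caller, and it assumes a p-value is available for every pair at every stage.

```python
from typing import List, Sequence


def static_stagewise(
    pvalues: Sequence[Sequence[float]],
    weights: Sequence[float],
    stage_counts: Sequence[float],
    alpha: float = 0.05,
) -> List[int]:
    """Return indices of pairs that pass every stage.

    Pair i moves past stage s only if pvalues[i][s] < w_s * alpha / K_s.
    """
    survivors = list(range(len(pvalues)))
    for stage, (w, k) in enumerate(zip(weights, stage_counts)):
        threshold = w * alpha / k
        survivors = [i for i in survivors if pvalues[i][stage] < threshold]
    return survivors


def adjusted_pvalue(
    p_row: Sequence[float],
    weights: Sequence[float],
    stage_counts: Sequence[float],
) -> float:
    """Adjusted p-value of one pair: max over stages of K_s * p_is / w_s."""
    return max(k * p / w for p, w, k in zip(p_row, weights, stage_counts))


if __name__ == "__main__":
    # Toy inputs (made-up numbers): three pairs, four equally weighted stages.
    weights = [0.25, 0.25, 0.25, 0.25]
    stage_counts = [3, 2, 2, 1]  # hypothetical K_s values, non-increasing in s
    toy = [
        [0.001, 0.001, 0.002, 0.01],
        [0.2, 0.0001, 0.0001, 0.0001],
        [0.001, 0.001, 0.05, 0.001],
    ]
    print(static_stagewise(toy, weights, stage_counts))
    print([adjusted_pvalue(row, weights, stage_counts) for row in toy])
```

A pair passes all stages exactly when its adjusted p-value is below $\alpha$, since $p_{is} < w_s \alpha / K_s$ for every $s$ is equivalent to $\max_s K_s p_{is} / w_s < \alpha$.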

