metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S, Pirinen M - Bioinformatics (2016)

Bottom Line: Analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. metaCCA extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis by metaCCA of two Finnish studies of nuclear magnetic resonance metabolomics, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.
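In canonical correlation analysis the canonical correlations depend on the data only through the covariance blocks Σ_XX (genotypes), Σ_YY (traits) and Σ_XY (genotype-trait), which is what makes a summary-statistics version possible. The minimal NumPy sketch below computes them as the singular values of Σ_XX^(-1/2) Σ_XY Σ_YY^(-1/2), and adds a simple shrinkage step that scales the off-diagonal covariances toward zero until the full matrix is positive definite. It is an illustration under stated assumptions: how metaCCA estimates these blocks from summary statistics is not shown, and the shrinkage rule, the function names and the parameters `step`, `min_eig` and `max_iter` are hypothetical, not metaCCA's own implementation.

```python
import numpy as np


def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def canonical_correlations(S_xx, S_yy, S_xy):
    """Canonical correlations computed from covariance blocks only.

    S_xx: genotype covariance (p x p), S_yy: trait covariance (q x q),
    S_xy: genotype-trait cross-covariance (p x q). No individual-level
    records are needed: the canonical correlations are the singular values
    of S_xx^{-1/2} S_xy S_yy^{-1/2}, returned in decreasing order.
    """
    M = inv_sqrt(S_xx) @ S_xy @ inv_sqrt(S_yy)
    return np.linalg.svd(M, compute_uv=False)


def shrink_to_pd(S_full, step=0.99, min_eig=1e-6, max_iter=2000):
    """Hypothetical shrinkage: scale off-diagonals by `step` until positive definite.

    Illustrative only; not necessarily the shrinkage algorithm used by metaCCA.
    """
    S = np.array(S_full, dtype=float)
    D = np.diag(np.diag(S))  # variances (the diagonal) are left unchanged
    for _ in range(max_iter):
        if np.linalg.eigvalsh(S).min() > min_eig:
            return S
        S = D + step * (S - D)  # shrink every off-diagonal entry toward zero
    raise RuntimeError("matrix did not become positive definite")
```

Shrinking only the off-diagonal entries keeps each variable's variance intact while pulling the matrix toward its diagonal, which is positive definite whenever all variances are positive. In use, the shrunk full matrix would be split into its genotype and trait blocks before calling `canonical_correlations`.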


Affiliation: Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Helsinki Institute for Information Technology (HIIT), Department of Computer Science, Aalto University, Espoo, Finland.



Multi-SNP–multi-trait analysis: −log10 P-values from CCA on the pooled individual-level datasets (NFBC + YFS) and from the meta-analyses conducted using metaCCA, as a function of the number of SNPs representing a gene. Sets of 2–25 SNPs were tested for association with the group of 9 related lipid measures. In practice, the smallest number of SNPs that explains, at the median, over 95% of the variance of the remaining SNPs would be chosen to represent a gene; this number is marked with x. The evolution of the median variance explained versus the number of SNPs is shown in Supplementary Figure S8. For each gene, the largest −log10 P-value from single-SNP–single-trait tests (top univariate) is represented by a dashed line. The largest single-SNP–multi-trait −log10 P-values are 11.54 for APOE, 23.77 for CETP, 9.64 for GCKR, 6.58 for PCSK9 and 0.97 for NOD2. The values are summarized in detail in Supplementary Table S4. The number of tests in each gene is 1 for multi-SNP, G for single-SNP–multi-trait and 9G for single-SNP–single-trait tests, where G is the number of SNPs in that gene.
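The caption's rule for choosing how many SNPs represent a gene can be sketched from an LD (SNP-by-SNP correlation) matrix alone: for standardized genotypes, the variance of SNP j explained by a selected set S is r_jS^T R_SS^(-1) r_jS, and SNPs are added until the median of this quantity over the remaining SNPs exceeds 95%. This is a minimal NumPy sketch under assumptions: the order in which SNPs are added (`order`) is a hypothetical input, since the actual selection procedure is the one described in Section 2.4 of the paper, and the function names are illustrative.

```python
import numpy as np


def median_variance_explained(R, selected):
    """Median R^2 of the non-selected SNPs, each regressed on the selected SNPs.

    R is a SNP-by-SNP correlation (LD) matrix. For standardized genotypes the
    R^2 of SNP j on the set S is r_jS^T R_SS^{-1} r_jS.
    """
    rest = [j for j in range(R.shape[0]) if j not in selected]
    if not rest:
        return 1.0  # nothing left to explain
    R_ss = R[np.ix_(selected, selected)]
    R_rs = R[np.ix_(rest, selected)]
    # Solve instead of inverting; R_ss is assumed to be non-singular.
    coef = np.linalg.solve(R_ss, R_rs.T)   # column j is R_SS^{-1} r_jS
    r2 = np.sum(R_rs.T * coef, axis=0)      # one R^2 per remaining SNP
    return float(np.median(r2))


def choose_snp_count(R, order, threshold=0.95):
    """Smallest k such that the first k SNPs of the hypothetical `order`
    explain more than `threshold` of the variance of the rest, at the median."""
    for k in range(1, len(order) + 1):
        if median_variance_explained(R, list(order[:k])) > threshold:
            return k
    return len(order)
```

For example, with two tightly linked pairs of SNPs (r = 0.99 within each pair, 0 between pairs) and an order that takes one SNP from each pair first, the rule picks two SNPs, since each remaining SNP then has 0.99² ≈ 0.98 of its variance explained.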
© Copyright Policy - creative-commons

Mentions: Figure 4 summarizes the results of the multi-SNP–multi-trait meta-analysis and shows the performance of metaCCA when different numbers of SNPs (from 2 up to 25) representing a gene are tested jointly for association with the group of 9 related lipid traits. The numbers of SNPs chosen by our approach (Section 2.4) are marked with x. Figure 4 confirms that this protocol describes a gene well: adding more SNPs yields no clear gain in power. Both metaCCA and metaCCA+ (Fig. 4, Supplementary Table S4) produced very accurate P-values. For the largest signals (APOE, CETP), metaCCA overestimates and metaCCA+ underestimates the −log10 P-values, in each case by less than one unit. These differences would be unlikely to lead to false inferences if the reference significance level in a gene-based analysis were set to 5.61 on the −log10 scale, based on there being about 20 000 protein-coding genes in the human genome. At this level, both metaCCA and metaCCA+ found associations of APOE, CETP and GCKR with the network of VLDL and HDL particles studied. For APOE and CETP, the gene-based signals are clearly higher than the univariate ones, even before accounting for the different numbers of tests. Moreover, in the case of APOE, the multi-SNP–multi-trait signal is nearly 4.5 units higher than the single-SNP–multi-trait one. Note that NOD2 has no known association with metabolic traits and therefore serves as a negative control (Fig. 4 and Supplementary Table S4).

