MAGMA: generalized gene-set analysis of GWAS data.

de Leeuw CA, Mooij JM, Heskes T, Posthuma D - PLoS Comput. Biol. (2015)

Bottom Line: The gene analysis is based on a multiple regression model, to provide better statistical performance. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.

Affiliation: Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, The Netherlands; Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.

ABSTRACT
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
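The gene-level regression described in the abstract can be written out as a short sketch. The notation is an assumption introduced for illustration and does not appear in this excerpt: $Y$ is the phenotype vector for $N$ individuals, $X^{*}$ is an $N \times K$ matrix of principal components derived from the SNP genotypes in a gene, and the test shown is the standard F-test for a linear regression. MAGMA's actual model may differ in detail (covariates, for example, are omitted here).

```latex
Y = \alpha_0 \mathbf{1} + X^{*}\alpha + \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)

H_0 : \alpha = 0,
\qquad
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/K}{\mathrm{RSS}_1/(N - K - 1)}
\sim F_{K,\,N-K-1} \quad \text{under } H_0
```

Here $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ are the residual sums of squares of the intercept-only and full models. Because the F statistic has a known null distribution, a test of this form yields a p-value without permutation.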


Fig 2 (pcbi.1004219.g002): Comparison of self-contained gene-set analysis results. Gene-set -log10 p-values from the CD data self-contained gene-set analysis for MAGMA and PLINK. Panel (A) shows the PLINK-avg (no pruning) results compared with the MAGMA-main analysis, panel (B) the PLINK-prune results compared with the MAGMA-main analysis, and panel (C) the two PLINK analyses compared with each other. P-values below 10^-8 are truncated to 10^-8 (grey points) to preserve the visibility of the other points.
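The truncation described in the caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code; the function name `neg_log10_truncated` and the returned flag are hypothetical, and only the 10^-8 floor comes from the caption.

```python
import math

def neg_log10_truncated(p, floor=1e-8):
    """Return (-log10 of p clipped at `floor`, whether p was clipped).

    The flag marks points that would be drawn in grey in a plot like
    the one described in the caption.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    was_truncated = p < floor
    return -math.log10(max(p, floor)), was_truncated

value_small, clipped_small = neg_log10_truncated(1e-12)
value_mid, clipped_mid = neg_log10_truncated(0.01)
```

With the default floor, any p-value below 10^-8 is plotted at 8 on the -log10 axis, so a few extreme gene sets do not compress the rest of the scale.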

Mentions: As with the gene analysis, the results of the CD analysis (Table 3 and Fig 2) can again serve as a gauge of the relative power of the different gene-set analysis methods. For the self-contained gene-set analysis this comparison is straightforward, with MAGMA showing considerably more power than the two PLINK analyses. For the most part, MAGMA's power advantage can be explained by the difference in the underlying gene model, given the previously shown superior power of the PC regression model over the SNP-wise model used by PLINK. Differences in how the genes are combined may also play a role, however, since in contrast to PLINK, MAGMA weights genes equally rather than by the number of SNPs in them and explicitly takes correlations between genes into account. Also of note is that PLINK-prune does considerably better than PLINK-avg, and that its p-values are somewhat more strongly correlated with those of the MAGMA analysis (Fig 2). An additional summary statistics analysis (MAGMA-pval-1K), based on SNP p-values and 1,000 Genomes reference data, was also performed. This showed less power than PLINK even though it uses the same model at the gene level, suggesting that the difference is due to how the genes are aggregated into gene sets. One of the key differences in this regard is that PLINK gives larger genes greater weight whereas MAGMA weights them equally. As such, a likely explanation is that the PLINK results are partially driven by a smaller number of large genes, though constructing the intermediate models to verify this is beyond the scope of this paper.
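The effect of gene weighting discussed above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the statistic used by either MAGMA or PLINK: it combines gene-level Z-scores with a weighted Stouffer-style sum, treats genes as independent, and uses made-up Z-scores and SNP counts. The names `combine_gene_z` and `upper_tail_p` are hypothetical.

```python
import math

def combine_gene_z(z_scores, weights):
    """Weighted Stouffer-style combination of gene-level Z-scores.

    Assumes independent genes, so the combined statistic is standard
    normal under the null. Illustration only.
    """
    if len(z_scores) != len(weights):
        raise ValueError("z_scores and weights must have equal length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def upper_tail_p(z):
    """One-sided upper-tail p-value of a standard normal Z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical gene set: one large gene with signal, four small genes without.
gene_z = [4.0, 0.0, 0.0, 0.0, 0.0]
snp_counts = [200, 10, 10, 10, 10]

# Equal weights per gene versus weights proportional to SNP count.
z_equal = combine_gene_z(gene_z, [1.0] * len(gene_z))
z_size = combine_gene_z(gene_z, [float(n) for n in snp_counts])

p_equal = upper_tail_p(z_equal)
p_size = upper_tail_p(z_size)
```

In this toy set, weighting by SNP count lets the single large gene dominate the combined statistic, whereas equal weighting dilutes it across all five genes. That is consistent with the explanation suggested in the text, though it does not verify it for the actual CD data.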

