MAGMA: generalized gene-set analysis of GWAS data.

de Leeuw CA, Mooij JM, Heskes T, Posthuma D - PLoS Comput. Biol. (2015)

Bottom Line: The gene analysis is based on a multiple regression model, to provide better statistical performance. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.


Affiliation: Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, The Netherlands; Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.

ABSTRACT
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
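The regression structure of the gene-set layer can be illustrated with a deliberately simplified sketch: per-gene association Z-scores are regressed on a 0/1 indicator of gene-set membership, and the slope is tested. Everything below is an assumption made for illustration (the function name, the simulated Z-scores, the set size and the effect size); it uses ordinary least squares on independent genes and is not MAGMA's actual implementation, which the abstract does not spell out.

```python
import math
import random


def simple_ols_slope_t(x, z):
    """Fit z = b0 + b1 * x + error by ordinary least squares.

    Returns the slope b1 and its t statistic (n - 2 degrees of freedom).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    b0 = mean_z - b1 * mean_x
    rss = sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1, b1 / se_b1


# Hypothetical data: 1000 genes, the first 50 belong to the gene set and
# have their Z-scores shifted upward by an arbitrary 0.5.
rng = random.Random(0)
n_genes = 1000
in_set = [1.0 if i < 50 else 0.0 for i in range(n_genes)]
gene_z = [rng.gauss(0.5 if s else 0.0, 1.0) for s in in_set]

slope, t_stat = simple_ols_slope_t(in_set, gene_z)
print(f"slope = {slope:.3f}, t = {t_stat:.3f}")
```

With a binary predictor the slope is the difference in mean Z-score between genes inside and outside the set. Replacing the indicator with a continuous gene property gives the kind of generalization the abstract mentions.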



pcbi.1004219.g004: Comparison of competitive gene-set analysis results at different SNP cut-offs. Comparison of gene set -log10 p-values from the CD data competitive gene-set analysis at different SNP p-value cut-offs for ALIGATOR (top row), INRICH (middle row) and MAGENTA (bottom row). The highest cut-off on the horizontal axis is compared to each of the lower cut-offs. P-values for gene sets not evaluated at the lower cut-off are shown in grey. The shown correlations are for the -log10 p-values for gene-sets evaluated at both cut-offs. Horizontal and vertical grey dotted lines demarcate the p = 0.05 nominal significance threshold.

Mentions: Looking at the results in more detail (Fig 3) also suggests that the differences in results are not merely due to a difference in power. The concordance between methods is poor, with only MAGENTA and ALIGATOR showing a reasonable correlation in results. Moreover, there is considerable discordance between different p-value cut-offs for the same method as well (Fig 4). This suggests that the different methods, or the same method at different p-value cut-offs, are sensitive to distinctly different kinds of gene-set associations. In particular, MAGMA and the other three methods at higher p-value cut-offs would be expected to respond best to gene sets containing a larger number of somewhat associated genes. Conversely, at lower p-value cut-offs the latter three should become more sensitive to gene sets containing a small number of more strongly associated genes. This is exemplified by the INRICH analysis. At the 0.0001 cut-off only quite strongly associated genes are counted as relevant, but as there are only 42 such genes overall, the three gene sets (containing either 26 or 29 genes) become significant despite each containing only three relevant genes.
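The arithmetic behind the INRICH example can be made concrete with a rough calculation. The sketch below asks how likely it is that a randomly drawn set of 26 or 29 genes contains at least three of the 42 relevant genes. It uses a plain hypergeometric tail as a stand-in, which is not how INRICH computes its p-values, and the total of 20,000 genes is an assumed round number that does not come from the source.

```python
from math import comb

# ASSUMPTION: total number of genes in the analysis; not stated in the text.
TOTAL_GENES = 20_000
RELEVANT_GENES = 42   # genes passing the 0.0001 cut-off (from the text)
HITS = 3              # relevant genes found in each significant set (from the text)


def hypergeom_tail(total, relevant, set_size, hits):
    """P(X >= hits) when set_size genes are drawn without replacement
    from `total` genes, of which `relevant` count as relevant."""
    denom = comb(total, set_size)
    upper = min(relevant, set_size)
    numer = sum(
        comb(relevant, k) * comb(total - relevant, set_size - k)
        for k in range(hits, upper + 1)
    )
    return numer / denom


for set_size in (26, 29):
    p = hypergeom_tail(TOTAL_GENES, RELEVANT_GENES, set_size, HITS)
    print(f"set size {set_size}: P(at least {HITS} relevant genes) = {p:.2e}")
```

Under these assumptions the expected number of relevant genes in such a set is far below one, so three hits is already a strong outlier. This is consistent with the point made above that a very strict cut-off favours sets with a few strongly associated genes.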

