MAGMA: generalized gene-set analysis of GWAS data.

de Leeuw CA, Mooij JM, Heskes T, Posthuma D - PLoS Comput. Biol. (2015)

Bottom Line: The gene analysis is based on a multiple regression model, to provide better statistical performance. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.


Affiliation: Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam, The Netherlands; Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.

ABSTRACT
By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
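The regression structure of the gene-set layer can be pictured with a small sketch. Everything in it is an illustrative assumption rather than MAGMA's actual code: the function name `gene_set_regression` is hypothetical, gene p-values are assumed to be converted to Z-scores with a probit transform, and the sketch ignores covariates (such as gene size) and correlations between genes that a full analysis would need to handle. With a single binary predictor, the fitted slope is simply the difference in mean Z-score between genes inside and outside the set.

```python
from math import sqrt
from statistics import NormalDist, mean


def gene_set_regression(gene_pvalues, in_set):
    """Regress probit-transformed gene p-values on a binary set indicator.

    Returns (slope, t_statistic). A positive slope means genes in the set
    tend to be more strongly associated than genes outside it.
    Illustrative only: no covariates, genes treated as independent.
    """
    if len(gene_pvalues) != len(in_set):
        raise ValueError("gene_pvalues and in_set must have equal length")

    # Assumed transform: small p-values map to large positive Z-scores.
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(1.0 - p) for p in gene_pvalues]

    z_in = [zi for zi, member in zip(z, in_set) if member]
    z_out = [zi for zi, member in zip(z, in_set) if not member]
    if not z_in or not z_out or len(z) < 3:
        raise ValueError("need genes both inside and outside the set")

    mean_in, mean_out = mean(z_in), mean(z_out)

    # OLS slope for a 0/1 predictor equals the difference in group means.
    slope = mean_in - mean_out

    # Residual sum of squares around the two fitted group means.
    rss = sum((zi - mean_in) ** 2 for zi in z_in)
    rss += sum((zi - mean_out) ** 2 for zi in z_out)
    if rss == 0.0:
        raise ValueError("no residual variation; t-statistic undefined")

    residual_var = rss / (len(z) - 2)
    std_err = sqrt(residual_var * (1.0 / len(z_in) + 1.0 / len(z_out)))
    return slope, slope / std_err
```

Calling `gene_set_regression([0.001, 0.01, 0.5, 0.6, 0.4, 0.7], [True, True, False, False, False, False])` gives a positive slope and t-statistic, since the two set members have the smallest p-values. Because the set membership enters as an ordinary predictor, the same layout extends naturally to continuous gene properties or to several gene sets at once, which is the flexibility the abstract refers to.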


pcbi.1004219.g001: Comparison of gene analysis results for different test-statistics. Gene -log10 p-values from the CD data gene analysis in MAGMA for three different gene test-statistics, comparing analyses using (A) the mean χ2 statistic with the top χ2 statistic, (B) the mean χ2 statistic and the PC regression model and (C) the top χ2 statistic and the PC regression model. P-values below 10⁻⁸ are truncated to 10⁻⁸ (grey points) to preserve the visibility of the other points.

Mentions: Specific implementation issues can be ruled out as the cause of the power difference since the PLINK and VEGAS analyses yield results highly similar to their matched MAGMA models (S9 Fig), and using the pruning option in PLINK also has little effect on the overall results. This means that the difference must be due to the difference in the methods and test-statistics themselves. Comparing the MAGMA implementations of these models in Fig 1, the mean χ2 and top χ2 approaches are shown to produce very similar p-values. Moreover, the plots reveal that the superior power of the MAGMA-main model does not arise from consistently lower gene p-values, but rather from a small set of genes with low p-values for MAGMA-main that are simply not picked up by the other approaches. This is likely to be related to the way LD between SNPs is handled, as that is one of the key differences between the multiple regression model of MAGMA-main and all the others. A post-hoc power simulation indeed indicates that multi-marker effects with weak marginals are the most probable explanation (see ‘Supplemental Methods—Simulation Studies’).
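The idea of a multi-marker effect with weak marginals can be made concrete with a small population-level calculation. This is a sketch under stated assumptions, not the paper's post-hoc simulation: the function name `variance_explained`, the two-SNP setup, and the effect sizes are all hypothetical. It assumes a phenotype y = b1*x1 + b2*x2 + noise, where x1 and x2 are standardized SNPs with LD correlation `ld_r`, and compares the variance each SNP explains on its own with the variance both explain jointly.

```python
def variance_explained(b1, b2, ld_r, noise_var):
    """Fraction of phenotype variance explained by each SNP alone and jointly.

    Model (illustrative assumption): y = b1*x1 + b2*x2 + e, with x1 and x2
    standardized to unit variance, correlation ld_r between them, and
    independent noise e with variance noise_var.
    """
    if not -1.0 < ld_r < 1.0:
        raise ValueError("ld_r must lie strictly between -1 and 1")
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")

    # Total phenotype variance, including the covariance term due to LD.
    total_var = b1 * b1 + b2 * b2 + 2.0 * b1 * b2 * ld_r + noise_var

    # Marginal covariance of y with each SNP: its own effect plus the
    # part of the other SNP's effect that it tags through LD.
    cov1 = b1 + b2 * ld_r
    cov2 = b2 + b1 * ld_r

    return {
        "snp1": cov1 ** 2 / total_var,   # squared marginal correlation
        "snp2": cov2 ** 2 / total_var,
        "joint": (total_var - noise_var) / total_var,  # multiple R-squared
    }


# Two SNPs in strong LD with opposite effects: each marginal signal is
# largely cancelled by the other SNP, but the joint model recovers it.
masked = variance_explained(b1=1.0, b2=-1.0, ld_r=0.9, noise_var=9.8)
```

In this configuration each SNP alone explains about 0.1% of the variance, while the two together explain about 2%, a twentyfold difference. Statistics built from single-SNP χ2 values, whether averaged or maximized, see only the weak marginals, whereas a multiple regression conditions each SNP on the other. With a single causal SNP (for example `b2=0`), the marginal and joint values coincide, consistent with the observation that most genes receive similar p-values from all approaches.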

