Canonical correlation analysis for gene-based pleiotropy discovery.

Seoane JA, Campbell C, Day IN, Casas JP, Gaunt TR - PLoS Comput. Biol. (2014)

Bottom Line: To apply CCA, we must restrict the number of attributes relative to the number of samples. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes.


Affiliation: School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom.

ABSTRACT
Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and in testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants, respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and of several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea, and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.
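The abstract names a binary genetic algorithm for attribute selection but does not describe its operators on this page. As a rough illustration only, the Python sketch below evolves bit masks in which each bit marks one attribute as selected or not. Everything named here is hypothetical: `run_binary_ga`, `tournament`, `toy_fitness`, the `RELEVANT` set, the `MAX_SELECTED` cap and all parameter values. Tournament selection, one-point crossover, bit-flip mutation and elitism are generic choices, not the authors' confirmed method, and the toy fitness merely stands in for whatever CCA-based score the study used.

```python
import random

# Hypothetical toy problem: 12 attributes, of which three are "relevant".
RELEVANT = {1, 4, 7}
MAX_SELECTED = 5  # assumed cap, mimicking "restrict the number of attributes"


def toy_fitness(mask):
    """Stand-in score: reward relevant attributes, penalise extras."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    if not chosen or len(chosen) > MAX_SELECTED:
        return -1.0  # empty or oversized selections are rejected
    return len(chosen & RELEVANT) - 0.1 * len(chosen - RELEVANT)


def tournament(rng, population, fitness):
    """Binary tournament: draw two masks, keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_binary_ga(n_attributes, fitness, pop_size=30, generations=40,
                  mutation_rate=0.02, seed=0):
    """Evolve bit masks over attributes; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_attributes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1 = tournament(rng, population, fitness)
            p2 = tournament(rng, population, fitness)
            cut = rng.randint(1, n_attributes - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each position.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


best_mask, best_score = run_binary_ga(12, toy_fitness)
print(best_mask, best_score)
```

Because the best mask is always copied into the next generation, the best score can never decrease from one generation to the next, which is the main reason elitism is included in this sketch.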


Figure 3 (pcbi-1003876-g003): Hive plot single gene/multiple phenotype. This figure shows a hive plot for gene/phenotype association rules. The vertical axis represents the association rules (higher, more association). The left axis represents phenotypes and the right axis represents genes. An interactive hive plot is published on the project webpage (http://pleioexp.epi.bris.ac.uk/cca/geneNphenHive.html).

Mentions: In Table 2 we show some of the most important pleiotropic associations between one gene and multiple phenotypes, including the p-value of the CCA association and the phenotypes with which each gene is associated. We also show Fisher's combined association value and, in parentheses, the association value of the gene with each single phenotype. Table S2 lists all the results for associations between one gene and multiple phenotypes. To correct for multiple testing, we use a Bonferroni correction for 3648 genes and combinations of 82 phenotypes in subsets of 2 to 24 phenotypes. We chose 24 because it is the maximum number of different phenotypes in one association rule (an association rule is a combination of a number of phenotypes associated with a number of genes) selected by the genetic algorithm (see the multiple test association correction paragraph in Methods). This combination gives 5.36×10^20 different phenotypic rules, giving a threshold p-value of 2.55×10^-25, equivalent to p = 0.05 for a single test. In Figure 2, we use a heatmap to represent the most important (most strongly associated) pleiotropic relations between phenotypes and genotypes. We also use a hive plot (interactive plot available online) in Figure 3. In this diagram, the vertical axis represents the association between the phenotype (left axis) and the genotype (right axis). Association rules are ordered in the diagram by association value (the stronger the association, the higher in the plot).
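The arithmetic behind the passage above can be sketched in a few lines of Python. The sketch assumes that the number of phenotypic rules is the number of ways to choose between 2 and 24 phenotypes out of 82, which should land close to the rule count quoted above. Multiplying that count by the 3648 genes to obtain the Bonferroni divisor is likewise an assumption about how the correction was formed, so the result need not match the reported threshold exactly. The function names are hypothetical, and the Fisher combination shown is the textbook form for independent p-values, not necessarily the exact variant used in Table 2.

```python
import math


def count_phenotype_rules(n_phenotypes=82, min_size=2, max_size=24):
    """Number of phenotype subsets with min_size..max_size members (assumed rule count)."""
    return sum(math.comb(n_phenotypes, k)
               for k in range(min_size, max_size + 1))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k d.o.f.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!, so no SciPy is needed.
    Assumes the k p-values are independent.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


n_rules = count_phenotype_rules()
print(f"phenotypic rules: {n_rules:.3e}")
# Assumption: every gene is tested against every rule.
print(f"threshold: {bonferroni_threshold(0.05, 3648 * n_rules):.3e}")
# Illustrative p-values only, not taken from the paper.
print(f"Fisher combined p: {fisher_combined_p([0.01, 0.04, 0.20]):.4f}")
```

With a single p-value the Fisher combination returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.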

