Prioritizing causal disease genes using unbiased genomic features.

Deo RC, Musso G, Tasan M, Tang P, Poon A, Yuan C, Felix JF, Vasan RS, Beroukhim R, De Marco T, Kwok PY, MacRae CA, Roth FP - Genome Biol. (2014)

Bottom Line: In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM. Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.


ABSTRACT

Background: Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits.

Results: To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and thus is not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, which is a defining feature of a prevalent cardiovascular disorder, dilated cardiomyopathy or DCM. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM.

Conclusion: Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.


Figure 4: OPEN scores for DCM successfully prioritize genes at loci identified through a GWA study for left ventricular diameter (LVD). OPEN scores for DCM were mapped to genes at loci marginally associated with LVD (P < 5 × 10⁻⁵). Eleven loci have a high-scoring top-ranked gene based on OPEN scores for DCM association. Blue coloring indicates the seven genes that are mutated in Mendelian forms of cardiac disease, including DCM, HCM, arrhythmogenic right ventricular cardiomyopathy and catecholaminergic polymorphic ventricular tachycardia (P = 8 × 10⁻⁵ for enrichment).

Left ventricular diameter (LVD) is a heritable complex trait that predicts incident congestive heart failure and mortality [37]. A recent GWA meta-analysis of LVD found only a single locus at genome-wide significance [21], though interestingly, the locus includes PLN (which encodes phospholamban), a known DCM gene [38]. Given the phenotypic similarity between DCM and enlarged LVD, we hypothesized that OPEN scores for DCM would be useful to prioritize genes at LVD loci, even if these failed to meet genome-wide significance in GWA analysis. We first selected all SNPs with a nominal P-value below 5 × 10⁻⁵ (considerably less stringent than the conventionally accepted threshold of 5 × 10⁻⁸ for significance), mapped these to neighboring genes, and looked for any loci where the top-ranked gene (according to DCM score) had a higher score than expected by chance (see Materials and methods). We found 11 such loci (of which 10 were multigenic; Figure 4). For four of these loci, the top gene according to OPEN had been shown to be mutated in DCM (PLN, ACTN2, TNNT2, TTN) [39], while for two others, the top gene was known to be mutated in HCM (MYL2) or an arrhythmogenic form of cardiac disease (CASQ2) [40]. For two more, the top genes (GBE1 and APOBEC2) affect heart contractile function in animal models [41,42]. Note that the OPEN procedure is 'clean', in the sense that the score for any given gene is derived from models that never used that gene in training. Overall, the 'insignificant' LVD GWA loci were highly enriched for genes causing inherited abnormalities of the heart muscle (P = 8 × 10⁻⁵, odds ratio = 7.65, Fisher's exact test).
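
The locus-level procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the actual null model is defined in the paper's Materials and methods, and here a simple permutation null (random gene sets of the same size as the locus) is swapped in as an assumption. All function names, gene labels, and scores below are hypothetical, and the one-sided Fisher's exact test is written out from the hypergeometric distribution for illustration only.

```python
import math
import random


def top_gene_per_locus(loci, scores):
    """Return {locus: (gene, score)} for the highest-scoring gene at each locus."""
    return {
        locus: max(((g, scores[g]) for g in genes), key=lambda gs: gs[1])
        for locus, genes in loci.items()
    }


def empirical_p(top_score, n_genes, all_scores, n_draws=2000, rng=None):
    """Empirical P-value for a locus's top score.

    ASSUMED null model (not taken from the paper): draw n_genes scores at
    random from the genome-wide list and count how often the best of them
    is at least as high as the observed top score.
    """
    rng = rng or random.Random(0)
    hits = sum(
        max(rng.sample(all_scores, n_genes)) >= top_score
        for _ in range(n_draws)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Tests for enrichment in cell a by summing hypergeometric probabilities
    of tables at least as extreme, with the margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        math.comb(col1, k) * math.comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / math.comb(n, row1)


# Hypothetical toy data: gene labels and scores are invented for illustration.
scores = {"G1": 0.9, "G2": 0.1, "G3": 0.2, "G4": 0.3, "G5": 0.15, "G6": 0.05}
loci = {"locusA": ["G1", "G2"], "locusB": ["G3", "G4", "G5"]}

top = top_gene_per_locus(loci, scores)
all_scores = list(scores.values())
locus_p = {
    locus: empirical_p(score, len(loci[locus]), all_scores)
    for locus, (gene, score) in top.items()
}
```

In this sketch, loci whose top gene has a small empirical P-value would be carried forward, and `fisher_exact_greater` could then be applied to a 2×2 table of top-ranked versus other genes against known versus unknown cardiac disease genes. The table counts behind the reported odds ratio are not given in this section, so none are assumed here.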

