Prioritizing causal disease genes using unbiased genomic features.

Deo RC, Musso G, Tasan M, Tang P, Poon A, Yuan C, Felix JF, Vasan RS, Beroukhim R, De Marco T, Kwok PY, MacRae CA, Roth FP - Genome Biol. (2014)

Bottom Line: In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM. Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.


ABSTRACT

Background: Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits.

Results: To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and thus is not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, which is a defining feature of a prevalent cardiovascular disorder, dilated cardiomyopathy or DCM. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM.

Conclusion: Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.



Fig3: OPEN successfully predicts cardiomyopathy genes. (A) ROC curves for hypertrophic cardiomyopathy (HCM, left) and dilated cardiomyopathy (DCM, right). (B) Top ranked genes according to OPEN score for HCM. Log-odds of disease association are obtained through cross-validation. Blue bars represent positive training examples. (C) OPEN scores for DCM (as a Mendelian disease) are useful for prioritizing genes at DCM GWA loci. Each locus is represented by a scatter plot of OPEN score against chromosomal position, with every gene at the locus represented by a circle. Gene symbols for top ranked genes are provided. Purple coloring indicates that BAG3 has already been implicated in a Mendelian form of DCM.

Mentions: Given that only 25 to 60% of familial cases of DCM or HCM can be explained by our current census of CMP genes [32,33], genetic studies of CMP patients remain a priority. Unfortunately, GWA results have been difficult to interpret, as causal genes are hard to differentiate amongst the many genes harboring rare mutations [34]. To provide an unbiased, independent method of prioritizing causal CMP genes, we applied our OPEN approach to HCM and DCM, using established causal genes as training examples. OPEN successfully predicted established CMP genes, as evidenced by an AUROC of 0.88 and 0.96 for DCM and HCM, respectively (Figure 3A). The prominent clustering of known CMP causal genes at the top of the list (Figure 3B; precision at 20% recall of 19% for DCM and 42% for HCM) strongly suggests that OPEN scores could be integrated with sequencing data to choose candidates for experimental validation.
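The two evaluation measures quoted above, AUROC and precision at a fixed recall, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `precision_at_recall` and the toy scores and labels are assumptions made purely for illustration, and the numbers bear no relation to the reported results.

```python
def auroc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive gene outranks a randomly chosen negative one (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, recall=0.2):
    """Precision at the first rank cutoff whose recall reaches the target."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank genes from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        if true_pos / total_pos >= recall:
            return true_pos / k
    raise ValueError("target recall must be at most 1.0")


# Hypothetical toy data: 10 genes, 3 of them known causal genes (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

toy_auroc = auroc(scores, labels)
toy_precision = precision_at_recall(scores, labels, recall=0.2)
```

With very few positives, as in this toy set, the smallest achievable nonzero recall may already exceed the target, so the sketch reports precision at the first cutoff that meets or passes it.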

