Prioritizing causal disease genes using unbiased genomic features.

Deo RC, Musso G, Tasan M, Tang P, Poon A, Yuan C, Felix JF, Vasan RS, Beroukhim R, De Marco T, Kwok PY, MacRae CA, Roth FP - Genome Biol. (2014)

Bottom Line: In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM. Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.


ABSTRACT

Background: Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits.

Results: To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and thus is not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, which is a defining feature of a prevalent cardiovascular disorder, dilated cardiomyopathy or DCM. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM.

Conclusion: Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.



Fig2: OPEN successfully prioritizes causal genes for complex traits. (A) Receiver operating characteristic (ROC) curves for prioritization of 'likely positive' genes for LDL-cholesterol. (B) OPEN effectively prioritizes likely positives for low-density lipoprotein (LDL)-cholesterol within GWA loci. A histogram shows the distribution of the number of genes prioritized by random chance over 10,000 independent simulations, with arrow indicating the number prioritized by OPEN (P < 0.0001). (C) OPEN successfully prioritizes the statin target HMGCR at the 5q13.3 locus (left). A heatmap depicts the six genes at the LDL-associated 5q13.3 locus, with the first four columns indicating which genes are near the index variant, and which have been annotated with prior evidence via the Gene Ontology (GO), Mouse Phenotype Database (MPD) or through the Online Mendelian Inheritance in Man (OMIM) database. The final column depicts the OPEN score, with color scheme from beige to dark purple indicating increasing magnitude of the log odds for disease association provided by OPEN. At the 2p15 LDL-associated locus (right), OPEN ranks the un-annotated EHBP1 gene highest. (D) Area under the ROC curve (AUROC) values for cardiac (left) and non-cardiac phenotypes (right). EKG, electrocardiogram; HDL, high-density lipoprotein.
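The two quantities in panels A, B and D can be illustrated with a small sketch. It is not the authors' code. It assumes each gene has a numeric score and a 'likely positive' flag, that "prioritized" means the top-scoring gene at a locus, and that random chance means picking one gene per locus uniformly at random. The function names (`auroc`, `count_top_positives`, `random_chance_p`) and the data layout are hypothetical.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen positive gene
    outscores a randomly chosen negative gene (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def count_top_positives(loci):
    """loci: list of loci, each a list of (score, is_positive) pairs.
    Counts loci whose top-scoring gene is a likely positive."""
    return sum(1 for genes in loci if max(genes, key=lambda g: g[0])[1])


def random_chance_p(loci, n_sim=10_000, seed=0):
    """Empirical P value (assumed procedure): the fraction of simulations in
    which random per-locus picks hit at least as many likely positives as
    the score-based picks do."""
    rng = random.Random(seed)
    observed = count_top_positives(loci)
    at_least = 0
    for _ in range(n_sim):
        hits = sum(1 for genes in loci if rng.choice(genes)[1])
        if hits >= observed:
            at_least += 1
    return observed, at_least / n_sim
```

With 10,000 simulations, the smallest nonzero fraction this procedure can return is 0.0001, so a result of zero would be reported as P < 0.0001.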

Mentions: Cross-validation was used to estimate the odds of disease association for every gene in the genome. We initially focused on CVD complex trait phenotypes for which GWA studies had identified 10 or more significantly associated loci (Figure 2A-D). These included plasma concentrations of cholesterol subfractions (high-density lipoprotein-cholesterol (HDL-C), low-density lipoprotein (LDL)-cholesterol, total cholesterol), plasma triglycerides, hypertension/blood pressure, heart rate, and three electrocardiographic phenotypes (QRS duration, QT interval, PR interval, as well as pooled phenotypes of all of these).
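The idea of using cross-validation to score every gene can be sketched as follows: each gene's log odds comes from a model trained only on the other folds, so no gene is scored by a model that saw its own label. This is a minimal illustration under stated assumptions, not the OPEN implementation. The Bernoulli naive Bayes classifier is a stand-in chosen for brevity, the features are assumed binary, and all names (`train_nb`, `log_odds`, `cross_validated_log_odds`) are hypothetical.

```python
import math
import random


def train_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with add-alpha smoothing.
    Stand-in classifier only; the source does not say OPEN uses this."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    prior = math.log((n_pos + alpha) / (n_neg + alpha))
    weights = []
    for j in range(len(rows[0])):
        on_pos = sum(1 for r, y in zip(rows, labels) if y and r[j])
        on_neg = sum(1 for r, y in zip(rows, labels) if not y and r[j])
        p1 = (on_pos + alpha) / (n_pos + 2 * alpha)
        p0 = (on_neg + alpha) / (n_neg + 2 * alpha)
        # Log-likelihood ratios for feature present and feature absent.
        weights.append((math.log(p1 / p0), math.log((1 - p1) / (1 - p0))))
    return prior, weights


def log_odds(model, row):
    """Log odds of disease association for one gene's feature row."""
    prior, weights = model
    return prior + sum(w_on if x else w_off
                       for x, (w_on, w_off) in zip(row, weights))


def cross_validated_log_odds(rows, labels, k=5, seed=0):
    """Return one out-of-fold log-odds score per gene."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    out = [None] * len(rows)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_nb([rows[i] for i in train],
                         [labels[i] for i in train])
        for i in held_out:
            out[i] = log_odds(model, rows[i])
    return out
```

The resulting scores can then be ranked within each associated locus or summarized with an ROC curve.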

