Improved human disease candidate gene prioritization using mouse phenotype.

Chen J, Xu H, Aronow BJ, Jegga AG - BMC Bioinformatics (2007)

Bottom Line: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes. Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization.


Affiliation: Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA. Jing.Chen@cchmc.org

ABSTRACT

Background: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.

Results: Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods and, based on the validation studies, show that our approach, ToppGene (http://toppgene.cchmc.org), outperforms two existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR.

Conclusion: Incorporating phenotype information for mouse orthologs of human genes greatly improves human disease candidate gene analysis and prioritization.



Figure 1: ROC curves of random-gene cross-validation based on score ranks. The blue curve was generated from the 19 disease gene training sets. The black curve (negative control) was generated from 20 random training sets. See text for the definitions of sensitivity and specificity.

Mentions: Using ToppGene, we first created the overall ROC curves. To compare directly with ENDEAVOUR, we followed the same definitions of sensitivity and specificity as described by Aerts et al. [9]. Figure 1 shows the overall ROC curves using ToppGene. The AUC score of the 19 disease training sets was 0.916, and the sensitivity/specificity was 90/77, i.e. the "target" gene was ranked among the top 23% in 90% of the cases. For the control, the AUC score of the 20 random training sets was 0.503 (see section A of Table 3).
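
The rank-based reading of sensitivity and specificity above can be illustrated with a short sketch. It assumes one plausible interpretation of the definitions: for a given rank cutoff, sensitivity is the fraction of validation runs in which the "target" gene ranks at or above the cutoff, and specificity is the fraction of the ranked list falling below the cutoff. The function names and the rank values are hypothetical and are not taken from the paper or from ToppGene.

```python
from typing import List, Tuple


def rank_roc(target_ranks: List[int], n_genes: int) -> List[Tuple[float, float]]:
    """Build ROC points (1 - specificity, sensitivity) from target-gene ranks.

    target_ranks: 1-based rank of the held-out "target" gene in each
                  validation run (hypothetical input, rank 1 is best).
    n_genes:      size of each ranked list (target plus random genes).
    """
    n_runs = len(target_ranks)
    points = [(0.0, 0.0)]
    for cutoff in range(1, n_genes + 1):
        # Sensitivity: share of runs whose target gene is within the cutoff.
        sensitivity = sum(r <= cutoff for r in target_ranks) / n_runs
        # 1 - specificity: share of the ranked list inside the cutoff.
        points.append((cutoff / n_genes, sensitivity))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def specificity_at(points: List[Tuple[float, float]], sensitivity: float) -> float:
    """Highest specificity at which the requested sensitivity is reached."""
    for x, y in points:
        if y >= sensitivity:
            return 1.0 - x
    return 0.0


if __name__ == "__main__":
    # Hypothetical ranks of the target gene among 100 genes in 10 runs.
    ranks = [1, 2, 2, 3, 5, 8, 12, 15, 23, 60]
    curve = rank_roc(ranks, n_genes=100)
    print("AUC:", auc(curve))
    print("Specificity at 90% sensitivity:", specificity_at(curve, 0.9))
```

Under this reading, a reported sensitivity/specificity of 90/77 corresponds to the point on the curve where 90% of runs place the target gene within the top 23% of the list, and a control whose target ranks are spread uniformly gives an AUC near 0.5.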

