Improved human disease candidate gene prioritization using mouse phenotype.

Chen J, Xu H, Aronow BJ, Jegga AG - BMC Bioinformatics (2007)

Bottom Line: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies, such as linkage analysis and gene expression profiling, tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes. Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization.


Affiliation: Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA. Jing.Chen@cchmc.org

ABSTRACT

Background: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies, such as linkage analysis and gene expression profiling, tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.

Results: Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods and, based on the validation studies, show that our approach, ToppGene (http://toppgene.cchmc.org), outperforms two existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR.

Conclusion: The incorporation of phenotype information for mouse orthologs of human genes greatly improves the human disease candidate gene analysis and prioritization.


Figure 3: ROC curves of random-gene cross-validation based on scores. The red curve was generated using all feature sets (AUC score 0.913). The blue curve was generated without Mouse Phenotype annotations (AUC score 0.893). The orange curve was generated without Mouse Phenotype and PubMed annotations (AUC score 0.888). See text for the definitions of sensitivity and specificity.

Mentions: To better understand the relative performance and power of each feature in gene prioritization, we tested ToppGene by performing cross-validations with one of the features left out. The performance decreased significantly only when MP (Mouse Phenotype) was removed (see ROC curve in Figure 3). As expected, the best performance was recorded when all the features were considered for prioritization, with an AUC of 0.913 (see ROC curve in Figure 3) and a coverage of ~89%. For a cutoff score of 0.93, the sensitivity/specificity was 74%/90%. In other words, 74% of the "target" genes were included in the candidate list (about a 9-fold reduction from the original test set).
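As a rough illustration of the quantities reported above, the following Python sketch computes sensitivity and specificity at a score cutoff, plus the area under the ROC curve, for a small set of scored genes. It is not the ToppGene implementation. The scores and labels are made-up toy values, the function names are hypothetical, and the standard textbook definitions are assumed (sensitivity as the fraction of target genes scoring at or above the cutoff, specificity as the fraction of non-target genes scoring below it); the article's own definitions are given in its main text and may differ in detail.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when genes with score >= cutoff are kept.

    labels: 1 for a "target" gene, 0 for a random (non-target) gene.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random target gene outscores a random
    non-target gene (ties count as one half). This is the rank-based
    (Mann-Whitney) form of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data only (assumed for illustration, not taken from the article).
toy_scores = [0.99, 0.95, 0.94, 0.80, 0.97, 0.60, 0.50, 0.40, 0.30, 0.20]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff=0.93)
auc = roc_auc(toy_scores, toy_labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

Repeating such a calculation with one feature's contribution removed from the scores would give a leave-one-feature-out comparison in the spirit of Figure 3, although how ToppGene combines features into a score is not shown here.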

