Improved human disease candidate gene prioritization using mouse phenotype.

Chen J, Xu H, Aronow BJ, Jegga AG - BMC Bioinformatics (2007)

Bottom Line: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes. Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization.

Affiliation: Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA. Jing.Chen@cchmc.org

ABSTRACT

Background: The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.

Results: Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods and, based on the validation studies, show that our approach, ToppGene (http://toppgene.cchmc.org), outperforms two existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR.

Conclusion: The incorporation of phenotype information for mouse orthologs of human genes greatly improves the human disease candidate gene analysis and prioritization.

Figure 2: AUC of different feature sets. Red bars indicate the AUC scores based on each feature set, and blue bars are the corresponding random controls. Yellow bars indicate the coverage of each feature set in the whole genome. For example, mouse phenotype (MP) has AUC score 0.78 and covers 19% of genes in the whole genome. For each feature set, the ROC curve was generated using genes with annotations only.

Mentions: To study the efficiency of different features (GO-Gene Ontology, MP-Mouse Phenotype, Pathways, PubMed, Protein Domains, Gene Expression and Protein Interactions), an ROC curve was generated for each feature set. Figure 2 shows the corresponding AUC scores of the ROC curves, depicting the relative performance of each feature set in the prioritization method. Mouse phenotype and PubMed showed the best performance, while the protein interaction and gene expression features performed poorly. In terms of coverage (the percentage of genes in the whole genome annotated with each of these features), PubMed was the best while MP had the least coverage (only about 19% of known genes have at least one MP term association).
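The two quantities reported per feature set in Figure 2, AUC and coverage, can be illustrated with a minimal Python sketch. This is not the ToppGene implementation: the function names `auc_from_scores` and `coverage`, the gene identifiers, and the scores are all hypothetical, and the AUC is computed with the standard rank-sum (Mann-Whitney U) identity, which the source does not specify.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: similarity scores for the ranked genes (higher = more likely).
    labels: 1 for a known disease gene, 0 for a background gene.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def coverage(annotations, genome):
    """Fraction of genes in `genome` with at least one annotation term."""
    annotated = sum(1 for gene in genome if annotations.get(gene))
    return annotated / len(genome)


# Toy data (hypothetical): five genes, one of which carries an MP term.
genome = ["G1", "G2", "G3", "G4", "G5"]
mp_annotations = {"G1": ["MP:0000001"], "G2": []}
mp_coverage = coverage(mp_annotations, genome)

# Toy scores for four annotated genes; the two disease genes rank on top.
toy_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

As the Figure 2 caption notes, the ROC curve for each feature set uses only genes that have annotations, so in a setup like this the scores passed to `auc_from_scores` would be restricted to annotated genes, while `coverage` is measured against the whole genome.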

