Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

Masino AJ, Dechene ET, Dulik MC, Wilkens A, Spinner NB, Krantz ID, Pennington JW, Robinson PN, White PS - BMC Bioinformatics (2014)

Bottom Line: However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.

Affiliation: Department of Pediatrics, Cincinnati Children's Hospital and Medical Center, Cincinnati, OH, USA. peter.white@cchmc.org.

ABSTRACT

Background: Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term's information content.

Results: Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e., phenotypic terms unrelated to the disease and terms less specific than the actual disease terms, respectively. We ranked the causative gene against all 2488 HPO-annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision-with-noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3.

Conclusions: Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
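The ranking idea described in the Background can be illustrated with a minimal sketch. It assumes one common information-content formulation: IC(t) = -ln(fraction of genes annotated with t or a descendant of t), a Resnik-style term similarity (the IC of the most informative common ancestor), and a best-match average over the patient's terms. The abstract does not state the exact similarity measure, so these choices are assumptions, and the toy ontology, term names, and gene names are hypothetical rather than real HPO data.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (not real HPO identifiers).
PARENTS = {
    "root": set(),
    "ear": {"root"},
    "eye": {"root"},
    "hearing_loss": {"ear"},
    "sensorineural_hl": {"hearing_loss"},
    "cataract": {"eye"},
}

# Hypothetical gene -> annotated phenotype terms.
GENE_TERMS = {
    "GENE_A": {"sensorineural_hl"},
    "GENE_B": {"cataract"},
    "GENE_C": {"hearing_loss", "cataract"},
    "GENE_D": {"eye"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Assumed IC: -ln(fraction of genes annotated with a term or its descendants)."""
    counts = {t: 0 for t in PARENTS}
    for terms in GENE_TERMS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n = len(GENE_TERMS)
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(IC.get(t, 0.0) for t in common)


def gene_score(patient_terms, gene_terms):
    """Average, over patient terms, of the best match among the gene's terms."""
    best = [max(term_similarity(p, g) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms):
    """Order genes by descending score; ties are broken alphabetically."""
    scored = [(gene_score(patient_terms, terms), gene)
              for gene, terms in GENE_TERMS.items()]
    return [gene for _, gene in sorted(scored, key=lambda x: (-x[0], x[1]))]


print(rank_genes({"sensorineural_hl"}))
```

In this toy data, a specific patient term separates GENE_A from GENE_C, whereas the less specific `hearing_loss` scores both genes equally, which loosely mirrors the imprecision effect reported in the Results.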

Figure 6: Patient HPO annotation count impact. The causative gene mean rank when ranked by similarity score as a function of the number of HPO terms used to describe the patient.

Mentions: The noise and imprecision analysis on algorithm performance suggests possible guidelines for selecting patient terms in clinical practice. In particular, it appears that the algorithm performs better when given patient terms that are very specific and when given a large number of relevant terms (Figures 5 and 6). Comparison of the noise-with-imprecision case to the noise-only and imprecision-only cases in Figure 5 indicates that selecting very specific terms can help counter noise. As with Figure 4, the minimum rank (best performance) that occurs as the maximum IC reaches 7 in Figure 5 is likely a result of the specific random sample observed in this study. Given more samples, we expect that the best performance would shift toward a maximum IC of 8, which is the maximum possible for the particular annotation set used. Again, we expect that the general trend observed in the plot of decreasing rank (increasing performance) with increasing maximum IC would be maintained with more samples. Figure 6 indicates that both imprecision and noise effects can be countered by selecting a large set of patient terms, provided that the terms are relevant to the disease condition.
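The noise and imprecision perturbations discussed above could be modeled along the following lines. This is only a sketch under stated assumptions: imprecision is approximated by replacing a term with a randomly chosen non-root ancestor, and noise by adding terms from a caller-supplied pool of unrelated terms. The hierarchy, term names, and function names are hypothetical, and the study's actual sampling procedure may differ.

```python
import random

# Hypothetical toy hierarchy: term -> single parent (None marks the root).
PARENT = {
    "root": None,
    "ear": "root",
    "hearing_loss": "ear",
    "sensorineural_hl": "hearing_loss",
    "eye": "root",
    "cataract": "eye",
}


def add_imprecision(terms, rng):
    """Replace each term with a random non-root ancestor, if it has one."""
    out = set()
    for term in terms:
        chain = []
        parent = PARENT[term]
        # Walk upward, stopping before the root so the result stays informative.
        while parent is not None and PARENT[parent] is not None:
            chain.append(parent)
            parent = PARENT[parent]
        out.add(rng.choice(chain) if chain else term)
    return out


def add_noise(terms, unrelated_pool, n_noise, rng):
    """Add up to n_noise terms sampled from a pool of unrelated terms."""
    pool = sorted(set(unrelated_pool) - set(terms))
    return set(terms) | set(rng.sample(pool, min(n_noise, len(pool))))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
patient = {"sensorineural_hl"}
print(add_imprecision(patient, rng))
print(add_noise(patient, ["cataract", "eye"], 1, rng))
```

Feeding such perturbed term sets into a phenotype-based ranker is one way to explore how term specificity and term count affect the causative gene's rank.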

