Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

Masino AJ, Dechene ET, Dulik MC, Wilkens A, Spinner NB, Krantz ID, Pennington JW, Robinson PN, White PS - BMC Bioinformatics (2014)

Affiliation: Department of Pediatrics, Cincinnati Children's Hospital and Medical Center, Cincinnati, OH, USA. peter.white@cchmc.org.

ABSTRACT

Background: Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, variant interpretation relative to patient phenotype can be challenging in some scenarios, particularly clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish clear association with patient phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to patient phenotype. The algorithm orders genes by the semantic similarity computed between phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO) and semantic similarity is derived from each term's information content.

Results: Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise and imprecision, i.e. phenotypic terms unrelated to the disease and terms less specific than the actual disease terms. We ranked the causative gene against all 2488 HPO annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment. The disease gene median rank was 22. However, when also considering the patient's exome data and filtering non-exomic and common variants, the median rank improved to 3.

Conclusions: Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
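As an illustration of the ranking idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The abstract states only that semantic similarity is derived from each term's information content; the specific choices here (a Resnik-style similarity, i.e. the information content of the most informative common ancestor, and a best-match average over the patient's terms) are assumptions. The toy ontology, term names, and gene names are hypothetical and are not real HPO identifiers.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents (not real HPO IDs).
PARENTS = {
    "root": set(),
    "ear": {"root"},
    "eye": {"root"},
    "hearing_loss": {"ear"},
    "sensorineural_hl": {"hearing_loss"},
    "cataract": {"eye"},
}

# Hypothetical gene -> directly annotated phenotype terms.
GENE_TERMS = {
    "GENE_A": {"sensorineural_hl"},
    "GENE_B": {"cataract"},
    "GENE_C": {"hearing_loss", "cataract"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of genes annotated to t or to a descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in GENE_TERMS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for t in covered:
            counts[t] += 1
    n = len(GENE_TERMS)
    # Terms never used by any gene are skipped; they cannot be a common
    # ancestor of a gene term, so they are never looked up below.
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def resnik(t1, t2, ic):
    """Assumed term similarity: IC of the most informative common ancestor."""
    return max(ic[a] for a in ancestors(t1) & ancestors(t2))


def score(patient_terms, gene_terms, ic):
    """Assumed set similarity: mean best match of each patient term."""
    best = [max(resnik(p, g, ic) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms):
    """Order genes from most to least similar to the patient's terms."""
    ic = information_content()
    scored = {g: score(patient_terms, ts, ic) for g, ts in GENE_TERMS.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With this toy data, `rank_genes({"sensorineural_hl"})` places `GENE_A` first, because it is the only gene annotated with the most specific matching term, and `GENE_B` last, because it shares only the root term with the patient.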

Figure 2: Simulated HPO term annotation counts. Distribution of the number of HPO term annotations per disease for the 33 Mendelian diseases used in the simulation cases (shown in red), and the number of HPO term annotations per simulated patient (shown in blue).

Simulated results were generated for 33 diseases that have a single known causative gene according to the OMIM database and for which sufficient phenotype feature penetrance data were available to accurately model patient characteristics [21, 22]. For these 33 diseases, the number of HPO annotations per disease is approximately normally distributed, with a range from 6 to 50 and a mean of 19.7 (Figure 2). An additional file lists the diseases and their associated HPO terms and penetrance [see Additional file 1]. For each disease, we first generated 100 simulated patients by selecting HPO terms for the patient from those terms directly annotated to the disease gene, with probability determined by the penetrance data (see Methods). This case is considered optimal and represents a clinical scenario where the patient phenotype is well recognized and a specific causative gene is suspected. We then added noise terms (terms unrelated to the causative gene) to simulate clinical scenarios where certain patient characteristics are unrelated to the disease. For each patient, the number of noise terms was taken to be half the number of optimal terms; e.g., if a patient had 10 optimal terms, 5 randomly selected noise terms were added. Finally, we considered imprecision, which occurs when selected patient terms are related to the disease but are less specific than the terms annotated to the causative gene. Imprecision was simulated by randomly replacing each optimal patient term with one of its ancestor terms. In all, we generated 13,200 simulated patients with a range of HPO annotations between 1 and 39 and a mean of 9.4 (Figure 2).
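The simulation steps described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' code: the toy ancestor table, the term names, and the function names are hypothetical; rounding down when the number of optimal terms is odd is an assumption; and building the imprecision-with-noise case by attaching the same noise terms to the imprecise terms is also an assumption, since the text does not say how that case was assembled. The reported total of 13,200 simulated patients is consistent with 33 diseases × 100 patients × 4 cases (optimal, noise, imprecision, imprecision with noise).

```python
import random

# Hypothetical toy ontology: term -> list of proper ancestors (not real HPO IDs).
ANCESTORS = {
    "sensorineural_hl": ["hearing_loss", "ear_abnormality"],
    "hearing_loss": ["ear_abnormality"],
    "cataract": ["eye_abnormality"],
    "short_stature": ["growth_abnormality"],
    "seizures": ["nervous_system_abnormality"],
}


def optimal_terms(penetrance, rng):
    """Keep each disease term with probability equal to its penetrance."""
    return [t for t, p in penetrance.items() if rng.random() < p]


def pick_noise(n_optimal, unrelated_pool, rng):
    """Pick half as many unrelated terms as optimal terms (floor is assumed)."""
    k = min(n_optimal // 2, len(unrelated_pool))
    return rng.sample(unrelated_pool, k)


def make_imprecise(terms, rng):
    """Replace each term with one randomly chosen ancestor, if it has any."""
    return [rng.choice(ANCESTORS[t]) if ANCESTORS.get(t) else t for t in terms]


def simulate_patient(penetrance, unrelated_pool, seed=None):
    """Return the four simulated term lists for one patient."""
    rng = random.Random(seed)
    optimal = optimal_terms(penetrance, rng)
    noise = pick_noise(len(optimal), unrelated_pool, rng)
    imprecise = make_imprecise(optimal, rng)
    return {
        "optimal": optimal,
        "noise": optimal + noise,
        "imprecision": imprecise,
        # Assumption: the combined case reuses the same noise terms.
        "imprecision_noise": imprecise + noise,
    }
```

A call such as `simulate_patient({"sensorineural_hl": 0.9, "cataract": 0.3}, ["short_stature", "seizures"], seed=1)` returns one term list per case; the penetrance values in this call are made up for illustration.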

