Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology.

Masino AJ, Dechene ET, Dulik MC, Wilkens A, Spinner NB, Krantz ID, Pennington JW, Robinson PN, White PS - BMC Bioinformatics (2014)

Affiliation: Department of Pediatrics, Cincinnati Children's Hospital and Medical Center, Cincinnati, OH, USA. peter.white@cchmc.org.

ABSTRACT

Background: Exome sequencing is a promising method for diagnosing patients with a complex phenotype. However, interpreting variants relative to the patient's phenotype can be challenging in some scenarios, particularly in the clinical assessment of rare complex phenotypes. Each patient's sequence reveals many possibly damaging variants that must be individually assessed to establish a clear association with the patient's phenotype. To assist interpretation, we implemented an algorithm that ranks a given set of genes relative to the patient's phenotype. The algorithm orders genes by the semantic similarity computed between the phenotypic descriptors associated with each gene and those describing the patient. Phenotypic descriptor terms are taken from the Human Phenotype Ontology (HPO), and semantic similarity is derived from each term's information content.
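The ranking step can be illustrated with a minimal sketch. It assumes Resnik's measure (the information content of the most informative common ancestor) for term-to-term similarity, information content computed as -log of the fraction of annotated genes carrying a term or one of its descendants, and a best-match average over the patient's terms. These are common choices for HPO-based similarity, but the study's exact scoring formula may differ. The ontology fragment is a simplified, hypothetical stand-in for HPO, and the gene names (`GENE_A` etc.) and function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical, simplified HPO-like fragment: term -> set of direct parents.
PARENTS = {
    "Abnormality of the ear": {"Phenotypic abnormality"},
    "Hearing impairment": {"Abnormality of the ear"},
    "Sensorineural hearing impairment": {"Hearing impairment"},
    "Abnormality of the eye": {"Phenotypic abnormality"},
    "Retinal dystrophy": {"Abnormality of the eye"},
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def information_content(gene_annotations, parents):
    """IC(t) = -log p(t), with p(t) the fraction of genes annotated with t or a descendant."""
    counts = Counter()
    for terms in gene_annotations.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t, parents)
        counts.update(covered)  # each gene counts at most once per term
    n = len(gene_annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


def resnik(t1, t2, ic, parents):
    """IC of the most informative common ancestor of two terms (assumed measure)."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def similarity(patient_terms, gene_terms, ic, parents):
    """Best-match average: best gene-term match for each patient term, then averaged."""
    if not patient_terms or not gene_terms:
        return 0.0
    best = [max(resnik(p, g, ic, parents) for g in gene_terms) for p in patient_terms]
    return sum(best) / len(best)


def rank_genes(patient_terms, gene_annotations, parents):
    """Order genes by decreasing similarity to the patient (ties broken by name)."""
    ic = information_content(gene_annotations, parents)
    scores = {g: similarity(patient_terms, t, ic, parents) for g, t in gene_annotations.items()}
    return sorted(scores, key=lambda g: (-scores[g], g))


genes = {  # hypothetical gene-to-phenotype annotations
    "GENE_A": {"Sensorineural hearing impairment"},
    "GENE_B": {"Hearing impairment"},
    "GENE_C": {"Retinal dystrophy"},
}
print(rank_genes({"Sensorineural hearing impairment"}, genes, PARENTS))
# ['GENE_A', 'GENE_B', 'GENE_C']
```

The gene annotated with the exact patient term scores highest, the gene sharing only a more general ancestor scores lower, and the unrelated gene shares only the root, whose information content is zero.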

Results: Model validation was performed via simulation and with clinical data. We simulated 33 Mendelian diseases with 100 patients per disease. We modeled clinical conditions by adding noise (phenotypic terms unrelated to the disease) and imprecision (terms less specific than the actual disease terms). We ranked the causative gene against all 2488 HPO-annotated genes. The median causative gene rank was 1 for the optimal and noise cases, 12 for the imprecision case, and 60 for the imprecision with noise case. Additionally, we examined a clinical cohort of subjects with hearing impairment, in which the median rank of the disease gene was 22. However, when the patient's exome data were also considered and non-exomic and common variants were filtered out, the median rank improved to 3.
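The noise and imprecision scenarios can be sketched as operations on a simulated patient's term set. This sketch assumes that imprecision means replacing a term with one of its direct parents, and that a noise term is any term that is neither an ancestor nor a descendant of a disease term. The study's exact procedure (how many terms were perturbed, and how far up the hierarchy) is not specified here, and the ontology fragment and function names are hypothetical.

```python
import random

# Hypothetical, simplified HPO-like fragment: term -> set of direct parents.
PARENTS = {
    "Abnormality of the ear": {"Phenotypic abnormality"},
    "Hearing impairment": {"Abnormality of the ear"},
    "Sensorineural hearing impairment": {"Hearing impairment"},
    "Abnormality of the eye": {"Phenotypic abnormality"},
    "Retinal dystrophy": {"Abnormality of the eye"},
    "Nystagmus": {"Abnormality of the eye"},
}
ALL_TERMS = set(PARENTS) | {"Phenotypic abnormality"}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_related(a, b):
    """Two terms are related if one is an ancestor of (or identical to) the other."""
    return a in ancestors(b) or b in ancestors(a)


def make_imprecise(terms, rng):
    """Imprecision (assumed): replace each term with a random direct parent."""
    out = set()
    for t in sorted(terms):
        parents = sorted(PARENTS.get(t, ()))
        out.add(rng.choice(parents) if parents else t)
    return out


def add_noise(patient_terms, disease_terms, n_noise, rng):
    """Noise (assumed): add up to n_noise terms unrelated to every disease term."""
    candidates = sorted(
        c for c in ALL_TERMS if not any(is_related(c, t) for t in disease_terms)
    )
    return set(patient_terms) | set(rng.sample(candidates, min(n_noise, len(candidates))))


rng = random.Random(0)
disease_terms = {"Sensorineural hearing impairment"}
imprecise = make_imprecise(disease_terms, rng)
print(imprecise)  # {'Hearing impairment'}
print(add_noise(imprecise, disease_terms, 2, rng))  # imprecision with noise
```

Passing the original disease terms to `add_noise` keeps noise defined relative to the disease, even when the patient's own terms have already been made less specific.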

Conclusions: Semantic similarity can rank a causative gene highly within a gene list relative to patient phenotype characteristics, provided that imprecision is mitigated. The clinical case results suggest that phenotype rank combined with variant analysis provides significant improvement over the individual approaches. We expect that this combined prioritization approach may increase accuracy and decrease effort for clinical genetic diagnosis.
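The combined prioritization can be sketched as a filter applied to the phenotype ranking: keep only genes carrying at least one exonic variant below an allele-frequency cutoff, then read off the causative gene's position in the shortened list. The `Variant` fields, the 1% cutoff, and the gene names are assumptions for illustration, not the study's variant-analysis pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    gene: str
    exonic: bool        # assumed annotation: variant lies in a coding exon
    allele_freq: float  # assumed annotation: population allele frequency


def combined_rank(phenotype_ranking, variants, causative, max_af=0.01) -> Optional[int]:
    """1-based rank of `causative` after keeping only genes with a rare exonic variant.

    `max_af` is an illustrative cutoff, not the threshold used in the study.
    Returns None if the causative gene is filtered out.
    """
    candidates = {v.gene for v in variants if v.exonic and v.allele_freq < max_af}
    filtered = [g for g in phenotype_ranking if g in candidates]
    return filtered.index(causative) + 1 if causative in filtered else None


ranking = ["G1", "G2", "G3", "G4", "G5"]  # hypothetical phenotype-only ranking
variants = [
    Variant("G1", exonic=True, allele_freq=0.20),    # common: filtered out
    Variant("G2", exonic=False, allele_freq=0.001),  # non-exonic: filtered out
    Variant("G3", exonic=True, allele_freq=0.002),
    Variant("G4", exonic=True, allele_freq=0.0005),
]
print(ranking.index("G4") + 1, combined_rank(ranking, variants, "G4"))  # 4 2
```

Because the filter only removes genes, the causative gene's rank can never get worse, provided that its own variant survives the filter.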

Figure 3: Causative gene rank cumulative distribution function. The cumulative distribution function of causative gene rank for the four simulated scenarios taken across the 33 simulated diseases. The solid lines are the results obtained when ranked by similarity score. The dashed lines are the results obtained when ranked by p-value. The x-axis is the rank, r, and the y-axis is the probability that the causative gene rank, R, is less than or equal to r. Note that the x-axis is on a logarithmic scale.
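Each curve in this figure is an empirical cumulative distribution function, F(r) = P(R ≤ r), over the simulated patients. A minimal sketch of computing one from a list of per-patient causative gene ranks follows; the rank values are made up for illustration and are not results from the study.

```python
import bisect
import statistics


def empirical_cdf(ranks):
    """Return F with F(r) = fraction of patients whose causative gene rank R <= r."""
    ordered = sorted(ranks)
    n = len(ordered)
    return lambda r: bisect.bisect_right(ordered, r) / n


ranks = [1, 1, 1, 2, 5, 12, 60, 300]  # illustrative ranks, not results from the study
F = empirical_cdf(ranks)
for r in (1, 10, 100, 1000):  # log-spaced points, matching a log-scale x-axis
    print(r, F(r))
print("median rank:", statistics.median(ranks))
# 1 0.375
# 10 0.625
# 100 0.875
# 1000 1.0
# median rank: 3.5
```

F(1) is the fraction of patients whose causative gene is ranked first, and the median rank is where F first reaches 0.5.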

Mentions: For each simulated patient, we ranked the causative gene against all 2488 genes annotated by at least one HPO term to evaluate algorithm performance. The causative gene rank cumulative distribution plots, shown in Figure 3, summarize the results. The causative gene was ranked first for 92% of the optimal cases when ranked by similarity score and for 80% when ranked by p-value. Optimal cases in which the causative gene was not ranked first are a result of the patient phenotype annotation process. Recall that optimal patient terms were selected probabilistically based on term penetrance. Thus, some optimal patients had fewer annotations and/or were not annotated with the important disease terms, specifically those with higher information content.
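The penetrance-based selection of optimal patient terms can be sketched as one independent Bernoulli draw per disease term, with success probability equal to the term's penetrance. Treating terms as independent is an assumption, and the disease, its terms, and their penetrance values are hypothetical. The sketch shows how a specific, low-penetrance term is often absent from a simulated patient, which is one way an optimal case can fail to rank the causative gene first.

```python
import random


def sample_optimal_patient(disease_terms, rng):
    """Include each disease term independently with probability equal to its penetrance."""
    return {t for t, penetrance in sorted(disease_terms.items()) if rng.random() < penetrance}


# Hypothetical disease: a frequent but general term and a rarer, highly specific term.
disease = {"Hearing impairment": 0.9, "Specific syndromic feature": 0.3}
rng = random.Random(42)
patients = [sample_optimal_patient(disease, rng) for _ in range(1000)]
missing_specific = sum("Specific syndromic feature" not in p for p in patients) / len(patients)
print(f"fraction lacking the specific term: {missing_specific:.2f}")  # close to 1 - 0.3
```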

