Discovering relations between indirectly connected biomedical concepts.

Weissenborn D, Schroeder M, Tsatsaronis G - J Biomed Semantics (2015)

Bottom Line: Towards this direction, it is necessary to combine facts in order to formulate hypotheses or draw conclusions about the domain concepts. Results suggest that relation discovery using indirect knowledge is possible, with an AUC that can reach up to 0.8, a result which is a great improvement compared to the random classification, and which shows that good predictions can be prioritized by following the suggested approach. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.


Affiliation: DFKI Projektbüro Berlin, Alt-Moabit 91c, Berlin, 10559 Germany ; Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany.

ABSTRACT

Background: The complexity and scale of the knowledge in the biomedical domain have motivated research on mining heterogeneous data from both structured and unstructured knowledge bases. To this end, it is necessary to combine facts in order to formulate hypotheses or draw conclusions about the domain concepts. This work addresses this problem by using indirect knowledge connecting two concepts in a knowledge graph to discover hidden relations between them. The graph represents concepts as vertices and relations as edges, stemming from structured (ontologies) and unstructured (textual) data. In this graph, path patterns, i.e. sequences of relations, that potentially characterize a biomedical relation are mined using distant supervision.
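
To make the notion of a path pattern concrete, the following is a minimal sketch, not the authors' implementation. It assumes the graph is given as (subject, relation, object) triples and collects the relation sequences of all indirect paths (two or more edges) between a concept pair. The concept names, relation names and the `max_len` cutoff are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(triples):
    """Index (subject, relation, object) triples as an adjacency list."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def path_patterns(graph, source, target, max_len=3):
    """Relation sequences of simple paths source -> target with 2..max_len edges.

    Direct (one-edge) connections are skipped because only indirect
    knowledge is of interest here.
    """
    patterns = []
    stack = [(source, (), (source,))]  # (current node, relations so far, visited nodes)
    while stack:
        node, rels, visited = stack.pop()
        if node == target and rels:
            if len(rels) >= 2:
                patterns.append(rels)
            continue
        if len(rels) == max_len:
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                stack.append((nxt, rels + (rel,), visited + (nxt,)))
    return sorted(patterns)


# Hypothetical toy triples; none of these come from the paper's data.
triples = [
    ("drugA", "may_treat", "diseaseY"),  # known direct relation (the label)
    ("drugA", "inhibits", "proteinX"),
    ("proteinX", "associated_with", "diseaseY"),
    ("drugA", "similar_to", "drugB"),
    ("drugB", "may_treat", "diseaseY"),
]
graph = build_graph(triples)
patterns = path_patterns(graph, "drugA", "diseaseY")
print(patterns)
```

In a distant supervision setting, a pair such as (drugA, diseaseY) that is already known to stand in the relation would serve as a positive example, and the extracted patterns would become its features.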

Results: It is possible to identify characteristic path patterns of biomedical relations from this representation using machine learning. For experimental evaluation, two frequent biomedical relations, namely "has target" and "may treat", are chosen. Results suggest that relation discovery using indirect knowledge is possible, with an AUC of up to 0.8, a substantial improvement over random classification, which shows that good predictions can be prioritized by following the suggested approach.
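
As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair, with ties counted as one half. The scores are hypothetical and are not taken from the paper; the function name `auc` is likewise only illustrative.

```python
def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking, so a classifier that cannot
    separate the two classes scores 0.5.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores, for illustration only.
pos = [0.9, 0.8, 0.6]
neg = [0.7, 0.4, 0.3]
print(round(auc(pos, neg), 3))
```

Under this reading, a higher AUC means that true relations tend to appear nearer the top of a list ranked by score, which is the sense in which good predictions can be prioritized.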

Conclusions: Analysis of the results indicates that the models can successfully learn expressive path patterns for the examined relations. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.




Figure 12: Confidence scores of the trained "may treat" classifier using LDA features on a drug repositioning dataset. The figure shows the results of applying the trained "may treat" classifier to a drug repositioning dataset of real repositioning case studies from the period 1955 to 2013. The average classification score of negative training pairs is included as a baseline at 0.57.

Mentions: The resulting scores are ordered by year of FDA approval and are presented in Figure 12. The first finding is that the scores seem to be independent of the year of approval. The classifier assigns a high score even to most of the very recent repositioning cases. These results show that recently established knowledge can be discovered by this approach and suggest that even the discovery of new knowledge might be possible. It is noticeable that the confidence scores of the classifier are in general very high on the repositioning dataset, considering that the average classification score of negative pairs for this classifier is 0.57, with little variation among the scores of those negative examples.

