Discovering relations between indirectly connected biomedical concepts.

Weissenborn D, Schroeder M, Tsatsaronis G - J Biomed Semantics (2015)

Bottom Line: Towards this direction, it is necessary to combine facts in order to formulate hypotheses or draw conclusions about the domain concepts. Results suggest that relation discovery using indirect knowledge is possible, with an AUC that can reach up to 0.8, a result which is a great improvement compared to the random classification, and which shows that good predictions can be prioritized by following the suggested approach. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.


Affiliation: DFKI Projektbüro Berlin, Alt-Moabit 91c, Berlin, 10559 Germany ; Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany.

ABSTRACT

Background: The complexity and scale of knowledge in the biomedical domain have motivated research on mining heterogeneous data from both structured and unstructured knowledge bases. To this end, it is necessary to combine facts in order to formulate hypotheses or draw conclusions about the domain concepts. This work addresses this problem by using indirect knowledge connecting two concepts in a knowledge graph to discover hidden relations between them. The graph represents concepts as vertices and relations as edges, stemming from structured (ontologies) and unstructured (textual) data. In this graph, path patterns, i.e. sequences of relations, that potentially characterize a biomedical relation are mined using distant supervision.

Results: It is possible to identify characteristic path patterns of biomedical relations from this representation using machine learning. For the experimental evaluation, two frequent biomedical relations, namely "has target" and "may treat", are chosen. Results suggest that relation discovery using indirect knowledge is possible, with an AUC of up to 0.8, a considerable improvement over random classification, which shows that good predictions can be prioritized by following the suggested approach.

Conclusions: Analysis of the results indicates that the models can successfully learn expressive path patterns for the examined relations. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.
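The notion of a path pattern used in the abstract (the sequence of relations along a path between two concepts) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy triples, the relation names, the function names and the `max_edges` parameter are all hypothetical, and path length is counted here in edges, which may differ from the length convention used in the paper.

```python
from collections import defaultdict

# Hypothetical toy triples (source, relation, target); not data from the paper.
TRIPLES = [
    ("drug_A", "inhibits", "protein_B"),
    ("protein_B", "associated_with", "disease_C"),
    ("drug_A", "co_occurs_with", "disease_C"),
    ("drug_A", "binds", "protein_D"),
    ("protein_D", "associated_with", "disease_C"),
]


def build_graph(triples):
    """Build an adjacency list: vertex -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
    return graph


def path_patterns(graph, start, end, max_edges):
    """Collect the relation sequences of all simple paths from start to end
    that use at most max_edges edges (assumes start != end)."""
    patterns = set()
    # Each stack entry: current vertex, relations so far, vertices visited so far.
    stack = [(start, (), (start,))]
    while stack:
        vertex, relations, visited = stack.pop()
        if vertex == end and relations:
            patterns.add(relations)
            continue
        if len(relations) == max_edges:
            continue
        for relation, neighbor in graph.get(vertex, ()):
            if neighbor not in visited:
                stack.append(
                    (neighbor, relations + (relation,), visited + (neighbor,))
                )
    return patterns


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for pattern in sorted(path_patterns(graph, "drug_A", "disease_C", 2)):
        print(" -> ".join(pattern))
```

In a distant-supervision setting, pairs already known to stand in a relation such as "may treat" would serve as positive examples, and each extracted pattern could become one binary feature for a classifier.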



Figure 13: ROC curves using a varying number of path lengths. The figure shows the ROC curves for using paths of only length 2 and paths of both length 2 and 3 on the has target dataset.

Mentions: In many approaches to knowledge discovery (e.g., for database curation), only direct mentions of two concepts in one sentence are considered when asserting a specific relation between them. This approach can be reflected in our setting by considering only paths of length 2 (i.e. only direct connections), which were excluded from all previous experiments. They were excluded because this approach aims to find new, unknown facts based on indirect connections between concepts. Furthermore, the problem with using only direct connections is that only around 36% of the "has target" pairs and 46% of the "may treat" pairs have direct connections in the graph, which means that no more than those can be classified correctly. The improvement gained by adding indirect connections as features can be seen in Figure 13. By using indirect connections, almost twice as many positive examples can be ranked highly as when only direct connections are used. Note that pairs of the "has target" dataset that have no connections of length 2 or 3, respectively, were also included in this experiment to illustrate the recall improvements when indirect connections are included as features.
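The coverage argument above (a pair without any connection of the allowed length offers the classifier no features, so it cannot be recovered) can be made concrete with a small sketch. The function name, the pair identifiers and the toy feature sets are hypothetical and chosen only for illustration; the resulting fractions are not the paper's figures.

```python
def recall_ceiling(positive_pairs, patterns_by_pair):
    """Fraction of positive pairs that have at least one path pattern.

    A pair with no pattern has no features, so a feature-based classifier
    cannot tell it apart from an unconnected negative pair. This fraction is
    therefore an upper bound on the recall that such a classifier can reach.
    """
    if not positive_pairs:
        raise ValueError("positive_pairs must not be empty")
    covered = sum(1 for pair in positive_pairs if patterns_by_pair.get(pair))
    return covered / len(positive_pairs)


# Hypothetical toy data: five positive pairs, two of them directly connected.
POSITIVE_PAIRS = [("d1", "p1"), ("d2", "p2"), ("d3", "p3"), ("d4", "p4"), ("d5", "p5")]
DIRECT_ONLY = {
    ("d1", "p1"): {("mentioned_with",)},
    ("d2", "p2"): {("mentioned_with",)},
}
# Allowing longer paths adds patterns for two more pairs.
DIRECT_AND_INDIRECT = {
    **DIRECT_ONLY,
    ("d3", "p3"): {("inhibits", "associated_with")},
    ("d4", "p4"): {("binds", "associated_with")},
}

if __name__ == "__main__":
    print("direct only:", recall_ceiling(POSITIVE_PAIRS, DIRECT_ONLY))
    print("direct + indirect:", recall_ceiling(POSITIVE_PAIRS, DIRECT_AND_INDIRECT))
```

In this toy setting the ceiling doubles once longer paths are allowed, which mirrors the direction of the effect described for Figure 13 without reproducing its numbers.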

