Discovering relations between indirectly connected biomedical concepts.

Weissenborn D, Schroeder M, Tsatsaronis G - J Biomed Semantics (2015)



Affiliation: DFKI Projektbüro Berlin, Alt-Moabit 91c, Berlin, 10559 Germany ; Biotechnology Center, Technische Universität Dresden, Tatzberg 47/49, Dresden, 01307 Germany.

ABSTRACT

Background: The complexity and scale of knowledge in the biomedical domain have motivated research on mining heterogeneous data from both structured and unstructured knowledge bases. To this end, facts must be combined in order to formulate hypotheses or draw conclusions about domain concepts. This work addresses the problem by using indirect knowledge that connects two concepts in a knowledge graph to discover hidden relations between them. The graph represents concepts as vertices and relations as edges, derived from both structured (ontologies) and unstructured (textual) data. In this graph, path patterns, i.e., sequences of relations that potentially characterize a biomedical relation, are mined using distant supervision.
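The Background describes path patterns as sequences of relation labels connecting two concepts. The following is a minimal sketch, assuming a directed graph stored as (source, relation, target) triples and a depth-limited search over simple paths. It shows how the relation sequences linking one concept pair could be collected. The concept and relation names are hypothetical, and the search strategy is an assumption rather than the authors' exact extraction procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# An edge is (source concept, relation label, target concept).
Edge = Tuple[str, str, str]


def build_adjacency(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index outgoing (relation, target) pairs by source vertex."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((rel, dst))
    return adj


def path_patterns(adj: Dict[str, List[Tuple[str, str]]], start: str, end: str,
                  max_len: int = 3) -> Counter:
    """Count relation sequences (path patterns) on simple paths from start to end
    that have at most max_len edges."""
    patterns: Counter = Counter()
    # Iterative depth-first search; each stack entry holds the current vertex,
    # the relation labels walked so far, and the vertices already visited.
    stack = [(start, (), frozenset([start]))]
    while stack:
        node, rels, visited = stack.pop()
        for rel, nxt in adj.get(node, []):
            if nxt == end:
                patterns[rels + (rel,)] += 1
            elif nxt not in visited and len(rels) + 1 < max_len:
                stack.append((nxt, rels + (rel,), visited | {nxt}))
    return patterns


# Hypothetical toy graph: one path via a gene, one via a drug class.
edges = [
    ("drugA", "inhibits", "geneX"),
    ("geneX", "associated_with", "diseaseY"),
    ("drugA", "isa", "classC"),
    ("classC", "may_treat", "diseaseY"),
]
adj = build_adjacency(edges)
print(path_patterns(adj, "drugA", "diseaseY"))
```

Under distant supervision, pairs already known to stand in the target relation (for example, from the UMLS) would be labeled positive. Their pattern counts would then serve as features for a classifier.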

Results: Using machine learning, characteristic path patterns of biomedical relations can be identified from this representation. For the experimental evaluation, two frequent biomedical relations, "has target" and "may treat", were chosen. The results suggest that relation discovery using indirect knowledge is possible, with an AUC of up to 0.8. This is a clear improvement over random classification (AUC of 0.5) and shows that good predictions can be prioritized by following the suggested approach.
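The reported AUC can be read as a probability. It is the chance that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. A random classifier therefore yields 0.5, and an AUC of 0.8 means that positives outrank negatives about 80% of the time. The sketch below computes AUC directly from this definition by pairwise comparison, counting ties as one half. The function name and the toy scores are illustrative only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (label 1) is scored above a random
    negative (label 0); ties count as one half. O(P * N) comparisons."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
# A classifier that gives every pair the same score is no better than chance.
print(roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

The pairwise form is quadratic. For large evaluation sets, a rank-based (Mann-Whitney U) computation or a library routine gives the same value more efficiently.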

Conclusions: Analysis of the results indicates that the models can successfully learn expressive path patterns for the examined relations. Furthermore, this work demonstrates that the constructed graph allows for the easy integration of heterogeneous information and discovery of indirect connections between biomedical concepts.




Fig. 8: Distribution of drug types in the has target dataset. The figure shows how often each drug type occurs in the has target dataset.

Mentions: Experiments were conducted on two datasets pertaining to two different relations, although the approach can be applied to learn any new relation, provided that the relation involves concepts from the UMLS Metathesaurus. The first dataset contains 438 concept pairs of the may treat relation taken from the UMLS. It was constructed under two restrictions. First, no drug or disease concept was allowed to occur more than once in the whole dataset. Second, every concept in the dataset had to be part of the pruned graph. The first restriction ensures that the dataset is not dominated by a single disease type (e.g., neoplasms or cardiovascular diseases) and that many types of diseases are represented proportionally. The second restriction is needed because paths can only be extracted for a concept pair if both concepts are part of the graph. Figures 6 and 7 show the distribution of drug and disease types, respectively, in this dataset. The second dataset consists of 744 pairs of the has target relation extracted from DrugBank and mapped to UMLS. As with the may treat dataset, all concepts were required to be part of the pruned knowledge graph, but multiple occurrences of a concept were allowed. Figures 8 and 9 show the distribution of drug and disease types, respectively, in this dataset. Both datasets were constructed by extracting all concept pairs of the respective relation from the UMLS and then filtering the pairs according to the restrictions above. Negative examples were constructed as described in the previous section. Note that ensuring that positive and negative examples are mutually exclusive can lead to a slightly smaller set of negative examples. The datasets used are publicly available as Additional file 1.
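The two dataset restrictions can be expressed as simple filters. This is a sketch under stated assumptions. Concept pairs are (first, second) string tuples, and the pruned graph is given as a set of vertex IDs. The at-most-once rule is applied greedily in input order, because the text does not say how conflicting pairs were resolved. The negative sampler is hypothetical, since the actual construction is described in a previous section that is not reproduced here. It only illustrates why enforcing exclusivity between positives and negatives can shrink the negative set.

```python
import random
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def filter_unique_pairs(pairs: Iterable[Pair], vertices: Set[str]) -> List[Pair]:
    """may treat style: both concepts must be graph vertices, and no concept may
    appear more than once in the dataset (greedy, first pair wins)."""
    used: Set[str] = set()
    kept: List[Pair] = []
    for a, b in pairs:
        if a not in vertices or b not in vertices:
            continue  # paths cannot be extracted for concepts outside the graph
        if a in used or b in used:
            continue  # concept already used by an earlier pair
        used.update((a, b))
        kept.append((a, b))
    return kept


def filter_in_graph(pairs: Iterable[Pair], vertices: Set[str]) -> List[Pair]:
    """has target style: only graph membership is required; repeats are allowed."""
    return [(a, b) for a, b in pairs if a in vertices and b in vertices]


def sample_negatives(positives: List[Pair], seed: int = 0) -> List[Pair]:
    """HYPOTHETICAL sampler: re-pair first concepts with shuffled second concepts,
    then drop any pair that is also a positive, keeping the two sets exclusive."""
    rng = random.Random(seed)
    seconds = [b for _, b in positives]
    rng.shuffle(seconds)
    positive_set = set(positives)
    candidates = zip((a for a, _ in positives), seconds)
    return [pair for pair in candidates if pair not in positive_set]


vertices = {"d1", "d2", "d3", "x1", "x2"}
raw = [("d1", "x1"), ("d2", "x1"), ("d3", "x2"), ("d4", "x2")]
print(filter_unique_pairs(raw, vertices))  # [('d1', 'x1'), ('d3', 'x2')]
print(filter_in_graph(raw, vertices))      # [('d1', 'x1'), ('d2', 'x1'), ('d3', 'x2')]
```

Because a shuffled pair can coincide with a known positive and is then discarded, the negative set can end up slightly smaller than the positive set.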

