Supervised Learning Based Hypothesis Generation from Biomedical Literature.

Sang S, Yang Z, Li Z, Lin H - Biomed Res Int (2015)

Bottom Line: Compared with concept cooccurrence and grammar engineering-based approaches like SemRep, machine learning based models can usually achieve better performance in information extraction (IE) from texts. Then, by combining the two models, the approach reconstructs the ABC model and generates biomedical hypotheses from literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.


Affiliation: College of Computer Science and Engineering, Dalian University of Technology, Dalian 116024, China.

ABSTRACT
Nowadays, the amount of biomedical literature is growing at an explosive speed, and much useful knowledge in this literature remains undiscovered. Researchers can form biomedical hypotheses by mining it. In this paper, we propose a supervised learning based approach to generating hypotheses from biomedical literature. This approach splits the traditional hypothesis-generation process of the classic ABC model into an AB model and a BC model, both constructed with supervised learning methods. Compared with concept cooccurrence and grammar engineering-based approaches like SemRep, machine learning based models can usually achieve better performance in information extraction (IE) from texts. Then, by combining the two models, the approach reconstructs the ABC model and generates biomedical hypotheses from literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.
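The recombination step described in the abstract can be pictured as a join on the shared B concept. The following is a minimal sketch, assuming the AB and BC models have each already produced a list of related concept pairs; the function name `generate_hypotheses`, the `known_ac` filter, and the example pairs are hypothetical illustrations and are not taken from the paper's data or results.

```python
from collections import defaultdict


def generate_hypotheses(ab_pairs, bc_pairs, known_ac=frozenset()):
    """Join A-B and B-C relations on the shared B concept.

    Returns a dict mapping each candidate (A, C) pair to the set of
    linking B concepts. Pairs listed in known_ac are skipped, since an
    already documented A-C relation is not a new hypothesis.
    """
    b_to_c = defaultdict(set)
    for b, c in bc_pairs:
        b_to_c[b].add(c)

    hypotheses = defaultdict(set)
    for a, b in ab_pairs:
        for c in b_to_c.get(b, ()):
            if a != c and (a, c) not in known_ac:
                hypotheses[(a, c)].add(b)
    return dict(hypotheses)


# Hand-written illustrative relations (not output of the paper's models).
ab = [("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")]
bc = [("blood viscosity", "Raynaud's disease"),
      ("platelet aggregation", "Raynaud's disease")]

result = generate_hypotheses(ab, bc)
for (a, c), linking in result.items():
    print(a, "->", c, "via", sorted(linking))
```

Keeping the set of linking B concepts for each pair makes it possible to rank candidates afterwards, for example by how many independent B concepts support them; that ranking choice is likewise an assumption of this sketch.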


Figure 3: Graph representation generated from an example sentence. The candidate interaction pair is marked as PROT1 and PROT2; the third protein is marked as PROT. The shortest path between the proteins is shown in bold. In the dependency-based subgraph, all nodes in a shortest path are specialized using a posttag (IP). In the linear order subgraph, possible tags are before (B), middle (M), and after (A). For the other two candidate pairs in the sentence, graphs with the same structure but different weights and labels would be generated.

In our experiment, all sentences are parsed by the Stanford Parser to generate the dependency path and the POS path. A graph kernel calculates the similarity between two input graphs by comparing the relations between common vertices (nodes). The graph kernel used in our method is the all-paths graph kernel proposed by Airola et al. [19]. The kernel represents the target pair using graph matrices based on two subgraphs, where the graph features include all nonzero elements in the graph matrices. The two subgraphs are a parse structure subgraph (PSS) and a linear order subgraph (LOS), as shown in Figure 3. PSS represents the parse structure of a sentence and includes word vertices and link vertices. A word vertex contains its lemma and its POS, while a link vertex contains its link. Additionally, both types of vertices contain their positions relative to the shortest path. LOS represents the word sequence in the sentence and thus has only word vertices, each of which contains its lemma, its position relative to the target pair, and its POS.
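To make the idea of "graph matrices" and their nonzero elements concrete, here is a small pure-Python sketch of an all-paths style kernel. It rests on assumptions that the text above does not state: each graph is given as a weighted adjacency matrix plus one label set per vertex, the summed weight of all paths between two vertices is accumulated as A + A^2 + ... (exact here because the toy graphs are acyclic), and the kernel value is the sum of products of label-pair features shared by the two graphs. The helper names, the edge weight 0.5, and the toy labels are hypothetical; this should not be read as the authors' or Airola et al.'s implementation.

```python
def matmul(x, y):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def all_path_weights(adj):
    """Sum A + A^2 + ... + A^(n-1), the total weight of all paths.

    The finite sum is exact only for acyclic graphs (assumed here).
    """
    n = len(adj)
    total = [row[:] for row in adj]
    power = [row[:] for row in adj]
    for _ in range(n - 2):
        power = matmul(power, adj)
        total = [[t + p for t, p in zip(tr, pr)]
                 for tr, pr in zip(total, power)]
    return total


def graph_features(adj, labels):
    """Map each (label_i, label_j) pair to its summed path weight.

    Only nonzero entries are stored, mirroring the use of nonzero
    graph-matrix elements as features.
    """
    feats = {}
    for i, row in enumerate(all_path_weights(adj)):
        for j, weight in enumerate(row):
            if weight == 0:
                continue
            for li in labels[i]:
                for lj in labels[j]:
                    feats[(li, lj)] = feats.get((li, lj), 0.0) + weight
    return feats


def all_paths_kernel(g1, g2):
    """Similarity of two (adjacency, labels) graphs over shared features."""
    f1, f2 = graph_features(*g1), graph_features(*g2)
    return sum(v * f2[k] for k, v in f1.items() if k in f2)


# Toy linear-order chains: PROT1 -> verb -> PROT2 (weights are illustrative).
chain = [[0, 0.5, 0], [0, 0, 0.5], [0, 0, 0]]
g_bind = (chain, [{"PROT1"}, {"bind"}, {"PROT2"}])
g_inhibit = (chain, [{"PROT1"}, {"inhibit"}, {"PROT2"}])

print(all_paths_kernel(g_bind, g_bind))
print(all_paths_kernel(g_bind, g_inhibit))
```

In this toy case the two sentences share only the PROT1-to-PROT2 path feature, so their similarity is lower than the self-similarity of either graph. In a fuller setup, position tags such as IP, B, M, and A would simply be additional entries in each vertex's label set.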

