Disease Related Knowledge Summarization Based on Deep Graph Search.

Wu X, Yang Z, Li Z, Lin H, Wang J - Biomed Res Int (2015)

Bottom Line: Traditional information retrieval (IR) techniques, when applied to large databases such as PubMed, often return large, unmanageable lists of citations that do not fulfill the searcher's information needs. In this approach, Kullback-Leibler divergence combined with a mutual information metric is first used to extract disease salient information. Then deep search based on depth-first search (DFS) is applied to find hidden (indirect) relations between biomedical entities.


Affiliation: College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China.

ABSTRACT
The volume of published biomedical literature on disease related knowledge is expanding rapidly. Traditional information retrieval (IR) techniques, when applied to large databases such as PubMed, often return large, unmanageable lists of citations that do not fulfill the searcher's information needs. In this paper, we present an approach to automatically construct disease related knowledge summarization from biomedical literature. In this approach, Kullback-Leibler divergence combined with a mutual information metric is first used to extract disease salient information. Then deep search based on depth-first search (DFS) is applied to find hidden (indirect) relations between biomedical entities. Finally, a random walk algorithm is used to filter out weak relations. The experimental results show that our approach achieves a precision of 60% and a recall of 61% on salient information extraction for Carcinoma of bladder and outperforms the Combo method.
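The abstract does not spell out how the deep search is implemented. As a rough illustration of the general idea only, the following sketch uses a generic depth-limited DFS to enumerate indirect paths between two entities in a relation graph. The graph layout, the entity names, the function name `indirect_paths`, and the `max_depth` parameter are all assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical adjacency-list graph: entity -> entities it is related to.
Graph = Dict[str, List[str]]


def indirect_paths(graph: Graph, source: str, target: str, max_depth: int) -> List[List[str]]:
    """Return simple paths from source to target that use between 2 and
    max_depth edges, i.e. indirect relations (the direct edge is skipped)."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        # len(path) - 1 is the number of edges used so far.
        if len(path) - 1 >= max_depth:
            return
        for nxt in graph.get(node, []):
            if nxt in path:  # keep paths simple (no repeated entities)
                continue
            new_path = path + [nxt]
            if nxt == target:
                if len(new_path) > 2:  # more than one edge: an indirect relation
                    paths.append(new_path)
                continue
            dfs(nxt, new_path)

    dfs(source, [source])
    return paths


# Toy graph with made-up entity names.
toy_graph: Graph = {
    "disease": ["geneA", "geneB"],
    "geneA": ["geneB", "drugX"],
    "geneB": ["drugX"],
}

for p in indirect_paths(toy_graph, "disease", "drugX", max_depth=3):
    print(" -> ".join(p))
```

The depth limit keeps the search bounded. A later filtering step, such as the random walk mentioned in the abstract, would then be needed to discard weak paths; that step is not shown here.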



Figure 2: R@N of KM and Combo. R@N is the recall of the top N samples in the ranking; N is the number of samples.

Mentions: To compare with Combo, the same metrics used in [7], namely recall, precision, and F-score (defined as F = (2PR)/(P + R), where P denotes precision and R denotes recall), are employed to evaluate KM's performance. Recall is calculated by comparing outputs to the reference standard of genes noted in relevant GHR and OMIM records. The reference standard provides a list of genes whose value has already been confirmed within the task of secondary genetic database curation, because GHR and OMIM curators have annotated their potential roles in bladder cancer development. In addition, R@N, the recall in the top N results, is used to evaluate recall in only the topmost results returned by the different methods. The results of the reference standard analysis are listed in Table 5. KM and Combo achieve the same recall (61%) because both are based on semantic predications generated by SemRep, and these predications include only eight of the thirteen reference genes (61%). However, the genes summarized by KM are ranked better than those of Combo, as shown in Figure 2. The mean average precision (MAP) of KM (39.46%) is higher than that of Combo (37.52%).
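The metrics named above can be sketched in a few lines. The F-score follows the formula given in the text. The recall-at-N and average-precision functions use common textbook definitions, which is an assumption: the exact MAP variant used in the paper is not stated here. The function names and the toy ranking are hypothetical and are not data from the study.

```python
from typing import Sequence, Set, Tuple


def precision_recall_f(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision P, recall R, and F = 2PR / (P + R) for an unranked result set."""
    found = set(retrieved)
    hits = len(found & relevant)
    p = hits / len(found) if found else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f


def recall_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """R@N: fraction of the reference items found in the top n ranked results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & relevant) / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision for one ranking, normalised by the number of
    reference items (one common convention; assumed, not from the paper)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / len(relevant)


# Made-up ranking and reference set, for illustration only.
toy_ranked = ["a", "x", "b", "y"]
toy_relevant = {"a", "b", "c"}

print(precision_recall_f(toy_ranked, toy_relevant))
print(recall_at_n(toy_ranked, toy_relevant, n=1))
print(average_precision(toy_ranked, toy_relevant))
```

Two methods that retrieve the same set of genes get identical recall, as KM and Combo do here, but R@N and average precision still differ when one method places the reference genes higher in its ranking.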

