Biomedical question answering using semantic relations.

Hristovski D, Dinevski D, Kastrin A, Rindflesch TC - BMC Bioinformatics (2015)



Affiliation: Institute for Biostatistics and Medical Informatics, Faculty of Medicine, University of Ljubljana, Vrazov trg 2, SI-1104, Ljubljana, Slovenia. dimitar.hristovski@mf.uni-lj.si.

ABSTRACT

Background: The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature.

Results: We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total, 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%).
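The Results paragraph describes QA as a search over a relational database of extracted relation instances. The sketch below illustrates that idea with Python's built-in `sqlite3` module under explicit assumptions: the single table `predication`, its columns, the helper names `build_db` and `answer`, and the toy rows are all hypothetical and do not reproduce SemBT's actual schema or data.

```python
import sqlite3

# Hypothetical, simplified schema (not SemBT's): one row per extracted
# semantic relation instance plus the id of the sentence it came from.
SCHEMA = """
CREATE TABLE predication (
    id          INTEGER PRIMARY KEY,
    subject     TEXT NOT NULL,
    relation    TEXT NOT NULL,
    object      TEXT NOT NULL,
    sentence_id INTEGER NOT NULL
);
CREATE INDEX idx_subject  ON predication(subject);
CREATE INDEX idx_object   ON predication(object);
CREATE INDEX idx_relation ON predication(relation);
"""

# Toy rows for illustration only; not taken from SemBT or MEDLINE.
TOY_ROWS = [
    ("Smoking", "PREDISPOSES", "Lung Neoplasms", 1),
    ("Smoking", "PREDISPOSES", "Lung Neoplasms", 2),
    ("Obesity", "PREDISPOSES", "Breast Neoplasms", 3),
    ("Tamoxifen", "TREATS", "Breast Neoplasms", 4),
]


def build_db(rows):
    """Create an in-memory database and load the relation instances."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO predication (subject, relation, object, sentence_id) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def answer(conn, relation, obj):
    """Answer 'What <relation> <obj>?' as a search in the relation table.

    Returns each distinct subject with the number of instances supporting
    it, most frequently asserted first (ties broken by name).
    """
    cur = conn.execute(
        "SELECT subject, COUNT(*) AS n FROM predication "
        "WHERE relation = ? AND object = ? "
        "GROUP BY subject ORDER BY n DESC, subject",
        (relation, obj),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_db(TOY_ROWS)
    print(answer(conn, "PREDISPOSES", "Lung Neoplasms"))  # [('Smoking', 2)]
```

Counting supporting instances per answer is one plausible way to rank answers when the same relation is extracted from several sentences; it is an illustrative choice here, not a claim about how SemBT orders results.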

Conclusions: In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.



Figure 2: Faceting, filtering and argument expansion used together to get the factors that predispose various neoplasms.

Mentions: When the user's question is not specific enough at the outset, or when a more exploratory approach is taken, faceting is another promising avenue to explore. In our tool, faceting is turned on with the “Filter” option and serves two purposes: to show the top-N subjects, relations and objects of a query, and to use these for further query refinement or result filtering. Faceting results are shown in the left column of the user interface (Figure 2). In our faceting approach, top-N means, in the case of subjects, the N subjects that appear in the largest number of relations. In other words, the concept that appears most often as a subject in the semantic relations returned as answers to the original query is shown at the top of the subject facet. The same method applies to the relation and object facets. For example, suppose the user wants to do some exploratory research on neoplasms, enters the query “arg_name:neoplasms” and also uses argument expansion, so that the most common neoplasms are automatically included in the question. This very general question returns several hundred thousand semantic relations. The user can now browse the facets in the left column and investigate the subjects, relations and objects that appear in the highest number of relations. In the relation facet, the user selects the PREDISPOSES relation, because that is the aspect to be investigated further. The original query is automatically refined with the selected relation to become “arg_name:neoplasms AND relation:PREDISPOSES” (Figure 2). The results of the query now show which concepts are known to predispose which particular neoplasms. The facets in the left column can then be interpreted as follows: the concepts in the subject facet are those that predispose the largest number of neoplasms, and the concepts in the object facet are the neoplasms with the largest number of known predisposing factors.
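The faceting workflow above (argument expansion, top-N facets, and refinement with a selected relation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SemBT's implementation: the expansion table `EXPANSION`, the helpers `expand`, `search`, `top_n` and `facets`, and the toy relation instances are hypothetical, and a facet value is counted once per matching result row.

```python
from collections import Counter

# Toy (subject, relation, object) instances; illustrative only.
TOY = [
    ("Smoking", "PREDISPOSES", "Lung Neoplasms"),
    ("Smoking", "PREDISPOSES", "Bladder Neoplasms"),
    ("Obesity", "PREDISPOSES", "Breast Neoplasms"),
    ("Tamoxifen", "TREATS", "Breast Neoplasms"),
    ("Asbestos", "PREDISPOSES", "Lung Neoplasms"),
    ("Aspirin", "TREATS", "Headache"),
]

# Hypothetical argument-expansion table: a general concept mapped to more
# specific concepts that are added to the query automatically.
EXPANSION = {
    "neoplasms": ["Lung Neoplasms", "Breast Neoplasms", "Bladder Neoplasms"],
}


def expand(arg):
    """Return the argument plus its expansions, case-folded for matching."""
    terms = [arg] + EXPANSION.get(arg.casefold(), [])
    return {t.casefold() for t in terms}


def search(instances, arg, relation=None):
    """Emulate 'arg_name:<arg>' optionally refined with 'AND relation:<rel>'.

    An instance matches if its subject or object is one of the expanded
    argument terms and, when a relation filter is given, its relation
    equals that filter.
    """
    terms = expand(arg)
    return [
        (s, r, o)
        for s, r, o in instances
        if (s.casefold() in terms or o.casefold() in terms)
        and (relation is None or r == relation)
    ]


def top_n(values, n):
    """Top-N values by the number of result relations they appear in.

    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(values)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def facets(results, n=3):
    """Compute the subject, relation and object facets of a result set."""
    return {
        "subject": top_n((s for s, _, _ in results), n),
        "relation": top_n((r for _, r, _ in results), n),
        "object": top_n((o for _, _, o in results), n),
    }


if __name__ == "__main__":
    broad = search(TOY, "neoplasms")  # arg_name:neoplasms with expansion
    print(facets(broad)["relation"])  # [('PREDISPOSES', 4), ('TREATS', 1)]
    refined = search(TOY, "neoplasms", relation="PREDISPOSES")
    print(facets(refined)["object"])
```

Without the expansion entry, the query would match only instances whose argument is literally "Neoplasms", which is why expansion is what turns the general question into one that reaches specific tumour types.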

