Corpus annotation for mining biomedical events from literature.

Kim JD, Ohta T, Tsujii J - BMC Bioinformatics (2008)

Bottom Line: To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large-scale annotation. The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.


Affiliation: Department of Computer Science, School of Information Science and Technology, University of Tokyo, Tokyo, Japan. jdkim@is.s.u-tokyo.ac.jp

ABSTRACT

Background: Advanced Text Mining (TM) tasks such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation.

Results: We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large-scale annotation.

Conclusion: The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.


Figure 8: Screenshot of XConc Suite. The XConc Suite consists of three plug-ins for the Eclipse platform: an XML editor (A), a concordancer (B is the query editor and C is the result view), and an ontology browser (D) which supports both the editor and the concordancer for the selection of ontology terms.

Mentions: The XConc (XML-based Concordancer) Suite is an integrated annotation environment providing an XML editor, a concordancer and an ontology browser, which all interact with each other. For example, users can retrieve existing annotations and view the concordance in KWIC (keyword in context) format. Figure 8 shows a screenshot of XConc. The pane at the bottom shows the list of annotation instances of Regulation (including its child classes, Positive_regulation and Negative_regulation). Users can choose an instance from the list to open the file containing the annotation; the editor automatically places the cursor on the annotation, so that they can easily make an addition to it.
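To illustrate what a KWIC (keyword in context) view looks like, here is a minimal Python sketch. It is not XConc's implementation: the function name `kwic`, the `width` parameter, and the sample sentences are all assumptions made up for this illustration, and it searches plain text rather than XML annotations.

```python
import re


def kwic(text, keyword, width=30):
    """Return one aligned keyword-in-context line per match of `keyword`.

    Hypothetical helper for illustration only; XConc itself queries XML
    annotations, whereas this sketch searches raw text.
    """
    rows = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        # Up to `width` characters of context on each side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        rows.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return rows


# Invented sample text, not taken from the GENIA corpus.
sample = ("IL-2 induces positive regulation of NF-kappa B. "
          "Negative regulation of the gene was also observed.")

for row in kwic(sample, "regulation", width=20):
    print(row)
```

Each output line centres the matched keyword with a fixed amount of left and right context, which is the layout a concordancer's result view typically uses to make instances easy to scan.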

