A Hybrid Approach to Extracting Disorder Mentions from Clinical Notes.

Wang C, Akella R - AMIA Jt Summits Transl Sci Proc (2015)

Affiliation: Department of Technology and Information Management, University of California Santa Cruz, Santa Cruz, CA 95064.

ABSTRACT
Mentions of disorders, such as diseases, syndromes, injuries, and abnormalities, provide crucial information about a patient's physical or mental condition. Identifying disorder mentions is one of the most important steps in clinical text analysis. However, clinical notes document the same concept in many surface forms, and some mentions are even recorded disjointedly, briefly, or intuitively. Such variation challenges information extraction systems that focus on identifying explicit mentions. In this study, we propose a hybrid approach to disorder extraction that combines supervised machine learning, rule-based annotation, and an unsupervised NLP system. To identify different surface forms, we exploit rich features, especially semantic, syntactic, and sequential features, to better capture implicit relationships among words. We evaluated our method on the CLEF 2013 eHealth dataset. In our experiments, the hybrid approach achieved an F-score of 0.776 under strict evaluation, outperforming all systems that participated in the Challenge.
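
The reported score is an F-score under strict evaluation. The following is a minimal sketch of how such a score can be computed, assuming (our reading, not spelled out in the abstract) that "strict" means a predicted mention counts as correct only when its span boundaries exactly match a gold mention, which we believe corresponds to the strict criterion of the CLEF 2013 eHealth task. The function name `strict_prf` and the toy character offsets are hypothetical.

```python
from typing import Set, Tuple

# A mention is one or more character spans [start, end). Disjoint mentions
# simply have more than one span; spans are listed in ascending order so that
# identical mentions compare (and hash) as equal.
Mention = Tuple[Tuple[int, int], ...]


def strict_prf(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Precision, recall and F-score where a predicted mention is correct only
    if all of its span boundaries exactly match some gold mention."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    gold = {((0, 10),), ((20, 35),), ((40, 45), (50, 56))}
    predicted = {((0, 10),), ((20, 30),), ((40, 45), (50, 56)), ((60, 70),)}
    # ((20, 30),) overlaps a gold mention but does not match it exactly,
    # so strict evaluation treats it as a false positive.
    p, r, f = strict_prf(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy example two of the four predictions match exactly, giving precision 1/2, recall 2/3, and an F-score of 4/7 (about 0.571). A relaxed criterion that accepted overlapping spans would also credit `((20, 30),)`, which is why strict scores are typically lower.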

Figure 1. The framework of a hybrid extraction system.

We present the framework of our hybrid extraction system and its data flow in Figure 1. The system consists of three extraction components and a post-processing module. First, SVMs are trained on the training data to predict whether a word belongs to a disorder concept. Seven types of features (explained later) are extracted to help the SVMs capture both explicit and implicit relationships among words. Second, a rule-based annotator is automatically constructed from the frequent errors the SVMs make on the training data and is used to correct those errors. Finally, we employ an unsupervised NLP system, MetaMap, as a supplement to discover concepts that appear only in the test data. The outputs of the three components are combined in the post-processing module to determine mention boundaries.
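
The data flow in Figure 1 can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: each component is assumed to emit a binary per-token decision (1 = part of a disorder mention); the hypothetical rule learner `build_rules` keeps a word-level correction whenever the SVM mislabels that word at least `min_errors` times on training data; rules override SVM labels and MetaMap matches are then added by union; and post-processing groups consecutive positive tokens into contiguous spans. The SVM and MetaMap outputs are passed in as precomputed lists, and disjoint mentions are ignored.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # token offsets [start, end), end-exclusive


def build_rules(words: List[str], svm_pred: List[int], gold: List[int],
                min_errors: int = 2) -> Dict[str, int]:
    """Hypothetical rule learner: for every (lowercased) word that the SVM
    mislabels at least `min_errors` times on training data, keep the gold
    label it most often should have received."""
    errors: Dict[str, Counter] = defaultdict(Counter)
    for word, pred, true in zip(words, svm_pred, gold):
        if pred != true:
            errors[word.lower()][true] += 1
    return {word: counts.most_common(1)[0][0]
            for word, counts in errors.items()
            if sum(counts.values()) >= min_errors}


def combine(words: List[str], svm_pred: List[int], rules: Dict[str, int],
            metamap_spans: List[Span]) -> List[int]:
    """Assumed merge policy: rules override SVM labels, then every token inside
    a MetaMap match is marked as part of a disorder mention."""
    labels = [rules.get(word.lower(), pred) for word, pred in zip(words, svm_pred)]
    for start, end in metamap_spans:
        for i in range(start, end):
            labels[i] = 1
    return labels


def labels_to_spans(labels: List[int]) -> List[Span]:
    """Post-processing: group runs of consecutive positive tokens into
    contiguous mention spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i
        elif label != 1 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


if __name__ == "__main__":
    # Toy training data in which the SVM repeatedly misses "pain".
    rules = build_rules(["pain", "Pain", "stable"], [0, 0, 0], [1, 1, 0])
    words = ["Patient", "has", "chest", "pain", "and", "atrial", "fibrillation", "."]
    svm_pred = [0, 0, 1, 0, 0, 0, 0, 0]
    metamap = [(5, 7)]  # pretend MetaMap matched "atrial fibrillation"
    labels = combine(words, svm_pred, rules, metamap)
    for start, end in labels_to_spans(labels):
        print(" ".join(words[start:end]))
```

Here the learned rule repairs "chest pain" and the MetaMap span contributes "atrial fibrillation", which the SVM missed. Applying rules before the MetaMap union means MetaMap can re-add a token that a rule suppressed; the opposite order is an equally plausible design choice.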