Negation's not solved: generalizability versus optimizability in clinical natural language processing.

Wu S, Miller T, Masanz J, Coarr M, Halgrim S, Carrell D, Clark C - PLoS ONE (2014)

Bottom Line: A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it.


Affiliation: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America; Oregon Health and Science University, Portland, Oregon, United States of America.

ABSTRACT
A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
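To make the setup concrete, here is a minimal sketch of a supervised polarity classifier of the kind the abstract describes: a model that labels each clinical named entity as affirmed or negated from lexical context-window features. This is not the authors' Polarity Module; the feature scheme, toy sentences, and labels below are illustrative assumptions, and scikit-learn is used only for brevity.

```python
# Hypothetical sketch of a machine-learning negation (polarity) classifier
# over clinical named entities. NOT the paper's implementation; all feature
# names and training examples here are invented for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def context_features(tokens, start, end, window=3):
    """Lexical context-window features around an entity span [start, end)."""
    feats = {}
    for i in range(max(0, start - window), start):
        feats[f"left_{start - i}={tokens[i].lower()}"] = 1.0
    for i in range(end, min(len(tokens), end + window)):
        feats[f"right_{i - end + 1}={tokens[i].lower()}"] = 1.0
    feats["entity=" + "_".join(t.lower() for t in tokens[start:end])] = 1.0
    return feats

# Toy in-domain training data: (tokens, entity span, polarity label).
train = [
    ("Patient denies chest pain .".split(), (2, 4), "negated"),
    ("No evidence of pneumonia on imaging .".split(), (3, 4), "negated"),
    ("History of hypertension .".split(), (2, 3), "affirmed"),
    ("Chest pain resolved after rest .".split(), (0, 2), "affirmed"),
]

X = [context_features(toks, s, e) for toks, (s, e), _ in train]
y = [label for _, _, label in train]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test_tokens = "Patient denies shortness of breath .".split()
print(model.predict([context_features(test_tokens, 2, 5)]))  # likely 'negated'
```

Trained on one corpus, a model like this typically optimizes well in-domain; the paper's point is that the same model degrades when the test corpus (annotation guidelines, entity types, context) changes.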


Figure 5 (pone-0112774-g005): The effect of named entity semantic group on the F-score of 6 models. SHARP, MiPACQ, and i2b2 test sets are used for evaluation.

Mentions: Because the annotation guidelines also differed in which semantic groups to annotate, we considered the performance of each model on each semantic group, shown in Figure 5. Recall from Table 3 that SHARP and MiPACQ included a broad selection of semantic groups, including anatomical sites (ANAT), chemicals and drugs (CHEM), disorders (DISO), laboratories (LAB), procedures (PROC), and symptoms (SYMP). i2b2 and the NegEx Test Set specified only "problems", which are treated as EVENT in Figure 5.
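A hedged sketch of the per-group evaluation behind Figure 5: compute an F-score separately for each semantic group, treating "negated" as the positive class. The group codes follow the paragraph above; the example data and the helper name f1_by_group are invented for illustration.

```python
# Minimal sketch of a per-semantic-group F-score breakdown, as in Figure 5.
# Input rows and group labels are illustrative, not the paper's data.
from collections import defaultdict

def f1_by_group(examples):
    """examples: iterable of (semantic_group, gold_label, predicted_label),
    with 'negated' as the positive class."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, gold, pred in examples:
        if pred == "negated" and gold == "negated":
            counts[group]["tp"] += 1
        elif pred == "negated":
            counts[group]["fp"] += 1
        elif gold == "negated":
            counts[group]["fn"] += 1
    scores = {}
    for group, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        scores[group] = 2 * p * r / (p + r) if p + r else 0.0
    return scores

examples = [
    ("DISO", "negated", "negated"),
    ("DISO", "affirmed", "negated"),
    ("PROC", "negated", "affirmed"),
    ("EVENT", "negated", "negated"),
]
print(f1_by_group(examples))  # DISO ~= 0.667, PROC = 0.0, EVENT = 1.0
```

Reporting F-scores per group rather than a single pooled number is what exposes the guideline mismatch: a model trained where only "problems" (EVENT) are annotated has never seen labels for ANAT or PROC entities.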

