Negation's not solved: generalizability versus optimizability in clinical natural language processing.

Wu S, Miller T, Masanz J, Coarr M, Halgrim S, Carrell D, Clark C - PLoS ONE (2014)

Bottom Line: A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it.

View Article: PubMed Central - PubMed

Affiliation: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America; Oregon Health and Science University, Portland, Oregon, United States of America.

ABSTRACT
A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
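
The Polarity Module described in the abstract is a supervised classifier over the context of a named entity. As a hedged illustration of that general approach only (not the authors' implementation), the sketch below trains a scikit-learn logistic regression on bag-of-words features from a window around each entity; the toy corpus, the ±3-token window, and the feature scheme are all assumptions made for the example.

```python
# Illustrative sketch only -- NOT the paper's Polarity Module.
# Each example: (tokens, index of the named-entity head, label),
# where label 1 = negated and 0 = affirmed.  Features are the words
# within a +/-3-token window of the entity, a crude stand-in for the
# richer lexical/syntactic context features such systems use.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def window_features(tokens, i, width=3):
    lo, hi = max(0, i - width), min(len(tokens), i + width + 1)
    return {f"w={tokens[j].lower()}": 1.0 for j in range(lo, hi) if j != i}

# Hypothetical toy training data.
train = [
    ("no evidence of pneumonia".split(), 3, 1),
    ("patient denies chest pain".split(), 3, 1),
    ("patient has pneumonia".split(), 2, 0),
    ("reports ongoing chest pain".split(), 3, 0),
]

vec = DictVectorizer()
X = vec.fit_transform([window_features(t, i) for t, i, _ in train])
y = [lab for _, _, lab in train]
clf = LogisticRegression().fit(X, y)

# Held-out sentence: the cue word "no" drives the prediction.
tokens = "no acute fracture".split()
x = vec.transform([window_features(tokens, 2)])
print("negated" if clf.predict(x)[0] else "affirmed")
```

In this toy setting the classifier effectively learns NegEx-style cue words ("no", "denies") as high-weight features; the paper's point is that such learned cues and contexts transfer poorly across corpora annotated under different guidelines.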


pone-0112774-g004: The effect of named entity length (in number of words) on performance for each of 6 training configurations. SHARP, MiPACQ, and i2b2 test sets are used for evaluation.

Mentions: Negation predictions were further analyzed to see if the differences in NE annotation guidelines influenced performance, since resulting differences in “gold standard” training data could confuse machine learning systems. Because guidelines for annotating NEs differed in how much of a noun phrase to include, we examined NE length in words. Figure 4 shows that the i2b2-trained model has the best overall performance, likely due to its larger number of training samples rather than its similarity to other annotation guidelines. Underscoring this, the NegEx Test Set is the most permissive guideline (allowing whole phrases), yet it obtains similar performance to the restrictive SHARP and MiPACQ guidelines (typically short phrases).
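
As an illustration of how such an error breakdown can be computed, the sketch below buckets entity mentions by their length in words and scores each bucket with F1. The input format and the per-length bucketing are assumptions made for the example; the paper's actual evaluation scripts are not shown here.

```python
# Hedged sketch: per-bucket F1 for negation predictions, grouped by
# named-entity length in words.  `mentions` uses an assumed format:
# (entity_text, gold_label, predicted_label), with label 1 = negated.
from collections import defaultdict

def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def f1_by_length(mentions):
    counts = defaultdict(lambda: [0, 0, 0])  # length -> [tp, fp, fn]
    for text, gold, pred in mentions:
        c = counts[len(text.split())]
        if gold and pred:
            c[0] += 1          # true positive
        elif pred and not gold:
            c[1] += 1          # false positive
        elif gold and not pred:
            c[2] += 1          # false negative
    return {n: f1(*c) for n, c in sorted(counts.items())}

# Toy example with one-, two-, and four-word entities.
mentions = [
    ("pneumonia", 1, 1),
    ("chest pain", 1, 0),
    ("left lower lobe infiltrate", 0, 1),
    ("fracture", 0, 0),
]
print(f1_by_length(mentions))
```

Plotting these per-length scores for each training configuration would reproduce the kind of comparison shown in Figure 4.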

