Negation's not solved: generalizability versus optimizability in clinical natural language processing.

Wu S, Miller T, Masanz J, Coarr M, Halgrim S, Carrell D, Clark C - PLoS ONE (2014)

Bottom Line: A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it.


Affiliation: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America; Oregon Health and Science University, Portland, Oregon, United States of America.

ABSTRACT
A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been "solved." This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.


Figure 3 (pone-0112774-g003): Learning curve for i2b2 training data on various corpora. For each proportion of the i2b2 corpus (x axis), the reported F-score (y axis) is an average of 5 randomly sampled runs.

Mentions: The results are shown in Figure 3. The learning curve for the in-domain i2b2 data appears to rise even at the very end, with the classifier still making marginal improvements as more data is added. In contrast, in both cross-domain experiments the performance levels off very early, conservatively estimated at around 20% of the i2b2 training data. For additional reference, we have also plotted two points taken from Table 4: the in-domain performance for SHARP and Mipacq. The x-axis for each of these points is the size of the training data (counted as the number of instances of negation), while the y-axis is the F-score obtained on each corpus's in-domain evaluation.
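
The setup behind Figure 3 can be sketched in a few lines: subsample a fraction of the training corpus, train a classifier, score it on each test corpus with F-score, and average over repeated random draws. The Python sketch below is a minimal illustration, not the paper's actual pipeline: the synthetic arrays are hypothetical stand-ins for featurized negation instances, and LogisticRegression stands in for the machine learning-based Polarity Module.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Placeholder data: in the real experiment these would be featurized
    # negation instances from the i2b2 training set and from each
    # in-domain / cross-domain test corpus.
    X_train = rng.normal(size=(5000, 50))
    y_train = rng.integers(0, 2, size=5000)
    test_sets = {"i2b2-test": (rng.normal(size=(1000, 50)),
                               rng.integers(0, 2, size=1000))}

    n_runs = 5  # each plotted point averages 5 randomly sampled runs
    for p in np.arange(0.1, 1.01, 0.1):  # proportion of training data used
        n = int(p * len(y_train))
        for name, (X_test, y_test) in test_sets.items():
            scores = []
            for _ in range(n_runs):
                # Draw a random subsample of size n, train, and score.
                idx = rng.choice(len(y_train), size=n, replace=False)
                clf = LogisticRegression(max_iter=1000)
                clf.fit(X_train[idx], y_train[idx])
                scores.append(f1_score(y_test, clf.predict(X_test)))
            print(f"{name} @ {p:.0%} of training data: "
                  f"mean F1 = {np.mean(scores):.3f}")

Plotting the mean F-score at each proportion, per test corpus, reproduces the shape of a learning curve like Figure 3: an in-domain curve that keeps climbing and cross-domain curves that plateau early.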

