A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction.

Li Q, Zhai H, Deleger L, Lingren T, Kaiser M, Stoutenborough L, Solti I - J Am Med Inform Assoc (2012)

Bottom Line: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA.


Affiliation: Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

ABSTRACT

Objective: The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication-attribute linkage detection in two clinical corpora.

Data and methods: We double-annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.

Results: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.

Discussion and conclusions: We compared the novel MLSL method with a binary classification method and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.


Figure 1 (AMIAJNL2012001487F1): Examples of linkages between medications and their attributes.

Mentions: The linkage detection task associates attributes with their corresponding medication entities, assuming that medications and attributes have already been identified in a prior step. For example, the algorithm will analyze the sentence: ‘Advair 250/50 diskus 1 puff and Singulair 5 mg chewable 1 tablet once a day.’ In this sentence, Advair and Singulair are the medication names, while 250/50, diskus, 1, puff, 5 mg, chewable, 1, tablet, and once a day are the attributes. Here, 250/50, diskus, 1, and puff are the attributes of Advair, while 5 mg, chewable, 1, tablet, and once a day are the attributes of Singulair, as shown in figure 1. Note that Advair is always a BID (twice a day) medication, so we did not assign once a day as an attribute of Advair.
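
The example sentence above can be written out as data to make the task concrete. The following Python sketch is an illustration only, not the authors' implementation: the `Entity` class, the function names, and the nearest-preceding-medication rule are all assumptions introduced here (the source does not describe the rules of its baseline), and the F-measure is assumed to be the balanced harmonic mean of precision and recall. It encodes the gold linkages from the example, enumerates the medication-attribute candidate pairs that a binary classifier would label as linked or not linked, and scores a simple hypothetical rule against the gold standard.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    index: int  # position in the sentence's sequence of pre-identified entities
    text: str
    kind: str   # "MED" for a medication name, "ATTR" for an attribute


# Entities of the example sentence, in order of appearance.
SPANS = [
    ("Advair", "MED"), ("250/50", "ATTR"), ("diskus", "ATTR"),
    ("1", "ATTR"), ("puff", "ATTR"),
    ("Singulair", "MED"), ("5 mg", "ATTR"), ("chewable", "ATTR"),
    ("1", "ATTR"), ("tablet", "ATTR"), ("once a day", "ATTR"),
]
ENTITIES = [Entity(i, text, kind) for i, (text, kind) in enumerate(SPANS)]

# Gold linkages from the example, as (medication index, attribute index).
# "once a day" (index 10) is linked to Singulair only, not to Advair.
GOLD = {(0, 1), (0, 2), (0, 3), (0, 4),
        (5, 6), (5, 7), (5, 8), (5, 9), (5, 10)}


def candidate_pairs(entities):
    """Every (medication, attribute) pair in the sentence."""
    meds = [e for e in entities if e.kind == "MED"]
    attrs = [e for e in entities if e.kind == "ATTR"]
    return list(product(meds, attrs))


def label_pairs(entities, gold):
    """Binary-classification view: label 1 if the pair is linked, else 0."""
    return [((m, a), int((m.index, a.index) in gold))
            for m, a in candidate_pairs(entities)]


def nearest_preceding_medication(entities):
    """Hypothetical rule: link each attribute to the closest medication before it."""
    links, current = set(), None
    for e in entities:
        if e.kind == "MED":
            current = e
        elif current is not None:
            links.add((current.index, e.index))
    return links


def f_measure(predicted, gold):
    """Balanced F-measure over sets of predicted and gold linkages."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


labeled = label_pairs(ENTITIES, GOLD)
predicted = nearest_preceding_medication(ENTITIES)
score = f_measure(predicted, GOLD)
```

With two medications and nine attributes, the sentence yields 18 candidate pairs, nine of which are true linkages. The hypothetical rule happens to reproduce the gold linkages for this sentence; sentences in which an attribute precedes its medication, or is shared by several medications, would not be handled by such a rule.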

