Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields.

Dai HJ, Syed-Abdul S, Chen CW, Wu CC - Biomed Res Int (2015)

Bottom Line: To improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. The experimental results showed that the proposed method achieved an F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presents an integrated solution without the need to develop additional heuristic rules for isolating the headings from the surrounding section contents.

Affiliation: Department of Computer Science and Information Engineering, National Taitung University, Taitung 95092, Taiwan; Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11042, Taiwan.

ABSTRACT
An electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment recorded in an EHR cannot be determined solely from the recognized concepts without considering their contextual information. To improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. Instead of formulating section heading recognition as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The experimental results showed that the proposed method achieved an F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presents an integrated solution without the need to develop additional heuristic rules for isolating the headings from the surrounding section contents.
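
To illustrate what a token-based formulation might look like in practice, the following minimal Python sketch converts one line of text and an annotated heading span into per-token labels of the kind a CRF sequence labeler could be trained on. The B/I/O tag names, the regular-expression tokenizer, and the function names are assumptions made for illustration; the abstract does not specify the tag set or tokenization used by the authors.

```python
import re

def tokenize(text):
    """Split text into (token, start, end) triples.
    Punctuation marks become separate tokens (an assumed tokenization)."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def bio_labels(text, heading_spans):
    """Tag each token as B-HEADING, I-HEADING, or O.
    heading_spans holds (start, end) character offsets of annotated headings.
    The BIO scheme is an assumption, not taken from the paper."""
    labels = []
    for token, start, end in tokenize(text):
        tag = "O"
        for h_start, h_end in heading_spans:
            if start >= h_start and end <= h_end:
                tag = "B-HEADING" if start == h_start else "I-HEADING"
                break
        labels.append((token, tag))
    return labels

# The heading "Chief Complaints:" occupies characters 0-17 of this line.
line = "Chief Complaints: chest pain for two days"
labeled = bio_labels(line, [(0, 17)])
for token, tag in labeled:
    print(f"{token}\t{tag}")
```

Because every token receives its own label, the heading boundary falls out of the label sequence directly, which is the property the abstract contrasts with sentence classification, where a whole line such as the one above would receive a single label.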



Figure 1: An annotated document sample on brat.

Mentions: For the manual annotation task, the annotators were instructed to annotate only the topmost section headings. We searched the literature extensively for standard section-heading definitions for discharge summaries but observed that each country adopted different definitions. Hence, this work followed the discharge summary exchange standard (http://emrstd.mohw.gov.tw/strdoc/default.aspx) defined by the electronic medical record exchange centre built by the Ministry of Health and Welfare of Taiwan to define the topmost sections. Based on this standard, section headings that could be viewed as subsections themselves or were followed by contents belonging to a superior section were removed from the annotations, regardless of their section level in other documents. For instance, if both the “Laboratory” and “Radiology” sections existed in an EHR but could be considered subsections of the “Data” section, then only the superior section “Data” was annotated. Conversely, if “Laboratory” and “Radiology” were two separate sections without a common superior section, then both sections were annotated. As another example, “Impression” was annotated if it was the topmost section. However, if the content of the “Impression” section clearly contained the data of certain reports, such as X-ray or echography, and trailed behind other section headings like “Cardiac Echography” or “Chest X-ray”, then the annotation of the “Impression” section was removed. Furthermore, if the name of a topmost section consisted of two merged concepts, it was still annotated as one section heading. For instance, some documents combined the sections “Impression” and “Plan” as one section “Impression/Plan”, while others recorded both sections independently. Finally, section headings were extended to include punctuation marks and parentheses, such as “Chief Complaints:” and “Medications (updated 8/28/70)”.
Figure 1 shows an example of the annotated document within the brat annotation tool.
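
The topmost-section rule from the annotation guidelines can be sketched in a few lines of Python. This is only an illustration of the “Laboratory”/“Radiology”/“Data” example: the parent map and the function name are hypothetical, and the sketch does not model the content-dependent “Impression” rule, which required the annotators' judgment.

```python
# Hypothetical section hierarchy, covering only the example from the text.
PARENT = {"Laboratory": "Data", "Radiology": "Data"}

def topmost_headings(headings, parent=PARENT):
    """Keep a heading unless its superior section also occurs in the document.
    headings is the ordered list of heading names found in one document."""
    present = set(headings)
    return [h for h in headings if parent.get(h) not in present]

# "Data" is present, so its subsections are dropped.
with_data = topmost_headings(["Data", "Laboratory", "Radiology", "Impression/Plan"])
print(with_data)

# No common superior section is present, so both headings are kept.
without_data = topmost_headings(["Laboratory", "Radiology"])
print(without_data)
```

Note that the merged heading “Impression/Plan” is treated as a single entry, in line with the guideline that merged concepts were annotated as one section heading.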

