Identification and Progression of Heart Disease Risk Factors in Diabetic Patients from Longitudinal Electronic Health Records.

Jonnagaddala J, Liaw ST, Ray P, Kumar M, Dai HJ, Hsu CY - Biomed Res Int (2015)

Bottom Line: Unfortunately, most of the valuable information on risk factor data is buried in the form of unstructured clinical notes in electronic health records. The hybrid approach employs both machine learning and rule-based clinical text mining techniques. The developed system achieved an overall microaveraged F-score of 0.8302.


Affiliation: School of Public Health and Community Medicine, University of New South Wales, Sydney, NSW 2052, Australia; Asia-Pacific Ubiquitous Healthcare Research Centre, University of New South Wales, Sydney, NSW 2052, Australia; Prince of Wales Clinical School, University of New South Wales, Sydney, NSW 2052, Australia.

ABSTRACT
Heart disease is the leading cause of death worldwide. Therefore, assessing the risk of its occurrence is a crucial step in predicting serious cardiac events. Identifying heart disease risk factors and tracking their progression is a preliminary step in heart disease risk assessment. A large number of studies have reported the use of risk factor data collected prospectively. Electronic health record systems are a rich source of the required risk factor data. Unfortunately, most of the valuable risk factor information is buried in unstructured clinical notes in electronic health records. In this study, we present an information extraction system that extracts information on heart disease risk factors from unstructured clinical notes using a hybrid approach. The hybrid approach employs both machine learning and rule-based clinical text mining techniques. The developed system achieved an overall microaveraged F-score of 0.8302.
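
For readers unfamiliar with the metric, the following is a minimal sketch of how a microaveraged F-score is generally computed: true positives, false positives, and false negatives are pooled across all categories before precision and recall are calculated. The function name, the category labels, and the counts are hypothetical and chosen only for illustration; they are not the study's data, and this is not the evaluation script used in the shared task.

```python
from typing import Dict, Tuple


def micro_f_score(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Microaveraged F1 from per-category (tp, fp, fn) counts.

    Counts are pooled over all categories first, so frequent
    categories weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts for two hypothetical categories (not from the paper).
example_counts = {
    "category_a": (8, 2, 2),
    "category_b": (2, 0, 2),
}
score = micro_f_score(example_counts)
print(round(score, 4))
```

Because the counts are pooled, the result equals 2·TP / (2·TP + FP + FN) over the whole test set, which is why a single overall figure such as 0.8302 can be reported for a system that tags many risk factor types.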


Figure 1: Sample EHR with annotations of heart disease risk factors.

Mentions: The authors used the 2014 i2b2/UTHealth shared task 2 dataset in this study [18]. The dataset is a collection of unstructured longitudinal EHRs of diabetic patients provided by Partners Healthcare, USA. The EHRs are deidentified and annotated according to the annotation guidelines. The annotations include heart disease risk factors and information on disease progression [19]. Gold standard annotations for this dataset were also available to evaluate the developed IE system. The dataset included 1304 unstructured EHRs (from here on referred to as records) from 297 patients, divided into three sets: training set 1, training set 2, and the test set. Training set 1 and training set 2 included 521 and 269 records, respectively, while the test set had 514 records. The dataset was also stratified into three cohorts of diabetic patients: patients who had CAD, patients who developed CAD, and patients who did not develop CAD over a period of time [15]. The presence of heart disease risk factors and the progression of the disease were defined in the dataset in the form of a risk factor, an indicator attribute, and a time attribute. An overview of risk factors and their corresponding attributes is presented in Table 1. A sample (modified) EHR from the dataset is illustrated in Figure 1. Each risk factor tag, except family history and smoking history, had a time attribute that takes one of three values: before document creation time (DCT), during DCT, or after DCT. The time attribute defines when a risk factor is known to have existed. The indicator attribute defines whether the identified risk factor is a mention, test, or lab value.
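
The annotation scheme described above (risk factor, indicator attribute, time attribute) can be sketched as a small data structure. This is an illustrative model only: the class and field names are hypothetical, the indicator is kept as free text because Table 1 is not reproduced here, and the dataset's actual file format is not assumed. The one rule encoded is the one stated in the text: every tag carries a time attribute except family history and smoking history.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TimeAttr(Enum):
    """Time of a risk factor relative to document creation time (DCT)."""
    BEFORE_DCT = "before DCT"
    DURING_DCT = "during DCT"
    AFTER_DCT = "after DCT"


# Risk factors that, per the description, carry no time attribute.
NO_TIME_FACTORS = {"family history", "smoking history"}

# Record counts per split as reported in the text (total 1304).
SPLIT_SIZES = {"training set 1": 521, "training set 2": 269, "test set": 514}


@dataclass(frozen=True)
class RiskFactorTag:
    """Hypothetical in-memory form of one annotation."""
    risk_factor: str               # e.g. "CAD" (illustrative value)
    indicator: Optional[str]       # free text; see Table 1 of the paper
    time: Optional[TimeAttr]       # None only for NO_TIME_FACTORS

    def __post_init__(self) -> None:
        has_time = self.time is not None
        if self.risk_factor in NO_TIME_FACTORS and has_time:
            raise ValueError(f"{self.risk_factor} takes no time attribute")
        if self.risk_factor not in NO_TIME_FACTORS and not has_time:
            raise ValueError(f"{self.risk_factor} requires a time attribute")


# Illustrative tags; the values are examples, not dataset content.
tags = [
    RiskFactorTag("CAD", "mention", TimeAttr.BEFORE_DCT),
    RiskFactorTag("smoking history", None, None),
]
print(len(tags), sum(SPLIT_SIZES.values()))
```

Validating the time attribute at construction keeps malformed tags out of downstream evaluation, where a tag is typically compared against the gold standard on all three fields together.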

