A richly interactive exploratory data analysis and visualization tool using electronic medical records.

Huang CW, Lu R, Iqbal U, Lin SH, Nguyen PA, Yang HC, Wang CF, Li J, Ma KL, Li YC, Jian WS - BMC Med Inform Decis Mak (2015)

Bottom Line: The approach is an iterative process that lets the user divide the data into homogeneous subsets that can be visually examined, compared, and refined. The resulting visualizations help uncover hidden information in the data, compare differences between patient groups, determine critical factors that influence a particular disease, and help direct further analyses. Visualization methods such as the Sankey diagram can reveal useful knowledge about a particular disease cohort and the trajectories of the disease over time.


Affiliation: Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan. weigo7729@gmail.com.

ABSTRACT

Background: Electronic medical records (EMRs) contain vast amounts of data that are of great interest to physicians, clinical researchers, and medical policy makers. As the size, complexity, and accessibility of EMRs grow, extracting meaningful information from them has become an increasingly important problem.

Methods: We developed a standardized data analysis process to support cohort studies focused on a particular disease. We use an interactive divide-and-conquer approach to classify patients into groups that are relatively uniform internally. It is an iterative process enabling the user to divide the data into homogeneous subsets that can be visually examined, compared, and refined. The final visualization is driven by the transformed data, and user feedback is directed to the corresponding operators, which completes the iterative loop. The output is shown in a Sankey diagram-style timeline, a kind of flow diagram that shows factors' states and transitions over time.
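As a rough illustration of the Sankey-style timeline described in the Methods, the sketch below aggregates per-patient states in consecutive time windows into weighted links (source state, target state, patient count), which is the input a Sankey renderer typically needs. The function name `sankey_links`, the window labels, and the toy patients are all hypothetical; this is not the authors' implementation.

```python
from collections import Counter


def sankey_links(patient_states, windows):
    """Count patients moving between states in consecutive time windows.

    patient_states maps a patient id to {window: state}; a missing
    window is treated as the state "none" (an assumption of this sketch).
    """
    links = Counter()
    for states in patient_states.values():
        for earlier, later in zip(windows, windows[1:]):
            source = (earlier, states.get(earlier, "none"))
            target = (later, states.get(later, "none"))
            links[(source, target)] += 1
    return links


# Hypothetical toy cohort, not data from the paper.
patients = {
    "p1": {"first-year": "HD", "post": "none"},
    "p2": {"first-year": "HD", "post": "PD"},
    "p3": {"first-year": "none", "post": "died"},
    "p4": {"first-year": "HD", "post": "none"},
}
windows = ["first-year", "post"]
links = sankey_links(patients, windows)

for (source, target), count in sorted(links.items()):
    print(source, "->", target, count)
```

Each link's count would set the width of the corresponding band in the diagram, so every patient contributes exactly one unit of flow between each pair of adjacent windows.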

Results: This paper presents a visually rich, interactive web-based application that enables researchers to study any cohort over time using EMR data. The resulting visualizations help uncover hidden information in the data, compare differences between patient groups, determine critical factors that influence a particular disease, and help direct further analyses. We introduce and demonstrate the tool using the EMRs of 14,567 chronic kidney disease (CKD) patients.

Conclusions: We developed a visual mining system to support exploratory data analysis of multi-dimensional categorical EMR data. Using CKD as a model disease, the system combines automated correlational analysis with human-curated visual evaluation. Visualization methods such as the Sankey diagram can reveal useful knowledge about a particular disease cohort and the trajectories of the disease over time.



Fig. 7: Exploring causal relationships (12,960 patients). (a) Of the patients who received hemodialysis (HD) in the first year of CKD, 70.2 % did not develop any other factors, while the rest either received PD or RTPL, or died. (b) After filtering out the low-confidence associations, the remaining associations cover only 17.4 % of the population. (c) Hierarchical clustering of the patients at the pre-CKD stage generates ten groups of similar patients. (d) When we highlighted the group whose common factor was systemic lupus erythematosus (SLE), we found that none of them underwent the more serious procedures, such as renal transplantation, or died. Note: Three groups are labelled “*” because those groups have no common factor shared by all members.
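The caption does not say how an association is judged confident. One plausible reading, used here purely as an assumption, is association-rule confidence: the share of patients in a source state who go on to a given target state. The sketch below keeps only links whose confidence meets a threshold and reports how much of the population the kept links cover. The name `confident_links`, the threshold 0.6, and the toy pairs are hypothetical.

```python
from collections import Counter


def confident_links(pairs, min_confidence):
    """Keep (source, target) links whose confidence meets the threshold.

    Confidence is taken here as count(source and target) / count(source),
    which is an assumption; the source text does not define it.
    Returns the kept links and the fraction of patients they cover.
    """
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    kept = {
        pair: count
        for pair, count in pair_counts.items()
        if count / source_counts[pair[0]] >= min_confidence
    }
    coverage = sum(kept.values()) / len(pairs)
    return kept, coverage


# Hypothetical toy data: one (source, target) pair per patient.
pairs = (
    [("HD", "none")] * 7
    + [("HD", "PD")] * 2
    + [("HD", "died")] * 1
    + [("no HD", "none")] * 5
    + [("no HD", "died")] * 5
)
kept, coverage = confident_links(pairs, min_confidence=0.6)
print(kept, coverage)
```

A stricter threshold leaves fewer links and lower coverage, which matches the trade-off the caption reports in panel b.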

Mentions: Since there are only 11 CKD/disease or CKD/procedure combinations in total for the first-year-of-CKD and post-CKD stages, we can visualize their clinical courses without any simplification. However, there are too many combinations at the pre-CKD stage to visualize directly. For simplicity, we first group them into a single cluster and focus on the last two time windows. As Fig. 7a shows, 70.2 % of the patients who received hemodialysis in the first year of CKD did not develop any other CKD-related diseases or procedures, while the rest either required peritoneal dialysis (PD) or renal transplantation (RTPL), or died. Some of the patients who were not on hemodialysis in the first year also died; however, their mortality rate seems lower. We also notice that more than half of the patients who did not require hemodialysis in the first year are not associated with any of the post-CKD factors of interest. This means either that they were in stable condition after the first year or that their subsequent treatments were not recorded.
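The percentages quoted in this passage are conditional shares: among patients meeting one condition (for example, hemodialysis in the first year), the fraction with a given outcome. A minimal sketch of that arithmetic follows, with a hypothetical record layout (`first_year_hd`, `post`) and toy records that are not the paper's data.

```python
def share(records, condition, outcome):
    """Fraction of records satisfying `condition` that also satisfy `outcome`."""
    selected = [r for r in records if condition(r)]
    if not selected:
        return 0.0
    return sum(1 for r in selected if outcome(r)) / len(selected)


def make(first_year_hd, post, n):
    """Build n identical toy records (hypothetical layout)."""
    return [{"first_year_hd": first_year_hd, "post": set(post)} for _ in range(n)]


# Hypothetical toy cohort: `post` holds the post-CKD factors of interest.
records = (
    make(True, [], 3) + make(True, ["PD"], 1) + make(True, ["died"], 2)
    + make(False, [], 4) + make(False, ["died"], 1) + make(False, ["RTPL"], 1)
)


def on_hd(r):
    return r["first_year_hd"]


def not_on_hd(r):
    return not r["first_year_hd"]


def no_factor(r):
    return not r["post"]


def died(r):
    return "died" in r["post"]


hd_stable = share(records, on_hd, no_factor)
hd_mortality = share(records, on_hd, died)
non_hd_mortality = share(records, not_on_hd, died)
non_hd_stable = share(records, not_on_hd, no_factor)
print(hd_stable, hd_mortality, non_hd_mortality, non_hd_stable)
```

Note that an empty `post` set cannot distinguish a stable patient from one whose later treatments were simply not recorded, which is the same ambiguity the passage points out.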

