A richly interactive exploratory data analysis and visualization tool using electronic medical records.

Huang CW, Lu R, Iqbal U, Lin SH, Nguyen PA, Yang HC, Wang CF, Li J, Ma KL, Li YC, Jian WS - BMC Med Inform Decis Mak (2015)

Bottom Line: The analysis is an iterative process that enables the user to divide the data into homogeneous subsets that can be visually examined, compared, and refined. The resulting visualizations help uncover hidden information in the data, compare differences between patient groups, determine critical factors that influence a particular disease, and help direct further analyses. Visualization methods such as the Sankey diagram can reveal useful knowledge about the particular disease cohort and the trajectories of the disease over time.


Affiliation: Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan. weigo7729@gmail.com.

ABSTRACT

Background: Electronic medical records (EMRs) contain vast amounts of data that are of great interest to physicians, clinical researchers, and medical policy makers. As the size, complexity, and accessibility of EMRs grow, the ability to extract meaningful information from them has become an increasingly important problem to solve.

Methods: We develop a standardized data analysis process to support cohort studies focused on a particular disease. We use an interactive divide-and-conquer approach to classify patients into relatively homogeneous groups. The process is iterative: the user divides the data into homogeneous subsets that can be visually examined, compared, and refined. The final visualization is driven by the transformed data, and user feedback is directed back to the corresponding operators, which completes the iterative loop. The results are shown as a Sankey diagram-style timeline, a particular kind of flow diagram for showing factors' states and transitions over time.
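The divide-and-conquer step described above can be illustrated with a minimal sketch. It assumes a hypothetical record format in which each patient is a dict of categorical attributes; `split` and `refine` are illustrative names, not the tool's API. In the actual tool the user chooses which subset to split next, whereas `refine` here simply applies a fixed attribute order to every subset.

```python
from collections import defaultdict

# Hypothetical record format: each patient is a dict of categorical attributes.
Patient = dict[str, str]


def split(patients: list[Patient], attribute: str) -> dict[str, list[Patient]]:
    """Divide one group into subsets that share the same value of `attribute`."""
    groups = defaultdict(list)
    for p in patients:
        groups[p.get(attribute, "unknown")].append(p)
    return dict(groups)


def refine(patients: list[Patient], attributes: list[str]) -> dict[tuple[str, ...], list[Patient]]:
    """Split recursively by each attribute in turn.

    Returns the leaf groups keyed by the tuple of attribute values that defines them,
    so every leaf is homogeneous with respect to all the attributes used.
    """
    if not attributes:
        return {(): patients}
    head, *rest = attributes
    leaves: dict[tuple[str, ...], list[Patient]] = {}
    for value, subset in split(patients, head).items():
        for key, leaf in refine(subset, rest).items():
            leaves[(value, *key)] = leaf
    return leaves


# Toy data for illustration only.
patients = [
    {"sex": "F", "ckd": "yes"},
    {"sex": "M", "ckd": "no"},
    {"sex": "F", "ckd": "no"},
    {"sex": "F", "ckd": "yes"},
]
sizes = {key: len(group) for key, group in refine(patients, ["sex", "ckd"]).items()}
print(sizes)  # {('F', 'yes'): 2, ('F', 'no'): 1, ('M', 'no'): 1}
```

Because each split partitions its input, the leaf sizes always sum to the size of the original cohort, which makes it easy to compare subsets side by side.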

Results: This paper presents a visually rich, interactive web-based application that enables researchers to study any cohort over time using EMR data. The resulting visualizations help uncover hidden information in the data, compare differences between patient groups, determine critical factors that influence a particular disease, and direct further analyses. We introduce and demonstrate this tool using the EMRs of 14,567 chronic kidney disease (CKD) patients.

Conclusions: We developed a visual mining system to support exploratory data analysis of multi-dimensional categorical EMR data. Using CKD as a model disease, the system combines automated correlational analysis with human-curated visual evaluation. Visualization methods such as the Sankey diagram can reveal useful knowledge about a particular disease cohort and the trajectories of the disease over time.
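To make the Sankey-style timeline concrete, the following minimal sketch derives node and link sizes from patient trajectories. It assumes, hypothetically, that each patient's history is already encoded as a list of state labels, one per aligned time step; the names `sankey_links` and `node_sizes` and the toy trajectories are illustrative, not taken from the tool.

```python
from collections import Counter

# Hypothetical input: patient id -> sequence of states, one per aligned time step.
Trajectories = dict[str, list[str]]


def sankey_links(trajectories: Trajectories) -> Counter:
    """Count patients moving from state a at step t to state b at step t + 1.

    Each key is ((t, a), (t + 1, b)); its count is the width of that Sankey link.
    """
    links: Counter = Counter()
    for states in trajectories.values():
        for t, (a, b) in enumerate(zip(states, states[1:])):
            links[((t, a), (t + 1, b))] += 1
    return links


def node_sizes(trajectories: Trajectories) -> Counter:
    """Count patients in each (time step, state) node."""
    return Counter((t, s) for states in trajectories.values() for t, s in enumerate(states))


# Toy data for illustration only.
trajectories = {
    "p1": ["HTN", "HTN", "HTN+CKD"],
    "p2": ["HTN", "HTN+DM", "HTN+DM+CKD"],
    "p3": ["DM", "DM", "DM+CKD"],
    "p4": ["HTN", "HTN", "HTN+CKD"],
}
links = sankey_links(trajectories)
print(links[((0, "HTN"), (1, "HTN"))])  # 2
print(node_sizes(trajectories)[(0, "HTN")])  # 3
```

With equal-length trajectories, the total width entering a node equals the total width leaving it, which is the flow-conservation property that makes a Sankey diagram readable.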


Fig. 5: Frequency-based cohort clustering: Sankey diagrams for CKD cohort sizes of < 250. The trajectories were simplified by keeping larger cohorts and merging smaller ones into a single “others” group. Light green marks others without CKD and light orange marks others with CKD.

Mentions: Since there are too many comorbidities to visualize clearly, as shown in Fig. 3, we apply frequency-based cohort clustering to extract the dominant cohorts. As Fig. 5 shows, the trajectories are simplified: larger cohorts are kept and smaller ones are merged into a single “others” group (light green for others without CKD and light orange for others with CKD). These overviews show the prevalence of different comorbidities and their proportions in the population. For example, Fig. 5 shows that the number of patients with a single disease, such as hypertension (HTN) (brown) or diabetes (DM) (dark blue), shrinks as time approaches year 0, which means that patients start to exhibit other diseases. The user can lower the threshold to reveal smaller cohorts, as shown in Fig. 6.
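Frequency-based cohort clustering, as described in the passage above, can be sketched as follows. This is only one plausible reading: it assumes a cohort is a comorbidity combination at a given time step, represented as a hypothetical `frozenset` of diagnosis codes, and that a combination is kept when at least `threshold` patients share it at that step (the Fig. 5 caption refers to a size of 250). Rarer combinations are merged into "others with CKD" or "others without CKD". The function name and data are illustrative.

```python
from collections import Counter

# Hypothetical representation: patient id -> one frozenset of diagnoses per time step.
Trajectories = dict[str, list[frozenset[str]]]


def cluster_by_frequency(trajectories: Trajectories, threshold: int) -> dict[str, list[str]]:
    """Keep states held by at least `threshold` patients at a time step;
    merge the rest into 'others with CKD' / 'others without CKD'."""
    if not trajectories:
        return {}
    n_steps = len(next(iter(trajectories.values())))
    # Patient count per state, computed separately for each time step.
    counts = [Counter(states[t] for states in trajectories.values()) for t in range(n_steps)]

    def label(t: int, state: frozenset[str]) -> str:
        if counts[t][state] >= threshold:
            return "+".join(sorted(state)) or "none"
        return "others with CKD" if "CKD" in state else "others without CKD"

    return {pid: [label(t, s) for t, s in enumerate(states)]
            for pid, states in trajectories.items()}


# Toy data for illustration only.
trajectories = {
    "p1": [frozenset({"HTN"}), frozenset({"HTN", "CKD"})],
    "p2": [frozenset({"HTN"}), frozenset({"HTN", "CKD"})],
    "p3": [frozenset({"HTN"}), frozenset({"HTN", "DM", "CKD"})],
    "p4": [frozenset({"DM"}), frozenset({"DM"})],
}
print(cluster_by_frequency(trajectories, threshold=2)["p3"])  # ['HTN', 'others with CKD']
# Lowering the threshold reveals the smaller cohort that was merged away.
print(cluster_by_frequency(trajectories, threshold=1)["p3"])  # ['HTN', 'CKD+DM+HTN']
```

The relabeled trajectories can then be fed to any Sankey renderer; only the threshold changes between the simplified and the detailed view.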

