Predictability Bounds of Electronic Health Records.

Dahlem D, Maniloff D, Ratti C - Sci Rep (2015)

Bottom Line: In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.


Affiliation: [1] IBM Research-Ireland, Dublin 15, Ireland [2] Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

ABSTRACT
The ability to intervene in disease progression given a person's disease history has the potential to solve one of society's most pressing issues: advancing health care delivery and reducing its cost. Controlling disease progression is inherently associated with the ability to predict possible future diseases given a patient's medical history. We invoke an information-theoretic methodology to quantify the level of predictability inherent in disease histories of a large electronic health records dataset with over half a million patients. In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. Our findings confirm our intuition that knowledge of common disease progressions results in higher predictability bounds than treating disease histories independently. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. Surprisingly, we also show that shuffling individual disease histories only marginally degrades the predictability bounds. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.
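
As a rough illustration of how an entropy value can be turned into an upper bound on predictability, the sketch below assumes the bound is obtained from Fano's inequality, S = H(Π) + (1 − Π) log2(N − 1), as is common in related predictability studies; the text above does not state the formula, so this is an assumption. The function names `fano_rhs` and `max_predictability` are hypothetical, and N denotes the number of distinct diagnoses in a history.

```python
import math

def fano_rhs(p, n):
    """Entropy implied by predictability p: H(p) + (1 - p) * log2(n - 1)."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # treat 0 * log2(0) as 0
            h -= q * math.log2(q)
    return h + (1.0 - p) * math.log2(n - 1)

def max_predictability(entropy, n, tol=1e-10):
    """Solve fano_rhs(p, n) = entropy for p in [1/n, 1] by bisection.

    Assumption: the bound follows Fano's inequality; on [1/n, 1] the
    right-hand side decreases from log2(n) to 0, so the root is unique.
    """
    if n < 2:
        return 1.0  # a single possible symbol is always predictable
    if not 0.0 <= entropy <= math.log2(n):
        raise ValueError("entropy must lie in [0, log2(n)]")
    lo, hi = 1.0 / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n) > entropy:
            lo = mid  # mid is still too unpredictable; move right
        else:
            hi = mid
    return (lo + hi) / 2.0

# Lower entropy gives a higher bound for the same number of symbols.
bound_low_entropy = max_predictability(1.0, 8)
bound_high_entropy = max_predictability(2.0, 8)
```

Under this assumption, a history whose entropy equals log2(N) yields a bound of 1/N (no better than guessing), and an entropy of zero yields a bound of 1.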


Figure 2: Entropy distributions of individual medical histories (first row) and upper bound on predictability (second row). The columns represent the entropy and predictability results for the different category-level views of the ICD-9 histories, starting from CAT4 through CAT2. Each curve represents a lens through which we view the data looking at zeroth-order (rnd), first-order (unc) to time-correlated statistics (cor).
© Copyright Policy - open-access

Mentions: Calculating these entropies over our entire patient cohort of over half a million people, we obtain the results shown in Fig. 2, which displays the distribution of the three entropy statistics, Srnd, Sunc, and Scor, at the category-level view of the ICD-9 codes, shown in different subplots from A for CAT4 through C for CAT2 (see section Data Preliminaries in the “Supplemental Material” for more detail). For each category, note how the distributions of Srnd and Sunc are virtually indistinguishable for the detailed ICD-9 code in Fig. 2A and their parent category in Fig. 2B, suggesting that the occurrence of ICD-9 diagnoses, as characterised by the histogram of each sequence, is practically uniform. In other words, in the EHR of a particular patient, most diagnoses in the sequence are given only once and very few exhibit counts of 2 or more, suggesting little hope of success for prediction schemes that rely on distributional patterns. A separation between the uncorrelated entropy and the random entropy can only be observed for category CAT2 in Fig. 2C, corresponding to the second lowest (coarser) level of specificity of the ICD-9 hierarchy (for brevity of exposition we omit the lowest level of specificity). This separation occurs because more diagnostic codes are grouped together under the same categories and patterns of repetition begin to emerge, which results in lower entropy values of Sunc. Across all category levels, the distribution of Scor exhibits significantly lower entropy values than both Sunc and Srnd (P-value < 0.001; one-sided Kolmogorov-Smirnov test), suggesting that knowledge of time-correlated events reduces the entropy of the symbol sequence.
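
The relationship between the random and uncorrelated entropies can be illustrated with a small sketch. It assumes the usual definitions, which this excerpt does not spell out: Srnd = log2(N) for N distinct codes, and Sunc as the Shannon entropy of the empirical code frequencies. The history below is made up, and grouping codes by the part before the dot is only an illustrative stand-in for a coarser category, not the paper's CAT2 mapping. The time-correlated entropy Scor is not computed here.

```python
import math
from collections import Counter

def random_entropy(seq):
    """Zeroth-order entropy: log2 of the number of distinct symbols."""
    return math.log2(len(set(seq)))

def uncorrelated_entropy(seq):
    """First-order (Shannon) entropy of the empirical symbol frequencies."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

# Hypothetical patient history in which every detailed code occurs once.
history = ["250.00", "250.01", "250.02", "401.1",
           "401.9", "272.0", "414.01", "786.50"]

# Illustrative coarser view: keep only the part before the dot.
coarse = [code.split(".")[0] for code in history]

s_rnd_fine = random_entropy(history)
s_unc_fine = uncorrelated_entropy(history)
s_rnd_coarse = random_entropy(coarse)
s_unc_coarse = uncorrelated_entropy(coarse)

print(f"detailed codes: Srnd={s_rnd_fine:.3f}  Sunc={s_unc_fine:.3f}")
print(f"coarse groups:  Srnd={s_rnd_coarse:.3f}  Sunc={s_unc_coarse:.3f}")
```

With every detailed code appearing once, the two entropies coincide; once codes are grouped and repetitions appear, Sunc falls below Srnd, mirroring the separation described for the coarser category level.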

