Predictability Bounds of Electronic Health Records.

Dahlem D, Maniloff D, Ratti C - Sci Rep (2015)

Bottom Line: In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.

Affiliation: [1] IBM Research-Ireland, Dublin 15, Ireland; [2] Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

ABSTRACT
The ability to intervene in disease progression given a person's disease history has the potential to solve one of society's most pressing issues: advancing health care delivery and reducing its cost. Controlling disease progression is inherently associated with the ability to predict possible future diseases given a patient's medical history. We invoke an information-theoretic methodology to quantify the level of predictability inherent in disease histories of a large electronic health records dataset with over half a million patients. In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. Our findings confirm our intuition that knowledge of common disease progressions results in higher predictability bounds than treating disease histories independently. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. Surprisingly, we also show that shuffling individual disease histories only marginally degrades the predictability bounds. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.
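The progression from zeroth-order to temporally informed statistics, and the shuffling test, can be illustrated with a minimal Python sketch. It assumes that a disease history is flattened into one chronological sequence of diagnostic codes and uses a simple plug-in estimate of the order-k conditional entropy $H(X_t \mid X_{t-k}, \dots, X_{t-1})$. The helper names `conditional_entropy` and `shuffled_entropies` and the example codes are hypothetical, and this is not necessarily the estimator the authors used.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq, k):
    """Plug-in estimate of H(X_t | X_{t-k}, ..., X_{t-1}) in bits.

    k = 0 gives the entropy of the code frequencies (order ignored).
    """
    if k == 0:
        n = len(seq)
        return sum(c / n * math.log2(n / c) for c in Counter(seq).values())
    joint = Counter()     # counts of (context of length k, next code)
    contexts = Counter()  # counts of each context that has a successor
    for i in range(k, len(seq)):
        context = tuple(seq[i - k:i])
        joint[(context, seq[i])] += 1
        contexts[context] += 1
    total = sum(joint.values())
    # H = sum over (context, x) of p(context, x) * log2(1 / p(x | context))
    return sum(c / total * math.log2(contexts[ctx] / c)
               for (ctx, _), c in joint.items())


def shuffled_entropies(seq, max_k, seed=0):
    """Same statistics after randomly permuting the history (destroys ordering)."""
    shuffled = list(seq)
    random.Random(seed).shuffle(shuffled)
    return [conditional_entropy(shuffled, k) for k in range(max_k + 1)]


# Hypothetical, perfectly periodic history of three codes.
history = ["250.00", "401.9", "530.81"] * 10
for k in range(3):
    print(k, round(conditional_entropy(history, k), 3))
# prints "0 1.585", then "1 0.0" and "2 0.0"
```

Shuffling leaves the order-0 value unchanged, since the code frequencies are identical, while it destroys any genuine temporal dependence at k ≥ 1. With short histories, plug-in estimates at high k are biased downward because most contexts are seen only once, so the order at which the dependence structure vanishes is easier to read on long sequences.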

Figure 1: Medical history of one anonymised patient with 28 hospital visits and 64 diagnoses over a 9-year period. (A) The patient's disease history, plotted by top-level category of the ICD-9 classification scheme and aggregated by quarter year. The most common diseases for this patient are endocrine, nutritional, metabolic and immunity disorders, digestive diseases, and genitourinary diseases. (B) Possible disease associations at the first-level category. These associations follow the chronological order of the patient's history: a connection between diseases is established when a set of diagnoses at hospital visit t + 1 follows a set of diagnoses at the previous visit t. (C–E) Successively finer detail on the diagnostic code, from the second-level category down to the full ICD-9 code.
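Panel (B) links every diagnosis at visit t to every diagnosis at visit t + 1. A minimal sketch of that construction follows, assuming each visit is a list of ICD-9 code strings. The category mapping uses the standard ICD-9-CM chapter ranges; the helper names (`visit_transitions`, `top_level`) and the example codes are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard ICD-9-CM chapter ranges for numeric diagnosis codes (001-999).
ICD9_CHAPTERS = [
    (1, 139, "infectious and parasitic"),
    (140, 239, "neoplasms"),
    (240, 279, "endocrine, nutritional, metabolic, immunity"),
    (280, 289, "blood"),
    (290, 319, "mental disorders"),
    (320, 389, "nervous system and sense organs"),
    (390, 459, "circulatory"),
    (460, 519, "respiratory"),
    (520, 579, "digestive"),
    (580, 629, "genitourinary"),
    (630, 679, "pregnancy and childbirth"),
    (680, 709, "skin"),
    (710, 739, "musculoskeletal"),
    (740, 759, "congenital anomalies"),
    (760, 779, "perinatal"),
    (780, 799, "symptoms and ill-defined conditions"),
    (800, 999, "injury and poisoning"),
]


def top_level(code):
    """Map an ICD-9 code string such as '250.00' to its top-level chapter."""
    if code[0] in "VE":
        return "supplementary (V/E codes)"
    number = int(code.split(".")[0])
    for low, high, name in ICD9_CHAPTERS:
        if low <= number <= high:
            return name
    raise ValueError(f"not a numeric ICD-9 diagnosis code: {code}")


def visit_transitions(visits):
    """Count edges a -> b for each diagnosis a at visit t and b at visit t + 1."""
    edges = Counter()
    for current, following in zip(visits, visits[1:]):
        # Duplicate codes within one visit are counted once.
        for a, b in product(set(current), set(following)):
            edges[(a, b)] += 1
    return edges


# Hypothetical patient with three visits.
visits = [["250.00", "401.9"], ["530.81"], ["250.00", "599.0"]]
code_edges = visit_transitions(visits)
category_edges = visit_transitions([[top_level(c) for c in v] for v in visits])
print(category_edges[("digestive", "genitourinary")])  # 1
```

Self-loops (the same category at consecutive visits) are kept in this sketch; the caption does not say whether the figure counts them.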

Mentions: For illustration, we begin by running our individual statistics on a randomly chosen patient and present the results in Fig. 1. This patient has been diagnosed with 48 distinct ICD-9 codes, yielding a random entropy $S^{\mathrm{rand}} = \log_2 48 \approx 5.58$ bits, while accounting for the distribution of the diagnostic codes gives the uncorrelated entropy $S^{\mathrm{unc}}$. If we assume that the patient's medical history follows a uniformly random pattern, no prediction scheme can guess the next diagnosis with a probability better than $1/48 \approx 0.021$. Accounting for the distributional characteristics of this patient's medical history yields an $S^{\mathrm{unc}}$ very close to $S^{\mathrm{rand}}$ and does not substantially improve predictability. However, when we consider the temporal order of the medical history, our estimate of the entropy rate of this patient's medical history is $S \approx 2.91$ bits. This means that about 3 bits are needed to encode the information in this patient's correlated medical history, or equivalently that the probability of correctly predicting the next disease code is $2^{-2.91} \approx 0.13$. In other words, $S^{\mathrm{rand}}$ and $S^{\mathrm{unc}}$ both indicate that each diagnosis in a health encounter produces on average about 5.58 bits of new information, corresponding to about $2^{5.58} \approx 48$ possible next diagnostic codes. In contrast, an entropy rate $S$ of about 3 bits indicates that the real uncertainty in a new diagnosis is about $2^{3} = 8$ codes.
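The quantities in this paragraph can be reproduced with a short Python sketch. $S^{\mathrm{rand}}$ and $S^{\mathrm{unc}}$ follow directly from their definitions. For the entropy rate $S$, the sketch assumes a Lempel-Ziv-style estimator (the paragraph does not name the estimator behind the 2.91-bit figure) and assumes that the visits are flattened into one chronological sequence of codes. All function names are hypothetical.

```python
import math
from collections import Counter


def random_entropy(seq):
    """S^rand: log2 of the number of distinct codes (uniform, memoryless)."""
    return math.log2(len(set(seq)))


def uncorrelated_entropy(seq):
    """S^unc: Shannon entropy of the code frequencies, ignoring order."""
    n = len(seq)
    return sum(c / n * math.log2(n / c) for c in Counter(seq).values())


def _occurs_in(past, pattern):
    """True if `pattern` appears as a contiguous run inside `past`."""
    m = len(pattern)
    return any(past[j:j + m] == pattern for j in range(len(past) - m + 1))


def lz_entropy_rate(seq):
    """Assumed Lempel-Ziv-style estimate of the entropy rate S (bits per code).

    Lambda_i is the length of the shortest run starting at i that does not
    occur in seq[:i] (capped at n - i + 1 at the end of the sequence), and
    S = n * log2(n) / sum(Lambda_i).
    """
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and _occurs_in(seq[:i], seq[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total


def chance_of_correct_guess(entropy_bits):
    """Probability 2^-S used in the text as the chance of guessing the next code."""
    return 2.0 ** -entropy_bits


def effective_choices(entropy_bits):
    """Number of equally likely next codes, 2^S, implied by S bits."""
    return 2.0 ** entropy_bits


# Arithmetic from the paragraph above.
print(round(chance_of_correct_guess(math.log2(48)), 3))  # 0.021
print(round(chance_of_correct_guess(2.91), 2))           # 0.13
print(round(effective_choices(5.58)))                     # 48
print(effective_choices(3))                               # 8.0
```

A Lempel-Ziv estimate converges only for long sequences, so for a history of a few dozen diagnoses it is best read as a rough figure rather than a precise rate.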

