Predictability Bounds of Electronic Health Records.

Dahlem D, Maniloff D, Ratti C - Sci Rep (2015)

Bottom Line: In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.


Affiliation: [1] IBM Research-Ireland, Dublin 15, Ireland; [2] Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

ABSTRACT
The ability to intervene in disease progression given a person's disease history has the potential to solve one of society's most pressing issues: advancing health care delivery and reducing its cost. Controlling disease progression is inherently associated with the ability to predict possible future diseases given a patient's medical history. We invoke an information-theoretic methodology to quantify the level of predictability inherent in disease histories of a large electronic health records dataset with over half a million patients. In our analysis, we progress from zeroth order through temporal informed statistics, both from an individual patient's standpoint and also considering the collective effects. Our findings confirm our intuition that knowledge of common disease progressions results in higher predictability bounds than treating disease histories independently. We complement this result by showing the point at which the temporal dependence structure vanishes with increasing orders of the time-correlated statistic. Surprisingly, we also show that shuffling individual disease histories only marginally degrades the predictability bounds. This apparent contradiction with respect to the importance of time-ordered information is indicative of the complexities involved in capturing the health-care process and the difficulties associated with utilising this information in universal prediction algorithms.
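The progression from zeroth-order to frequency-based statistics mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the zeroth-order measure is the log of the number of distinct disease codes in a history and that the next level is the Shannon entropy of the code frequencies; the function names and the toy history below are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def random_entropy(history):
    """Zeroth-order entropy in bits: log2 of the number of distinct codes.

    Assumes every distinct code is equally likely (no frequency information).
    """
    return math.log2(len(set(history)))


def uncorrelated_entropy(history):
    """Shannon entropy in bits of the code frequencies, ignoring their order."""
    counts = Counter(history)
    total = len(history)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical disease history; the letters stand in for diagnosis codes.
history = ["A", "A", "B", "A", "C", "A", "B", "A"]

print("zeroth order:", random_entropy(history))
print("frequency based:", uncorrelated_entropy(history))
```

Neither measure uses the order of events, so both are unchanged when a history is shuffled; only time-correlated statistics can be affected by shuffling.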


Figure 3: n-gram model cross-entropy and corresponding predictability bands of the hold-out validation dataset. The statistics are computed for the categories CAT4 through CAT2 and for the original dataset D, individual histories shuffled D′, and the entire dataset shuffled D″ (A,B) and for the Brown Corpus (C,D).

Mentions: Figure 3A shows the cross-entropy rate estimates from the n-gram models with orders 1 ≤ n ≤ 5. As the time-correlated entropy measure suggests, increasing the n-gram order will decrease the entropy and improve upon the predictability. In fact, with an n-gram order of 2 the upper predictability bound exceeds the previous Πcor = 29% by 61 percentage points to Πcor = 90% (see Fig. 3). As a consequence, only 10% of predictability is due to chance alone. An improvement of the upper bound of predictability beyond the order of n = 2 can only be observed for coarser category levels. However, all pair-wise differences for n > 1 are statistically significant given the results of the 10-fold cross-validation.
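The link between an n-gram cross-entropy estimate and an upper predictability bound can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bigram model uses add-one smoothing (the excerpt does not say which smoothing was used), and the bound is obtained by solving Fano's inequality numerically, as is common in the predictability literature. The names bigram_cross_entropy and max_predictability are hypothetical.

```python
import math
from collections import Counter


def bigram_cross_entropy(train, heldout, vocab_size):
    """Per-symbol cross-entropy (bits) of a bigram model on held-out sequences.

    Add-one smoothing is an assumption made for this sketch.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in train:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    total_bits, n = 0.0, 0
    for seq in heldout:
        for prev, cur in zip(seq, seq[1:]):
            p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            total_bits -= math.log2(p)
            n += 1
    return total_bits / n


def max_predictability(entropy, n_states, tol=1e-9):
    """Solve entropy = H(p) + (1 - p) * log2(n_states - 1) for p by bisection.

    The right-hand side decreases from log2(n_states) at p = 1/n_states
    to 0 at p = 1, so the root on that interval is unique.
    """
    def fano(p):
        if p in (0.0, 1.0):
            h = 0.0
        else:
            h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(n_states - 1)

    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid  # entropy is lower than fano(mid): the bound lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical toy data: a strictly alternating sequence of two codes.
train = [["A", "B"] * 20]
heldout = [["A", "B"] * 20]
h = bigram_cross_entropy(train, heldout, vocab_size=2)
print("cross-entropy (bits):", h)
print("upper predictability bound:", max_predictability(h, n_states=2))
```

Lower cross-entropy maps to a higher bound, which is the direction of the effect described above for increasing n-gram order. Shuffled variants such as D′ could be scored with the same two functions to compare bounds.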

