Personalized Predictive Modeling and Risk Factor Identification using Patient Similarity.

Ng K, Sun J, Hu J, Wang F - AMIA Jt Summits Transl Sci Proc (2015)

Bottom Line: Compared to global models trained on all patients, personalized predictive models have the potential to produce more accurate risk scores and capture more relevant risk factors for individual patients. A 15,000-patient data set, derived from electronic health records, is used to evaluate the approach. The predictive results show that the personalized models can outperform the global model.


Affiliation: IBM T. J. Watson Research Center, Yorktown Heights, NY, USA.

ABSTRACT
Personalized predictive models are customized for an individual patient and trained using information from similar patients. Compared to global models trained on all patients, they have the potential to produce more accurate risk scores and capture more relevant risk factors for individual patients. This paper presents an approach for building personalized predictive models and generating personalized risk factor profiles. A locally supervised metric learning (LSML) similarity measure is trained for diabetes onset and used to find clinically similar patients. Personalized risk profiles are created by analyzing the parameters of the trained personalized logistic regression models. A 15,000-patient data set, derived from electronic health records, is used to evaluate the approach. The predictive results show that the personalized models can outperform the global model. Cluster analysis of the risk profiles shows groups of patients with similar risk factors, differences in the top risk factors for different groups of patients, and differences between the individual and global risk factors.
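The pipeline described in the abstract (find similar patients, train a logistic regression model on them, read the risk profile off the model parameters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: plain Euclidean distance stands in for the trained LSML similarity measure, the cohort is synthetic, and the feature names, neighborhood size, and optimizer settings are hypothetical rather than taken from the paper.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def k_nearest(X, query, k):
    """Indices of the k patients closest to `query`.

    Assumption: Euclidean distance is a stand-in for the LSML measure.
    """
    return sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]

def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [sigmoid(dot(w, xi) + b) - yi for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n for j in range(d)]
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errors) / n
    return w, b

def personalized_model(X, y, query, k=80):
    """Train a model on the query patient's k nearest neighbours.

    Returns (risk score for the query, coefficient vector used as risk profile).
    """
    idx = k_nearest(X, query, k)
    w, b = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
    return sigmoid(dot(w, query) + b), w

def make_synthetic_patients(n=400, seed=0):
    """Toy cohort (not real data): two subgroups with different outcome drivers."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        group = rng.choice([0, 1])
        x = [(-3.0 if group == 0 else 3.0) + rng.gauss(0, 0.5),
             rng.gauss(0, 1), rng.gauss(0, 1)]
        driver = x[1] if group == 0 else x[2]  # subgroup-specific risk factor
        X.append(x)
        y.append(1 if driver > 0 else 0)
    return X, y

FEATURES = ["subgroup_marker", "factor_a", "factor_b"]  # hypothetical names
X, y = make_synthetic_patients()
for query in ([-3.0, 0.0, 0.0], [3.0, 0.0, 0.0]):
    risk, profile = personalized_model(X, y, query)
    top = max(range(len(profile)), key=lambda j: abs(profile[j]))
    print(f"risk={risk:.2f} top factor={FEATURES[top]} "
          f"profile={[round(c, 2) for c in profile]}")
```

In this toy cohort the outcome depends on a different feature in each subgroup, so the two query patients can receive similar risk scores while their coefficient vectors emphasize different factors, which is the kind of difference a single global model would average away.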


Figure 2: Hierarchical heat map plot showing the top risk factors for diabetes onset identified by the personalized predictive models for 500 randomly selected patients. Patient-specific risk factor profiles (the columns) are clustered along the horizontal axis. Risk factors (the rows) are clustered along the vertical axis. Risk factors captured by the global model are highlighted and have a * prefix in the name. The risk score for each patient is plotted as a vertical bar along the bottom.

Mentions: To facilitate the analysis of the characteristics and distribution of the patient-specific risk factors, agglomerative hierarchical clustering (using a Euclidean distance measure) is performed on the personalized risk factor profiles. Figure 2 is a hierarchical heat map plot showing the top risk factors identified by the personalized predictive models for 500 randomly selected patients. The patient-specific risk factor profiles (i.e., the columns) are clustered along the horizontal axis. The individual risk factors (i.e., the rows) are clustered along the vertical axis. The color in the heat map corresponds to the risk factor score values (i.e., beta coefficient values) in the patient risk profiles: red is high while blue is low. Analysis of the risk factor profile clusters shows that some patients share very similar risk factors and are grouped together in the same cluster, whereas other patients have very different, almost non-overlapping risk factors and belong to groups that are far apart in the cluster tree. The patient-specific risk scores are plotted as vertical bars along the bottom of the horizontal axis; the longer the bar, the higher the risk score. Patients with certain risk factor profiles have consistently higher risk scores. For example, patients with high values for “PROCEDURE:CPT:83036 [glycosylated hemoglobin test]” and “LAB:hemoglobin.a1c/hemoglobin.total” in their risk profiles have much higher risk scores than those with low values for these factors.

Patients with similar risk scores can have very different risk factors. For example, the three case patients, highlighted in green as A, B and C on the bottom axis of Figure 2, all have risk scores around 0.75, but the top risk factors for each patient are different: “LAB: estimated glomerular filtration rate” for patient A, “DIAGNOSIS:ICD9:278.00 [obesity nos]” for patient B, and “PROCEDURE:CPT:83036 [glycosylated hemoglobin test]” and “LAB:hemoglobin.a1c/hemoglobin.total” for patient C. The personalized risk factors for each patient can also differ from the risk factors captured by the global model (indicated by highlighted risk factor names with a * prefix). Indeed, a large number of risk factors not captured by the global model are identified in the personalized models as useful predictors. Finally, the risk factor clusters along the vertical axis can be used to identify groups of risk factors that have high co-occurrence rates across the patient risk factor profiles. For example, “PROCEDURE:CPT:84153 [assay of psa, total]” and “DIAGNOSIS:ICD9:v70.0 [routine medical exam]” frequently occur together.
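The clustering step described above can be sketched as a naive agglomerative procedure over a small matrix of risk profiles. This is an illustration under stated assumptions, not the authors' implementation: the text specifies only a Euclidean distance measure, so average linkage is an assumed choice, and the six toy profiles (rows are patients, columns are four hypothetical risk factor coefficients) are invented for the example.

```python
import math
from itertools import combinations

def average_linkage(c1, c2, vectors):
    """Mean Euclidean distance between all cross-cluster pairs (assumed linkage)."""
    total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerative(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        # Find the pair of cluster positions (a < b) with the smallest linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so position a is unaffected
    return clusters

# Toy risk profiles: one row per patient, one column per hypothetical factor.
profiles = [
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.1],
    [0.1, 0.0, 0.9, 0.0],
    [0.0, 0.1, 0.1, 0.9],
    [0.1, 0.0, 0.0, 1.0],
]

# Cluster patients (the heat map columns in the figure).
patient_clusters = agglomerative(profiles, n_clusters=3)

# Cluster risk factors (the heat map rows) by transposing the matrix.
factor_vectors = list(zip(*profiles))
factor_clusters = agglomerative(factor_vectors, n_clusters=3)

print("patient clusters:", patient_clusters)
print("factor clusters:", factor_clusters)
```

Clustering the transposed matrix groups factors whose coefficients rise and fall together across patients, which mirrors the co-occurrence reading of the vertical axis. For a cohort of realistic size, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be a more practical choice than this quadratic-per-merge loop.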

