Prediction of adverse cardiac events in emergency department patients with chest pain using machine learning for variable selection.

Liu N, Koh ZX, Goh J, Lin Z, Haaland B, Ting BP, Ong ME - BMC Med Inform Decis Mak (2014)

Bottom Line: Out of 702 patients, 29 (4.1%) met the primary outcome. We conclude that more predictors do not necessarily guarantee better prediction results. Furthermore, machine learning-based variable selection seems promising in discovering a few relevant and significant measures as predictors.

Affiliation: Department of Emergency Medicine, Singapore General Hospital, Outram Road, Singapore 169608, Singapore. marcus.ong.e.h@sgh.com.sg.

ABSTRACT

Background: The key aim of triage in chest pain patients is to identify those with high risk of adverse cardiac events as they require intensive monitoring and early intervention. In this study, we aim to discover the most relevant variables for risk prediction of major adverse cardiac events (MACE) using clinical signs and heart rate variability.

Methods: A total of 702 chest pain patients at the Emergency Department (ED) of a tertiary hospital in Singapore were included in this study. The recruited patients were at least 30 years of age and presented to the ED with a primary complaint of non-traumatic chest pain. The primary outcome was a composite of MACE such as death and cardiac arrest within 72 h of arrival at the ED. For each patient, eight clinical signs such as blood pressure and temperature were measured, and a 5-min ECG was recorded to derive heart rate variability parameters. A novel random forest-based method was developed to select the most relevant variables. A geometric distance-based machine learning scoring system was then implemented to derive a risk score from 0 to 100.

Results: Out of 702 patients, 29 (4.1%) met the primary outcome. We selected the 3 most relevant variables for predicting MACE: systolic blood pressure, the mean RR interval and the mean instantaneous heart rate. The scoring system with these 3 variables produced an area under the curve (AUC) of 0.812, and a cutoff score of 43 gave a sensitivity of 82.8% and a specificity of 63.4%. By comparison, the scoring system with all 23 variables had an AUC of 0.736, and a cutoff score of 49 gave a sensitivity of 72.4% and a specificity of 63.0%. The conventional thrombolysis in myocardial infarction score and the modified early warning score achieved AUC values of 0.637 and 0.622, respectively.

Conclusions: A few predictors outperformed the whole set of variables in predicting MACE within 72 h. We conclude that more predictors do not necessarily guarantee better prediction results. Furthermore, machine learning-based variable selection seems promising in discovering a few relevant and significant measures as predictors.
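
The sensitivity and specificity figures in the Results can be related to patient counts with a short calculation. The sketch below is illustrative only: the abstract reports percentages, not confusion-matrix counts, so the counts used here (24 of 29 and 427 of 673 for the 3-variable score, 21 of 29 and 424 of 673 for the 23-variable score) are back-calculated assumptions that happen to reproduce the reported percentages. The function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with the outcome that the score flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of patients without the outcome that the score clears."""
    return true_neg / (true_neg + false_pos)


# Cohort sizes stated in the abstract.
N_TOTAL = 702
N_MACE = 29
N_NO_MACE = N_TOTAL - N_MACE  # 673

# ASSUMPTION: counts back-calculated from the reported percentages;
# the source does not state them.
three_var = {"tp": 24, "tn": 427}
all_var = {"tp": 21, "tn": 424}

for name, c in (("3 variables", three_var), ("23 variables", all_var)):
    sens = sensitivity(c["tp"], N_MACE - c["tp"])
    spec = specificity(c["tn"], N_NO_MACE - c["tn"])
    print(f"{name}: sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%")

print(f"outcome rate: {100 * N_MACE / N_TOTAL:.1f}%")
```

With only 29 outcome events, each additional patient detected moves sensitivity by about 3.4 percentage points, which is worth keeping in mind when comparing the two scores.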

Figure 1: Variable selection algorithm. The algorithm creates 500 data subsets for subsequent analysis. Each subset combines the 29 patients with MACE and 29 randomly selected patients without MACE. The algorithm then runs random forest on each subset to pick the 8 top-ranked variables. Given the 500 sets of top-ranked variables, it sorts the variables by how often they occur in the ensemble and chooses the 8 that appear most often. The selection is then refined using the statistical significance of each individual variable.

Mentions: Our proposed variable selection framework is as follows. First, 29 of the 673 patients without MACE were randomly selected and combined with all 29 patients with MACE to construct a new subset, on which random forest (RF) was used to pick the eight top-ranked variables. Second, this random sampling process was repeated 500 times to create an ensemble of top-ranked variables. Third, variables in the ensemble were accumulated and sorted according to their occurrences. As a result, a total of 500 individual models were created by RF, each picking eight variables to form an ensemble of predictors. In this particular study, eight top-ranked variables were determined as potential predictors of the primary outcome. To optimize the selected variables for future validation, 10-fold cross-validation was implemented to avoid over-training during model construction. Lastly, the statistical significance of each variable was measured, and any of the eight selected variables that was not significant in terms of p-value was excluded. Figure 1 depicts the variable selection method.
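
The resampling-and-voting part of this framework can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the ranking step uses a simple standardized mean-difference score as a stand-in for random forest importance so that the example needs only the standard library, the cohort is synthetic, and the 10-fold cross-validation and p-value refinement steps are omitted. All function names and the `ranker` parameter are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def rank_by_mean_difference(rows, labels):
    """Stand-in ranker (NOT random forest): order variables by the absolute
    difference in class means, scaled by the variable's overall spread."""
    n_vars = len(rows[0])
    scores = []
    for j in range(n_vars):
        col = [r[j] for r in rows]
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        spread = pstdev(col) or 1.0  # guard against constant columns
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(n_vars), key=lambda j: scores[j], reverse=True)


def select_variables(rows, labels, n_rounds=500, top_k=8,
                     ranker=rank_by_mean_difference, seed=0):
    """Build balanced subsets, rank variables on each, and keep the
    top_k variables that appear most often across all rounds."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    votes = Counter()
    for _ in range(n_rounds):
        # All outcome cases plus an equal-sized random draw of non-cases.
        subset = pos_idx + rng.sample(neg_idx, len(pos_idx))
        ranking = ranker([rows[i] for i in subset],
                         [labels[i] for i in subset])
        votes.update(ranking[:top_k])
    return [var for var, _ in votes.most_common(top_k)]


def make_synthetic_cohort(n_total=702, n_pos=29, n_vars=23, seed=1):
    """Synthetic data with the cohort shape from the text; values are random."""
    rng = random.Random(seed)
    labels = [1] * n_pos + [0] * (n_total - n_pos)
    rows = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        if y == 1:
            row[0] += 3.0  # variable 0 carries a strong, artificial signal
        rows.append(row)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_synthetic_cohort()
    print(select_variables(rows, labels))
```

Passing a different `ranker` (for example, one that fits a random forest and orders variables by importance) would bring the sketch closer to the described method without changing the sampling or voting logic.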

