Fuzzy association rule mining and classification for the prediction of malaria in South Korea.

Buczak AL, Baugher B, Guven E, Ramac-Thomas LC, Elbert Y, Babin SM, Lewis SH - BMC Med Inform Decis Mak (2015)

Bottom Line: The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the Medium class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3. This paper demonstrates that our data-driven approach can be used for the prediction of different diseases.

Affiliation: Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD, 20723-6099, USA. anna.buczak@jhuapl.edu.

ABSTRACT

Background: Malaria is the world's most prevalent vector-borne disease. Accurate prediction of malaria outbreaks may lead to public health interventions that mitigate disease morbidity and mortality.

Methods: We describe an application of a method for creating prediction models utilizing Fuzzy Association Rule Mining to extract relationships between epidemiological, meteorological, climatic, and socio-economic data from Korea. These relationships are in the form of rules, from which the best set of rules is automatically chosen and forms a classifier. Two classifiers have been built and their results fused to become a malaria prediction model. Future malaria cases are predicted as Low, Medium or High, where these classes are defined as a total of 0-2, 3-16, and 17 or more cases, respectively, for a region in South Korea during a two-week period. Based on user recommendations, HIGH is considered an outbreak.
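
As a hedged illustration of the class definitions above, the following Python sketch maps a region's two-week case total to a class label. The function name `malaria_class` is hypothetical, not the authors' code, and it assumes that HIGH means 17 or more cases, since the ranges as originally stated (0-2, 3-16, above 17) would otherwise leave a total of exactly 17 unassigned.

```python
# Hypothetical sketch: map a region's total malaria cases for one
# two-week period to the Low / Medium / High classes.
# ASSUMPTION: "High" is taken to mean 17 or more cases.


def malaria_class(two_week_cases: int) -> str:
    """Return 'Low' (0-2), 'Medium' (3-16) or 'High' (>= 17, assumed)."""
    if two_week_cases < 0:
        raise ValueError("case counts cannot be negative")
    if two_week_cases <= 2:
        return "Low"
    if two_week_cases <= 16:
        return "Medium"
    return "High"  # HIGH is treated as an outbreak


if __name__ == "__main__":
    for cases in (0, 2, 3, 16, 17, 40):
        print(cases, malaria_class(cases))
```

Using inclusive upper bounds for Low and Medium keeps the three classes contiguous and non-overlapping for any non-negative integer count.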

Results: Model accuracy is described by Positive Predictive Value (PPV), Sensitivity, and F-score for each class, computed on test data not previously used to develop the model. For predictions made 7-8 weeks in advance, model PPV and Sensitivity are 0.842 and 0.681, respectively, for the HIGH class. The F0.5 and F3 scores (which combine PPV and Sensitivity) are 0.804 and 0.694, respectively, for the HIGH class. The overall FARM results (as measured by F-scores) are significantly better than those obtained by Decision Tree, Random Forest, Support Vector Machine, and Holt-Winters methods for the HIGH class. For the Medium class, Random Forest and FARM obtain comparable results, with FARM being better at F0.5, and Random Forest obtaining a higher F3.
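
The reported F-scores can be checked against the reported PPV and Sensitivity, assuming the standard F-beta definition F_β = (1 + β²)·PPV·Sensitivity / (β²·PPV + Sensitivity), in which β < 1 weights PPV more heavily and β > 1 weights Sensitivity more heavily. The helper `f_beta` below is a hypothetical sketch rather than the authors' code; with the HIGH-class values it reproduces 0.804 and 0.694.

```python
def f_beta(ppv: float, sensitivity: float, beta: float) -> float:
    """F-beta score: weighted harmonic mean of PPV (precision) and
    sensitivity (recall). beta < 1 favors PPV; beta > 1 favors sensitivity."""
    b2 = beta * beta
    denom = b2 * ppv + sensitivity
    if denom == 0:
        return 0.0  # both PPV and sensitivity are zero
    return (1 + b2) * ppv * sensitivity / denom


if __name__ == "__main__":
    ppv, sens = 0.842, 0.681  # HIGH-class values reported above
    print(f"F0.5 = {f_beta(ppv, sens, 0.5):.3f}")  # prints F0.5 = 0.804
    print(f"F3 = {f_beta(ppv, sens, 3):.3f}")  # prints F3 = 0.694
```

Because PPV (0.842) exceeds Sensitivity (0.681), the PPV-weighted F0.5 comes out higher than the Sensitivity-weighted F3.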

Conclusions: A previously described method for creating disease prediction models has been modified and extended to build models for predicting malaria. In addition, some new input variables were used, including indicators of intervention measures. The South Korea malaria prediction models predict Low, Medium or High cases 7-8 weeks in the future. This paper demonstrates that our data-driven approach can be used for the prediction of different diseases.

Fig. 4: Malaria incidence rate for eight example regions

Mentions: where α is some constant scaling factor. The high and low classes were determined by selecting a threshold to divide the data into the two classes. This incidence rate threshold was calculated using the training data for all regions and was computed as

$$ T = \mu + \beta \sigma \qquad (5) $$

where μ was the mean, σ was the standard deviation, and β was some constant. Figure 4 plots the malaria incidence rates for eight Korean regions. Notice that the dark blue peaks in the 5th through 8th years are much higher than the others and that this characteristic increased both the mean and standard deviation used for computing the threshold. Although the resulting incidence rate threshold worked for providing two-class training data, the outlier peaks in years 5 through 8 skewed the threshold computation so that it was too large to provide enough HIGH class samples for the validation and testing years of data.
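
A minimal sketch of the threshold rule T = μ + βσ and of the outlier effect described for Figure 4. The names `incidence_threshold` and `label_two_class`, the value β = 1.0, the use of the population standard deviation, the rule that a rate at or above T counts as HIGH, and all incidence rates below are illustrative assumptions; none of them come from the paper or its data.

```python
import statistics


def incidence_threshold(training_rates, beta):
    """Return T = mu + beta * sigma over training-period incidence rates.

    ASSUMPTION: sigma is the population standard deviation; the source
    does not say whether a sample or population estimate was used.
    """
    mu = statistics.mean(training_rates)
    sigma = statistics.pstdev(training_rates)
    return mu + beta * sigma


def label_two_class(rates, threshold):
    """ASSUMPTION: a rate at or above the threshold is labeled HIGH."""
    return ["HIGH" if r >= threshold else "LOW" for r in rates]


if __name__ == "__main__":
    beta = 1.0  # hypothetical constant; the paper's value is not given here
    # Toy incidence rates (illustrative only, not the paper's data).
    typical_years = [1.0, 2.0, 1.5, 3.0, 2.5, 1.0, 2.0, 3.0]
    with_outlier_peaks = typical_years + [20.0, 25.0]
    later_years = [2.0, 4.0, 6.0, 3.5]  # stand-in for validation/test data

    for name, train in (("typical", typical_years),
                        ("with peaks", with_outlier_peaks)):
        t = incidence_threshold(train, beta)
        n_high = label_two_class(later_years, t).count("HIGH")
        print(f"{name}: T = {t:.2f}, HIGH samples in later years = {n_high}")
```

With the toy peaks included, both μ and σ grow, T rises from 2.75 to roughly 14.4, and the later toy values no longer yield any HIGH samples. This mirrors, in miniature, the problem the text describes for years 5 through 8.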

