Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

Chan EH, Sahai V, Conrad C, Brownstein JS - PLoS Negl Trop Dis (2011)

Bottom Line: Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99.

Affiliation: Children's Hospital Informatics Program, Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Boston, Massachusetts, USA.

ABSTRACT

Background: A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics.

Methodology/principal findings: Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99.
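The fit-then-validate procedure described above can be illustrated with a minimal sketch. This is not the authors' code: the numbers are synthetic, the split point is arbitrary, and treating case counts as the response with the query fraction as the single predictor is an assumption made for illustration. The helper names `fit_line` and `pearson` are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Synthetic values for illustration only; NOT data from the study.
query_fraction = [0.10, 0.12, 0.30, 0.55, 0.40, 0.20,
                  0.11, 0.15, 0.35, 0.60, 0.45, 0.18]
case_counts = [20, 25, 70, 130, 95, 45,
               22, 33, 80, 150, 100, 40]

split = 6  # assumed: first six periods train the model, the rest are held out

# Fit on the training subset only.
intercept, slope = fit_line(query_fraction[:split], case_counts[:split])

# Predict for every period, then validate on the holdout and on the full series.
predicted = [intercept + slope * q for q in query_fraction]
holdout_r = pearson(predicted[split:], case_counts[split:])
overall_r = pearson(predicted, case_counts)

print(f"slope={slope:.1f}, holdout r={holdout_r:.3f}, overall r={overall_r:.3f}")
```

Because the model has a single predictor, the validation correlation measures how well the shape of the search series tracks the case series; it does not by itself show that the predicted magnitudes are accurate.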

Conclusions/significance: Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.

Figure 1: A comparison of the model-fitted and official case counts dengue epidemic curves in each country. The model-fitted epidemic curve as compared to the official case counts epidemic curve for dengue in each of the five countries for which a model built on Google search volume data was developed. Bolivia and Singapore are shown at a weekly resolution, the others on a monthly resolution. The activity index is a scaled measure of the case counts, representing the relative amount of dengue activity in each country on a scale from 0 to 100. Shaded regions indicate the season held out for testing the final models.
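
The caption does not state how the activity index is computed. One plausible reading, shown here purely as an assumption, is that each country's case counts are divided by that country's peak count and multiplied by 100. The function name `activity_index` is hypothetical.

```python
def activity_index(counts):
    """Scale case counts to 0-100 relative to the series peak (assumed method)."""
    peak = max(counts)
    if peak == 0:
        # No activity anywhere in the series: report a flat zero index.
        return [0.0 for _ in counts]
    return [100.0 * c / peak for c in counts]


# Synthetic counts for illustration only.
print(activity_index([0, 50, 200]))
```

Scaling each country separately would make the curves comparable in shape but not in absolute burden, which matches the caption's description of a relative measure.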

Mentions: Model-fitted “expected” epidemic curves generally matched official case counts “observed” epidemic curves quite well for all five countries in most seasons, with the exception of Bolivia in 2007 when the model over-estimated the activity in that season, and India in 2005 for which it under-estimated (Figure 1). More formally, the correlation between values predicted by models fit to the training data and the holdout set as well as the overall dataset was generally quite high, ranging from 0.82 to 0.99 (Table 1).

