Avian Influenza Risk Surveillance in North America with Online Media

View Article: PubMed Central - PubMed

ABSTRACT

The use of Internet-based sources of information for health surveillance applications has increased in recent years, as a greater share of social and media activity happens through online channels. The potential surveillance value of online sources of information about emergent health events includes early warning, situational awareness, risk perception, and evaluation of health messaging, among others. The challenges in harnessing these sources of data are the vast number of potential sources to monitor and the need to develop tools that translate dynamic, unstructured content into actionable information. In this paper we investigated the use of one social media outlet, Twitter, for surveillance of avian influenza (AI) risk in North America. We collected AI-related messages over a five-month period and compared these to official surveillance records of AI outbreaks. A fully automated data extraction and analysis pipeline was developed to acquire, structure, and analyze social media messages in an online context. Two methods of outbreak detection, a static threshold and a cumulative-sum dynamic threshold, both based on a time series model of normal activity, were evaluated for their ability to discern important time periods of AI-related messaging and media activity. Our findings show that peaks in activity were related to real-world events, with outbreaks in Nigeria, France and the USA receiving the most attention, while those in China were less evident in the social media data. Topic models found themes related to specific AI events for the dynamic threshold method, while many of the themes found for the static method were ambiguous. Further analyses of these data might focus on quantifying the bias in coverage and the relation between outbreak characteristics and detectability in social media data. Finally, while the analyses here focused on broad themes and trends, there is likely additional value in developing methods for identifying low-frequency messages, operationalizing this methodology into a comprehensive system for visualizing patterns extracted from the Internet, and integrating these data with other sources of information such as wildlife, environment, and agricultural data.
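
The difference between the two detection approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the time series model of normal activity, so the sketch substitutes a plain mean and standard deviation taken from a baseline period, and the function names and the parameters `z`, `k`, and `h` are hypothetical choices made only for illustration.

```python
from statistics import mean, stdev


def static_threshold_alarms(counts, baseline, z=2.0):
    """Flag each day whose count exceeds baseline mean + z * baseline std.

    Assumption: "normal activity" is summarized by the mean and sample
    standard deviation of `baseline`, not by a fitted time series model.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    limit = mu + z * sigma
    return [c > limit for c in counts]


def cusum_alarms(counts, baseline, k=0.5, h=4.0):
    """One-sided (upper) CUSUM on standardized daily counts.

    k is the allowance and h the decision limit, both in standard
    deviation units (hypothetical values). The running sum is reset to
    zero after each alarm.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    s = 0.0
    alarms = []
    for c in counts:
        standardized = (c - mu) / sigma
        # Accumulate only the excess above the allowance; never go below zero.
        s = max(0.0, s + standardized - k)
        if s > h:
            alarms.append(True)
            s = 0.0
        else:
            alarms.append(False)
    return alarms


if __name__ == "__main__":
    baseline = [10, 12, 9, 11, 10, 8, 12, 11, 9, 10]  # made-up daily counts
    recent = [10, 12, 12, 12, 12, 12, 30]             # made-up daily counts
    print(static_threshold_alarms(recent, baseline))
    print(cusum_alarms(recent, baseline))
```

Under these assumptions, a sustained but modest rise in daily message volume can accumulate enough evidence to trip the cumulative-sum rule without any single day crossing the static limit, whereas a single large spike trips both.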




Fig 1 (pone.0165688.g001): Data acquisition and processing pipeline.

Mentions: We developed a complete data processing and analysis pipeline in order to structure and analyze the Twitter database. Our aim was to produce an online-capable set of methods that analyzed data as they arrived, rather than a purely retrospective analysis of the dataset. This was done to reflect a more realistic surveillance and/or situational awareness use case for monitoring online content for disease-related information. A schematic view of the processing and analysis pipeline is presented in Fig 1. During the study period, daily outputs included a 2-week time series graph, a 2-week word cloud, and a graph of the full time series.
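
The online, as-it-arrives processing described above might be organized around a rolling window of daily aggregates. The following Python sketch is an illustration only, not the pipeline shown in Fig 1: the class `DailyWindow`, its methods, and the whitespace tokenization are all hypothetical, and it assumes messages arrive roughly in date order. It keeps just enough state to feed a 2-week time series and a 2-week word cloud.

```python
from collections import Counter
from datetime import date, timedelta


class DailyWindow:
    """Rolling per-day message and word counts over the last `days` days."""

    def __init__(self, days=14):
        self.days = days
        self.counts = Counter()  # date -> number of messages that day
        self.words = {}          # date -> Counter of tokens seen that day

    def add(self, day, text):
        """Record one message and drop days that fell out of the window."""
        self.counts[day] += 1
        # Naive tokenization (assumption): lowercase and split on whitespace.
        self.words.setdefault(day, Counter()).update(text.lower().split())
        cutoff = day - timedelta(days=self.days - 1)
        for old in [d for d in self.counts if d < cutoff]:
            del self.counts[old]
            self.words.pop(old, None)

    def series(self, latest):
        """Return (date, count) pairs for the window ending at `latest`."""
        start = latest - timedelta(days=self.days - 1)
        window = [start + timedelta(days=i) for i in range(self.days)]
        return [(d, self.counts.get(d, 0)) for d in window]

    def top_words(self, n=10):
        """Return the n most frequent tokens in the window (word cloud input)."""
        total = Counter()
        for per_day in self.words.values():
            total.update(per_day)
        return total.most_common(n)


if __name__ == "__main__":
    window = DailyWindow(days=14)
    window.add(date(2015, 1, 1), "bird flu outbreak")   # made-up messages
    window.add(date(2015, 1, 1), "Bird flu in France")
    print(window.series(date(2015, 1, 1))[-1])
    print(window.top_words(2))
```

Holding only windowed aggregates keeps the daily update cheap; the full time series graph would still need the complete daily counts stored elsewhere.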

