Avian Influenza Risk Surveillance in North America with Online Media

View Article: PubMed Central - PubMed

ABSTRACT

The use of Internet-based sources of information for health surveillance applications has increased in recent years, as a greater share of social and media activity happens through online channels. The potential surveillance value of online sources of information about emergent health events includes early warning, situational awareness, risk perception and evaluation of health messaging, among others. The challenge in harnessing these sources of data lies in the vast number of potential sources to monitor and in developing the tools to translate dynamic unstructured content into actionable information. In this paper we investigated the use of one social media outlet, Twitter, for surveillance of avian influenza (AI) risk in North America. We collected AI-related messages over a five-month period and compared these to official surveillance records of AI outbreaks. A fully automated data extraction and analysis pipeline was developed to acquire, structure, and analyze social media messages in an online context. Two methods of outbreak detection, a static threshold and a cumulative-sum dynamic threshold, both based on a time series model of normal activity, were evaluated for their ability to discern important time periods of AI-related messaging and media activity. Our findings show that peaks in activity were related to real-world events, with outbreaks in Nigeria, France and the USA receiving the most attention while those in China were less evident in the social media data. Topic models found themes related to specific AI events for the dynamic threshold method, while many of the themes found for the static method were ambiguous. Further analyses of these data might focus on quantifying the bias in coverage and the relation between outbreak characteristics and detectability in social media data.
Finally, while the analyses here focused on broad themes and trends, there is likely additional value in developing methods for identifying low-frequency messages, operationalizing this methodology into a comprehensive system for visualizing patterns extracted from the Internet, and integrating these data with other sources of information such as wildlife, environment, and agricultural data.




pone.0165688.g006: Observed daily time series of AI-related Twitter activity. Black circles indicate significant errors (possible outbreaks) based on the dynamic threshold criterion (blue line).

Mentions: Two methods were used for outbreak detection. A static threshold (the horizontal line in Fig 5) was determined from the 95% confidence interval for the process mean; days that exceed this threshold are marked with a dark circle in Fig 5. In total, 34 days were identified as anomalous using this method. The cusum algorithm was much more conservative, identifying only 4 unexpected days (Fig 6).
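
The contrast between the two detection rules can be illustrated with a minimal sketch. This is not the authors' implementation. The paper applies its thresholds to errors from a time series model of normal activity, whereas the sketch below assumes a plain baseline window of daily counts. The function names, the `z = 1.96` upper limit, and the cusum parameters `k = 0.5` and `h = 5.0` (both in baseline standard deviation units) are illustrative assumptions, and the data are made up.

```python
from statistics import mean, stdev


def static_threshold_flags(counts, baseline, z=1.96):
    """Flag days whose count exceeds mean + z * sd of the baseline window.

    Assumption: the static limit is approximated as an upper normal
    limit around the baseline mean; the paper's exact rule may differ.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    limit = mu + z * sigma
    return [i for i, x in enumerate(counts) if x > limit]


def cusum_flags(counts, baseline, k=0.5, h=5.0):
    """One-sided (upper) tabular cusum on standardized counts.

    k is the allowance and h the decision interval, both in baseline
    sd units. These are common textbook defaults, assumed here.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    s = 0.0
    flagged = []
    for i, x in enumerate(counts):
        # Accumulate only deviations larger than the allowance k.
        s = max(0.0, s + (x - mu) / sigma - k)
        if s > h:
            flagged.append(i)
            s = 0.0  # restart accumulation after an alarm
    return flagged


# Hypothetical daily message counts, for illustration only.
baseline = [10, 12, 9, 11, 10, 13, 8, 11, 10, 12]
counts = [11, 10, 15, 12, 14, 15, 16, 15, 9, 40]

print("static:", static_threshold_flags(counts, baseline))
print("cusum: ", cusum_flags(counts, baseline))
```

On these made-up counts the static rule flags every day above the fixed limit, while the cusum rule raises an alarm only once accumulated excess passes `h`, so it flags fewer days. That is the same qualitative pattern as the 34 versus 4 days reported above, although the numbers here carry no empirical meaning.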

