Avian Influenza Risk Surveillance in North America with Online Media

View Article: PubMed Central - PubMed

ABSTRACT

The use of Internet-based sources of information for health surveillance applications has increased in recent years, as a greater share of social and media activity happens through online channels. The potential surveillance value of online sources of information about emergent health events includes early warning, situational awareness, risk perception, and evaluation of health messaging, among others. The challenges in harnessing these sources of data are the vast number of potential sources to monitor and the need to develop tools that translate dynamic, unstructured content into actionable information. In this paper we investigated the use of one social media outlet, Twitter, for surveillance of avian influenza (AI) risk in North America. We collected AI-related messages over a five-month period and compared these to official surveillance records of AI outbreaks. A fully automated data extraction and analysis pipeline was developed to acquire, structure, and analyze social media messages in an online context. Two methods of outbreak detection, a static threshold and a cumulative-sum dynamic threshold, both based on a time series model of normal activity, were evaluated for their ability to discern important time periods of AI-related messaging and media activity. Our findings show that peaks in activity were related to real-world events, with outbreaks in Nigeria, France, and the USA receiving the most attention, while those in China were less evident in the social media data. Topic models found themes related to specific AI events for the dynamic threshold method, while many of the themes for the static method were ambiguous. Further analyses of these data might focus on quantifying the bias in coverage and the relation between outbreak characteristics and detectability in social media data.
Finally, while the analyses here focused on broad themes and trends, there is likely additional value in developing methods for identifying low-frequency messages, operationalizing this methodology into a comprehensive system for visualizing patterns extracted from the Internet, and integrating these data with other sources of information, such as wildlife, environment, and agricultural data.
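
The abstract names two detection methods, a static threshold and a cumulative-sum (CUSUM) dynamic threshold, without giving their parameters here. The following is a minimal sketch of the general idea only, not the authors' pipeline. It assumes that "normal activity" can be summarized by the mean and standard deviation of a baseline window (the paper uses a time series model instead), and the function names, the multiplier `k`, the `slack` allowance, and the decision limit `h` are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def static_threshold_alerts(counts, baseline, k=3.0):
    """Flag indices whose count exceeds baseline mean + k * baseline std.

    Assumption: a fixed mean/std baseline stands in for the paper's
    time series model of normal activity.
    """
    limit = mean(baseline) + k * stdev(baseline)
    return [i for i, c in enumerate(counts) if c > limit]


def cusum_alerts(counts, baseline, slack=0.5, h=4.0):
    """One-sided CUSUM on standardized counts.

    The statistic accumulates positive deviations beyond `slack`
    (in baseline standard deviations) and raises an alert when it
    exceeds `h`; it is reset to zero after each alert.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    s = 0.0
    alerts = []
    for i, c in enumerate(counts):
        z = (c - mu) / sigma
        s = max(0.0, s + z - slack)
        if s > h:
            alerts.append(i)
            s = 0.0  # restart accumulation after an alarm
    return alerts


# Synthetic daily message counts, invented purely for illustration.
baseline = [10, 12, 11, 9, 10, 11, 12, 10]
sustained_rise = [11, 12, 13, 13, 13, 13, 11]

print("static:", static_threshold_alerts(sustained_rise, baseline))
print("cusum: ", cusum_alerts(sustained_rise, baseline))
```

With these synthetic counts, no single day crosses the static limit, but the run of moderately elevated days accumulates until the CUSUM statistic crosses `h`. This is one plausible reason a dynamic threshold might isolate periods tied to specific events more cleanly than a static one, although the sketch does not reproduce the paper's results.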




Fig 4 (pone.0165688.g004): Time series of AI reports provided to the OIE during the study period.

Mentions: OIE data extracted for the same time period are shown in Fig 4, which gives the total number of reports by month and the relative distribution of AI virus types described in the reports. Qualitatively, the monthly distributions of the Twitter and OIE data look similar, with the highest number of reports in January. In December, several spikes in the daily Twitter data indicate high activity during this month as well, and December was the second-highest reporting month in the OIE data. To investigate further, we computed the correlation between the monthly totals of the two datasets, finding a Pearson's correlation coefficient of 0.746, which indicates a strong positive association between the monthly observations. However, given the small sample size, we cannot place much confidence in this finding. The difference in magnitude between the two series precludes direct comparison at a finer temporal scale.
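
To illustrate why a small sample limits confidence in the reported correlation, the sketch below computes an approximate 95% confidence interval for r = 0.746 using the Fisher z-transformation. It assumes n = 5 monthly observations, inferred from the five-month collection period rather than stated in the passage above, and the helper name `pearson_ci` is hypothetical. The Fisher approximation is itself rough at such a small n, so the interval is indicative only.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for Pearson's r via Fisher's z.

    z_crit defaults to the two-sided 95% normal quantile.
    Requires n > 3 because the standard error is 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("Fisher interval needs more than 3 observations")
    z = math.atanh(r)               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r from the text; n = 5 is an assumption (one value per month).
low, high = pearson_ci(0.746, 5)
print(f"95% CI for r: ({low:.2f}, {high:.2f})")
```

Under these assumptions the interval spans zero, which is consistent with the caution that the strong point estimate should not be over-interpreted.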

