Towards an integrated food safety surveillance system: a simulation study to explore the potential of combining genomic and epidemiological metadata

ABSTRACT

Foodborne infection is a result of exposure to complex, dynamic food systems. The efficiency of foodborne infection is driven by ongoing shifts in genetic machinery. Next-generation sequencing technologies can provide high-fidelity data about the genetics of a pathogen. However, food safety surveillance systems do not currently provide similar high-fidelity epidemiological metadata to associate with genetic data. As a consequence, it is rarely possible to transform genetic data into actionable knowledge that can be used to genuinely inform risk assessment or prevent outbreaks. Big data approaches are touted as a revolution in decision support, and pose a potentially attractive method for closing the gap between the fidelity of genetic and epidemiological metadata for food safety surveillance. We therefore developed a simple food chain model to investigate the potential benefits of combining ‘big’ data sources, including both genetic and high-fidelity epidemiological metadata. Our results suggest that, as for any surveillance system, the collected data must be relevant and characterize the important dynamics of a system if we are to properly understand risk: this suggests the need to carefully consider data curation, rather than the more ambitious claims of big data proponents that unstructured and unrelated data sources can be combined to generate consistent insight. Of interest is that the biggest influencers of foodborne infection risk were contamination load and processing temperature, not genotype. This suggests that understanding food chain dynamics would probably more effectively generate insight into foodborne risk than prescribing the hazard in ever more detail in terms of genotype.

Figure 6. Scatterplots showing the relationships between the processing parameters (daily averages) and the average number of reported cases per day. Solid blue lines represent least-squares regression fits with 95% confidence intervals (red dashed-dotted lines). For clarity, we show a common scale across the panels: some confidence intervals are therefore not shown (for example, the lower confidence intervals that fall below zero).
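
The least-squares fits and 95% confidence intervals described in the caption can be illustrated with a minimal sketch. It is not the authors' code: the function name `fit_line_with_band` is hypothetical, the band uses a normal approximation (z of about 1.96) in place of the exact Student's t quantile, and any numbers passed to it below are made up for illustration.

```python
import math
from statistics import NormalDist, mean


def fit_line_with_band(x, y, level=0.95):
    """Ordinary least-squares line with an approximate confidence band.

    Assumption: the band uses a normal quantile rather than the exact
    Student's t quantile, so it is slightly too narrow for small samples.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three paired observations")

    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def band(x0):
        """Return (lower, fitted, upper) for the mean response at x0."""
        fitted = intercept + slope * x0
        half_width = z * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
        return fitted - half_width, fitted, fitted + half_width

    return slope, intercept, band
```

Calling `band(x0)` over a grid of x values gives the two outer curves to draw around the fitted line. The lower curve is not truncated, so it can fall below zero when case counts are low, which is the situation the caption mentions.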

The data array produced from the hypothetical food chain simulation gives, for each simulated food unit, the physical and genotype parameters and whether an infection with pathogen g occurred given the consumption of that food unit. This data array provides the foundation for the surveillance system analyses that follow. We first analysed the raw data array to better understand the relationships between parameters and reported infections. We plotted daily parameter averages against the average number of reported cases per genotype for phenotype indicators (figure 5) and against average cases per day for processing parameters (figure 6). We placed contamination rates in the processing parameter figure as these parameters are generated independently of the genotype. These simple analyses highlight that, of the phenotypic indicators, only the thermal inactivation parameters while bacteria reside on the unit (D60g and Zg) and the infection and reporting parameters (pmg and vg) are firmly correlated with reporting rates (see appendix B for detailed definition of parameters). For the processing parameters, the temperature of the food unit appears to be a very strong driver of infection. The response to initial and post-process 1 contamination rates (N0k,g and N1k,g) was similar and showed a modest positive correlation (only N1k,g shown in the figure). While these simple linear fits are not robust statistical analyses (for example, in reality we would expect to see at least some nonlinear relationships, especially for those parameters driving cross-contamination), the visualizations do provide some good clues as to which parameters should drive infection and reporting rates. Therefore, we imported into the machine learning toolbox the four genotypic parameters with reasonable correlations to reported infection (pmg, vg, D60g and Zg), plus the contamination at the end of stage 1 processing and the temperature of the processing conditions during stage 2 processing.
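
The screening step described above, keeping only parameters whose daily averages correlate reasonably with reported cases, can be sketched as follows. The text does not say how "reasonable" was quantified, and the selection appears to have been made from the plots, so the Pearson coefficient, the 0.5 cut-off, the function names and the example values here are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def screen_parameters(columns, cases, threshold=0.5):
    """Keep parameter names whose |r| with reported cases meets the threshold.

    Assumption: `threshold` is an arbitrary illustrative cut-off, not a
    value taken from the study.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson_r(values, cases)) >= threshold
    ]


if __name__ == "__main__":
    # Made-up daily averages, for illustration only.
    daily_cases = [4.1, 3.2, 2.6, 1.4, 0.9]
    daily_parameters = {
        "temperature": [50, 55, 60, 65, 70],
        "unrelated": [1, 5, 2, 5, 1],
    }
    print(screen_parameters(daily_parameters, daily_cases))
```

A linear coefficient of this kind shares the limitation noted above for the linear fits: it can miss nonlinear relationships, such as those expected for parameters driving cross-contamination.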

