Limits...
Compensating for population sampling in simulations of epidemic spread on temporal contact networks.

Génois M, Vestergaard CL, Cattuto C, Barrat A - Nat Commun (2015)

Bottom Line: As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk.We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts.We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set.

View Article: PubMed Central - PubMed

Affiliation: Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille, France.

ABSTRACT
Data describing human interactions often suffer from incomplete sampling of the underlying population. As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. Here we present a systematic method to alleviate this issue and obtain a better estimation of the risk in the context of epidemic models informed by high-resolution time-resolved contact data. We consider several such data sets collected in various contexts and perform controlled resampling experiments. We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts. We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set. We discuss limitations and potential improvements of our method.

Show MeSH

Related in: MedlinePlus

SIR simulations for the Thiers13 high school case.We compare of the outcome of SIR epidemic simulations performed on resampled and reconstructed contact networks, for different methods of reconstruction. We plot the distribution of epidemic sizes (fraction of recovered individuals) at the end of SIR processes simulated on top of resampled (sample) and reconstructed contact networks, for different values of the fraction f of nodes removed, and for the five reconstruction methods described in the text (0, W, WS, WT and WST). The parameters of the SIR models are β=0.0004 and β/μ=100. The case f=0 corresponds to simulations using the whole data set, that is, the reference case. For each value of f, 1,000 independent simulations were performed.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4660211&req=5

f3: SIR simulations for the Thiers13 high school case.We compare of the outcome of SIR epidemic simulations performed on resampled and reconstructed contact networks, for different methods of reconstruction. We plot the distribution of epidemic sizes (fraction of recovered individuals) at the end of SIR processes simulated on top of resampled (sample) and reconstructed contact networks, for different values of the fraction f of nodes removed, and for the five reconstruction methods described in the text (0, W, WS, WT and WST). The parameters of the SIR models are β=0.0004 and β/μ=100. The case f=0 corresponds to simulations using the whole data set, that is, the reference case. For each value of f, 1,000 independent simulations were performed.

Mentions: Figures 2, 3, 4 display the outcome of SIR spreading simulations performed on surrogate networks obtained using the various reconstruction methods, compared with the outcome of simulations on the resampled data sets, for various values of f. Method 0 leads to a clear overestimation of the outcome and does not capture well the shape of the distribution of outbreak sizes. Method W gives only slightly better results. The overall shape of the distribution is better captured for the three reconstruction methods using more information: WS, WT and WST (note that for the SFHH case the population is not structured, so that W and WS are equivalent, as are WT and WST). The WST method matches best the shape of the distributions and yields distributions much more similar to those obtained by simulating on the whole data set than the simulations performed on the resampled networks. We also show in Fig. 5 the fraction of outbreaks that reach at least 20% of the population and the average epidemic size for these outbreaks. In the case of simulations performed on resampled data, we rapidly lose information about the size and even the existence of large outbreaks as f increases. Simulations using data reconstructed with methods 0 and W, on the contrary, largely overestimate these quantities, which is expected as infections spread easier on random graphs than on structured graphs5052, especially if the heterogeneity of the aggregated contact durations is not considered2220. Taking into account the population structure or using contact sequences that respect the temporal heterogeneities (broad distributions of contact and inter-contact durations) yield better results (WS and WT cases, respectively). Overall, the WST method, for which the surrogate networks respect all these constraints, yields the best results.


Compensating for population sampling in simulations of epidemic spread on temporal contact networks.

Génois M, Vestergaard CL, Cattuto C, Barrat A - Nat Commun (2015)

SIR simulations for the Thiers13 high school case.We compare of the outcome of SIR epidemic simulations performed on resampled and reconstructed contact networks, for different methods of reconstruction. We plot the distribution of epidemic sizes (fraction of recovered individuals) at the end of SIR processes simulated on top of resampled (sample) and reconstructed contact networks, for different values of the fraction f of nodes removed, and for the five reconstruction methods described in the text (0, W, WS, WT and WST). The parameters of the SIR models are β=0.0004 and β/μ=100. The case f=0 corresponds to simulations using the whole data set, that is, the reference case. For each value of f, 1,000 independent simulations were performed.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4660211&req=5

f3: SIR simulations for the Thiers13 high school case.We compare of the outcome of SIR epidemic simulations performed on resampled and reconstructed contact networks, for different methods of reconstruction. We plot the distribution of epidemic sizes (fraction of recovered individuals) at the end of SIR processes simulated on top of resampled (sample) and reconstructed contact networks, for different values of the fraction f of nodes removed, and for the five reconstruction methods described in the text (0, W, WS, WT and WST). The parameters of the SIR models are β=0.0004 and β/μ=100. The case f=0 corresponds to simulations using the whole data set, that is, the reference case. For each value of f, 1,000 independent simulations were performed.
Mentions: Figures 2, 3, 4 display the outcome of SIR spreading simulations performed on surrogate networks obtained using the various reconstruction methods, compared with the outcome of simulations on the resampled data sets, for various values of f. Method 0 leads to a clear overestimation of the outcome and does not capture well the shape of the distribution of outbreak sizes. Method W gives only slightly better results. The overall shape of the distribution is better captured for the three reconstruction methods using more information: WS, WT and WST (note that for the SFHH case the population is not structured, so that W and WS are equivalent, as are WT and WST). The WST method matches best the shape of the distributions and yields distributions much more similar to those obtained by simulating on the whole data set than the simulations performed on the resampled networks. We also show in Fig. 5 the fraction of outbreaks that reach at least 20% of the population and the average epidemic size for these outbreaks. In the case of simulations performed on resampled data, we rapidly lose information about the size and even the existence of large outbreaks as f increases. Simulations using data reconstructed with methods 0 and W, on the contrary, largely overestimate these quantities, which is expected as infections spread easier on random graphs than on structured graphs5052, especially if the heterogeneity of the aggregated contact durations is not considered2220. Taking into account the population structure or using contact sequences that respect the temporal heterogeneities (broad distributions of contact and inter-contact durations) yield better results (WS and WT cases, respectively). Overall, the WST method, for which the surrogate networks respect all these constraints, yields the best results.

Bottom Line: As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk.We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts.We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set.

View Article: PubMed Central - PubMed

Affiliation: Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille, France.

ABSTRACT
Data describing human interactions often suffer from incomplete sampling of the underlying population. As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. Here we present a systematic method to alleviate this issue and obtain a better estimation of the risk in the context of epidemic models informed by high-resolution time-resolved contact data. We consider several such data sets collected in various contexts and perform controlled resampling experiments. We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts. We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set. We discuss limitations and potential improvements of our method.

Show MeSH
Related in: MedlinePlus