Limits...
Compensating for population sampling in simulations of epidemic spread on temporal contact networks.

Génois M, Vestergaard CL, Cattuto C, Barrat A - Nat Commun (2015)

Bottom Line: When the underlying population is incompletely sampled, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts. We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set.


Affiliation: Aix Marseille Université, Université de Toulon, CNRS, CPT, UMR 7332, 13288 Marseille, France.

ABSTRACT
Data describing human interactions often suffer from incomplete sampling of the underlying population. As a consequence, the study of contagion processes using data-driven models can lead to a severe underestimation of the epidemic risk. Here we present a systematic method to alleviate this issue and obtain a better estimation of the risk in the context of epidemic models informed by high-resolution time-resolved contact data. We consider several such data sets collected in various contexts and perform controlled resampling experiments. We show how the statistical information contained in the resampled data can be used to build a series of surrogate versions of the unknown contacts. We simulate epidemic processes on the resulting reconstructed data sets and show that it is possible to obtain good estimates of the outcome of simulations performed using the complete data set. We discuss limitations and potential improvements of our method.



Figure 8: SIR simulations for large fractions of missing nodes. We simulate SIR processes on reconstructed contact networks for large values of the fraction f of removed nodes, and plot the distributions of epidemic sizes for these simulations and for simulations on the whole data set (case f=0). Here β=0.0004, with β/μ=1,000 (InVS) or β/μ=100 (Thiers13 and SFHH), and 1,000 simulations were performed for each value of f. The distributions of epidemic sizes for simulations performed on resampled data sets are not shown because, at these high values of f, almost no epidemics occur.
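As a rough illustration of the kind of simulation summarized in this caption, the following Python sketch runs a discrete-time SIR process over a list of timestamped contacts. It is not the authors' code: the function name `sir_on_contacts`, the `(t, i, j)` contact format, the reading of β and μ as per-time-step probabilities, and the choice to stop at the end of the data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sir_on_contacts(contacts, nodes, beta, mu, seed_node, rng):
    """Discrete-time SIR on a temporal contact list (illustrative sketch).

    contacts: iterable of (t, i, j) tuples with integer time steps t.
    beta: assumed per-step infection probability along a contact.
    mu: assumed per-step recovery probability of an infected node.
    Returns the number of nodes that were ever infected.
    """
    by_time = defaultdict(list)
    for t, i, j in contacts:
        by_time[t].append((i, j))

    state = {n: "S" for n in nodes}
    state[seed_node] = "I"
    if not by_time:
        return 1  # no contacts: only the seed is infected

    for t in range(min(by_time), max(by_time) + 1):
        infected_now = [n for n in nodes if state[n] == "I"]
        if not infected_now:
            break  # the epidemic has died out
        newly_infected = set()
        # Contacts are undirected, so test transmission in both directions.
        for i, j in by_time.get(t, []):
            for a, b in ((i, j), (j, i)):
                if state[a] == "I" and state[b] == "S" and rng.random() < beta:
                    newly_infected.add(b)
        # Nodes infected before this step may recover during it.
        for n in infected_now:
            if rng.random() < mu:
                state[n] = "R"
        for n in newly_infected:
            state[n] = "I"

    return sum(1 for n in nodes if state[n] != "S")
```

Under these assumptions, the caption's parameters would correspond to calling the function with `beta=0.0004` and `mu=beta/100` (Thiers13 and SFHH) or `mu=beta/1000` (InVS), repeated 1,000 times with different seed nodes to build a distribution of epidemic sizes.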

Mentions: When the fraction f of nodes excluded by the resampling procedure becomes large, the properties of the resampled data may start to differ substantially from those of the whole data set (Supplementary Figs 1–2). As a result, the distributions of epidemic sizes of SIR simulations show stronger deviations from those obtained on the whole data set (Fig. 8), although the epidemic risk estimate remains better than that obtained from simulations on the resampled networks (Fig. 5). Most importantly, the information remaining in the resampled data at large f can be insufficient to construct surrogate contacts. This happens in particular if an entire class or department is absent from the resampled data or if all the resampled nodes of a class/department are disconnected (see Methods for details). The bottom plots of Fig. 5 show the failure rate, that is, the fraction of cases in which we are unable to construct surrogate networks from the resampled data. It increases gradually with f for the InVS data because the groups (departments) are of different sizes. For the Thiers13 data, all classes are of similar size, so the failure rate abruptly reaches a large value at a given value of f. For the SFHH data, we can always construct surrogate networks because the population is not structured.

Another limitation of the reconstruction method lies in the need to know the number of individuals missing in each department or class. If these numbers are completely unknown, estimating outbreak sizes is impossible, because adding arbitrary numbers of nodes and links to the resampled data can lead to arbitrarily large epidemics. The methods are, however, still usable if only partial information is available. For instance, if only the overall number of missing individuals is available, it is possible to use the WT method, which still gives sensible results.
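The failure rate discussed above can be estimated along the following lines. This is a minimal Python sketch, not the authors' procedure: the names `can_reconstruct` and `failure_rate` are hypothetical, nodes are removed uniformly at random, and "disconnected" is interpreted here as "none of the group's remaining nodes has a contact with another remaining node", which is an assumption (the Methods give the actual criteria).

```python
import random


def can_reconstruct(group_of, kept_nodes, contacts):
    """Return False if some group is absent from the resampled data, or if
    none of its kept nodes has a contact with another kept node
    (assumed reading of "disconnected")."""
    kept = set(kept_nodes)
    active = set()  # kept nodes with at least one contact among kept nodes
    for _, i, j in contacts:
        if i in kept and j in kept:
            active.add(i)
            active.add(j)
    for group in set(group_of.values()):
        members = [n for n in kept if group_of[n] == group]
        if not members or not any(n in active for n in members):
            return False
    return True


def failure_rate(group_of, contacts, f, n_trials, rng):
    """Fraction of random resamplings (removing a fraction f of the nodes)
    for which the checks in can_reconstruct fail."""
    nodes = sorted(group_of)
    n_keep = round((1 - f) * len(nodes))
    failures = 0
    for _ in range(n_trials):
        kept = rng.sample(nodes, n_keep)
        if not can_reconstruct(group_of, kept, contacts):
            failures += 1
    return failures / n_trials
```

Here `group_of` maps each node to its class or department and `contacts` is a list of `(t, i, j)` tuples. For an unstructured population such as SFHH, every node would belong to a single group in this representation.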
Moreover, if f is only approximately known, for example, if f is known to lie within an interval of possible values (f1, f2), it is possible to perform reconstructions under the respective hypotheses f=f1 and f=f2 and to give an interval of estimates. We provide an example of such a procedure in Supplementary Fig. 23.
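The interval procedure can be sketched as follows, assuming that f denotes the fraction of the full population that is missing, so that n observed nodes correspond to n·f/(1−f) missing ones. The callback `estimate_for_missing` is a hypothetical placeholder for "reconstruct the data with this many added nodes, simulate, and return an estimate"; it is not part of the source.

```python
def missing_nodes(n_observed, f):
    """Number of missing nodes implied by a missing fraction f, given
    n_observed = (1 - f) * N for a full population of size N."""
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    return round(n_observed * f / (1 - f))


def estimate_interval(n_observed, f1, f2, estimate_for_missing):
    """Run the (hypothetical) estimator under the hypotheses f=f1 and f=f2
    and return the two results as an ordered interval."""
    a = estimate_for_missing(missing_nodes(n_observed, f1))
    b = estimate_for_missing(missing_nodes(n_observed, f2))
    return (min(a, b), max(a, b))
```

For example, with 80 observed individuals, f=0.2 implies 20 missing individuals and f=0.5 implies 80, so the two reconstructions would add 20 and 80 nodes respectively.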


