Estimating range of influence in case of missing spatial data: a simulation study on binary data.

Bihrmann K, Ersbøll AK - Int J Health Geogr (2015)

Bottom Line: The study was based on the simulation of missing outcomes in a complete data set. The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.

Affiliation: Faculty of Medical and Health Sciences, University of Copenhagen, Grønnegårdsvej 8, DK-1870 Frederiksberg C, Denmark. krbi@sund.ku.dk.

ABSTRACT

Background: The range of influence refers to the average distance between locations at which the observed outcome is no longer correlated. Missing data occur in many studies, and multiple imputation is a popular tool for handling them. The objective of this study was to investigate how the estimated range of influence is affected when 1) the outcome is observed at only some of a given set of locations, and 2) multiple imputation is used to impute the outcome at the non-observed locations.
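
The abstract defines the range of influence only in words. As a sketch, assume a Matérn correlation with smoothness ν = 1 (what the SPDE approach in INLA uses by default in two dimensions) and the empirical range √(8ν)/κ commonly used with that approach (Lindgren et al., 2011). At that distance the correlation has dropped to about 0.14, so outcomes farther apart are treated as practically uncorrelated. The value of κ is hypothetical. K_1 is computed by numerical integration so that only the standard library is needed.

```python
import math


def bessel_k1(x: float, t_max: float = 20.0, steps: int = 20_000) -> float:
    """Modified Bessel function of the second kind, order 1, for x > 0.

    Uses K_1(x) = integral_0^inf exp(-x*cosh t) * cosh t dt, truncated at t_max
    and evaluated with the trapezoidal rule (the integrand decays very fast).
    """
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(t)
    return total * h


def matern_corr_nu1(distance: float, kappa: float) -> float:
    """Matérn correlation with smoothness nu = 1: (kappa*d) * K_1(kappa*d)."""
    if distance == 0.0:
        return 1.0
    u = kappa * distance
    return u * bessel_k1(u)


def empirical_range(kappa: float, nu: float = 1.0) -> float:
    """Distance sqrt(8*nu)/kappa; for nu = 1 the correlation there is about 0.14."""
    return math.sqrt(8.0 * nu) / kappa


kappa = 0.5  # hypothetical value (per km)
r = empirical_range(kappa)
print(f"range of influence ~ {r:.2f} km")
for d in (1.0, 3.0, r, 10.0):
    print(f"correlation at {d:5.2f} km: {matern_corr_nu1(d, kappa):.3f}")
```

With the hypothetical κ = 0.5 per km, the range is √8/0.5 ≈ 5.66 km. A larger κ means a shorter range of influence.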

Methods: The study was based on the simulation of missing outcomes in a complete data set. The range of influence was estimated from a logistic regression model with a spatially structured random effect, modelled by a Gaussian field. Results were evaluated by comparing estimates obtained from complete, missing, and imputed data.
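
To make the model concrete, here is a minimal simulation sketch of binary outcomes from a logistic regression with a spatially structured random effect: logit(p_i) = α + β·x_i + w(s_i), where w is a zero-mean Gaussian field. The study started from a complete data set and simulated only the missing outcomes. Here the complete data are simulated as well. Everything below is an illustrative assumption: the covariate x, the parameter values, the uniform random locations, and especially the exponential covariance σ²·exp(−d/φ), which stands in for the Matérn/SPDE field used in the study. The helper names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular matrix L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def simulate_complete_data(n, alpha, beta, sigma2, phi, side=10.0, seed=1):
    """Simulate y_i ~ Bernoulli(p_i) with logit(p_i) = alpha + beta * x_i + w(s_i).

    Assumptions (illustration only): locations s_i uniform on a side x side square,
    covariate x_i ~ N(0, 1), and w a zero-mean Gaussian field with exponential
    covariance sigma2 * exp(-distance / phi) instead of the SPDE/Matérn field.
    """
    rng = random.Random(seed)
    locs = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    cov = [[sigma2 * math.exp(-math.dist(p, q) / phi) for q in locs] for p in locs]
    for i in range(n):
        cov[i][i] += 1e-9  # small jitter for numerical stability
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # w = L z
    y = []
    for xi, wi in zip(x, w):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi + wi)))  # inverse logit
        y.append(1 if rng.random() < p else 0)
    return locs, x, w, y


locs, x, w, y = simulate_complete_data(n=150, alpha=-0.5, beta=1.0, sigma2=1.0, phi=2.0)
print(f"{len(y)} locations, observed prevalence {sum(y) / len(y):.2f}")
```

The dense Cholesky factorisation scales cubically with the number of locations. Avoiding that cost for the 1593 locations of the study is one motivation for the sparse GMRF representation used by the SPDE approach.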

Results: In most simulation scenarios, the range estimates were consistent with ≤25% missing data. In some scenarios, however, the range estimate was affected by even a moderate number of missing observations. Multiple imputation provided a potential improvement in the range estimate with ≥50% missing data, but also increased the uncertainty of the estimate.
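
The abstract does not say how the estimates from the imputed data sets were combined. Assume they were pooled with Rubin's rules, the standard approach for multiple imputation. The total variance is then the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. Variation between imputed data sets therefore adds directly to the uncertainty of the pooled estimate, which is one way imputation can increase uncertainty. The numbers in the example are made up.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their variances (squared standard errors) with Rubin's rules.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)              # pooled point estimate
    w_bar = mean(variances)              # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (divisor m - 1)
    total = w_bar + (1.0 + 1.0 / m) * b  # total variance
    return q_bar, total, math.sqrt(total)


# Hypothetical range estimates (km) and squared standard errors from m = 3 imputed data sets.
est, tot, se = pool_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
print(est, round(tot, 3), round(se, 3))
```

For these numbers, the pooled estimate is 12. The total variance is 4 + (4/3)·4 ≈ 9.33, more than twice the average within-imputation variance. A range is positive and often right-skewed, so pooling log(range) and back-transforming is a common variant.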

Conclusions: The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.
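
The abstract does not list which missing data mechanisms were compared. The sketch below simulates missing outcomes in a complete data set under the three standard mechanisms: missing completely at random (MCAR), at random (MAR), and not at random (MNAR). The specific rules are hypothetical. Under MAR, missingness increases with the first coordinate of the location. Under MNAR, cases get three times the sampling weight of non-cases.

```python
import random


def delete_outcomes(y, locs, fraction, mechanism, seed=0):
    """Return a copy of y in which round(fraction * n) outcomes are set to None.

    mechanism (the rules are illustrative assumptions):
      "MCAR": every location has the same sampling weight
      "MAR":  weight depends on the observed location, 1 + first coordinate
              (coordinates assumed >= 0)
      "MNAR": weight depends on the outcome itself; cases (y = 1) get weight 3,
              non-cases weight 1
    """
    n = len(y)
    k = round(fraction * n)
    if mechanism == "MCAR":
        weights = [1.0] * n
    elif mechanism == "MAR":
        weights = [1.0 + loc[0] for loc in locs]
    elif mechanism == "MNAR":
        weights = [3.0 if yi == 1 else 1.0 for yi in y]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    rng = random.Random(seed)
    # Weighted sampling without replacement (Efraimidis-Spirakis): keep the k largest
    # keys u ** (1 / w), where u ~ Uniform(0, 1).
    keys = [rng.random() ** (1.0 / wt) for wt in weights]
    missing = set(sorted(range(n), key=lambda i: keys[i], reverse=True)[:k])
    return [None if i in missing else yi for i, yi in enumerate(y)]


y = [0, 1] * 50
locs = [(i / 10, 0.0) for i in range(100)]
for mech in ("MCAR", "MAR", "MNAR"):
    y_obs = delete_outcomes(y, locs, 0.25, mech)
    print(mech, sum(v is None for v in y_obs))
```

Each call removes exactly 25 of the 100 outcomes. Only which outcomes are removed differs between mechanisms, and that choice is what can bias the range estimate.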

Figure 4: Triangulation of the spatial region. The mesh extends beyond the border of the considered region to correct for edge effects. The maximum allowed triangle edge length was 2 km inside the region and 50 km outside the region. The minimum allowed distance between vertices was 0.75 km. The triangulation consisted of a total of 2248 vertices.

Based on a triangulation of the spatial region and the model specified in (7), parameters were estimated using the Integrated Nested Laplace Approximation (INLA) approach proposed by [2]. This approach to Bayesian inference provides deterministic approximations to the posterior marginals of all parameters and is based on Laplace approximations [7]. Computations were done in R version 3.0.2 [8] using the INLA package (http://www.r-inla.org), which includes the SPDE approach as a standard method.

The regression parameters α and β were assigned independent normal prior distributions with precision 0.001, and the spatial random effect was assigned the GMRF with precision matrix Q(κ, σ²), as described above. The variance σ² was parametrised as σ² = 1/(2πκ²τ²), and the hyperparameters (log(κ), log(τ)) were assigned normal prior distributions with known precision. To assess sensitivity to the prior distribution, three values of this precision were considered: 0.1 (the default of the INLA package), 0.001, and 0.00001.

The INLA package also provides a function for producing the required triangulation of the spatial region, which is shown in Figure 4. All 1593 locations were included as vertices, and additional vertices were added to produce a regular mesh. The mesh extends beyond the border of the considered region to correct for edge effects. The maximum allowed triangle edge length was 2 km inside the region and 50 km outside it, and the minimum allowed distance between vertices was 0.75 km. The triangulation consisted of 2248 vertices in total.
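
The prior settings above are stated as precisions. This sketch converts each precision to a prior standard deviation. It also evaluates the field variance from (log κ, log τ) using the parametrisation as written in the text. The constant 2π is copied from the text. For ν = 1 in two dimensions, the SPDE literature (Lindgren et al., 2011) gives σ² = 1/(4πκ²τ²), so the constant is exposed as a parameter and should be checked against the original article before reuse. The values of log κ and log τ in the example are hypothetical.

```python
import math


def precision_to_sd(precision: float) -> float:
    """Standard deviation of a normal prior given its precision (1 / variance)."""
    return 1.0 / math.sqrt(precision)


def field_variance(log_kappa: float, log_tau: float, const: float = 2.0 * math.pi) -> float:
    """sigma^2 = 1 / (const * kappa^2 * tau^2); const = 2*pi follows the text as written."""
    kappa = math.exp(log_kappa)
    tau = math.exp(log_tau)
    return 1.0 / (const * kappa**2 * tau**2)


# Prior precisions considered in the sensitivity analysis (0.1 is the INLA default per the text).
for prec in (0.1, 0.001, 0.00001):
    print(f"precision {prec:g}: prior sd {precision_to_sd(prec):.1f}")

# Hypothetical hyperparameter values, for illustration only.
print(f"sigma^2 = {field_variance(log_kappa=-0.7, log_tau=0.2):.3f}")
```

The three precisions correspond to prior standard deviations of about 3.2, 31.6 and 316.2. The sensitivity analysis therefore spans two orders of magnitude in prior spread on the log scale of κ and τ.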

