Estimating range of influence in case of missing spatial data: a simulation study on binary data.

Bihrmann K, Ersbøll AK - Int J Health Geogr (2015)

Bottom Line: The study was based on the simulation of missing outcomes in a complete data set. The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.

Affiliation: Faculty of Medical and Health Sciences, University of Copenhagen, Grønnegårdsvej 8, DK-1870 Frederiksberg C, Denmark. krbi@sund.ku.dk.

ABSTRACT

Background: The range of influence refers to the average distance between locations at which the observed outcome is no longer correlated. In many studies, missing data occur and a popular tool for handling missing data is multiple imputation. The objective of this study was to investigate how the estimated range of influence is affected when 1) the outcome is only observed at some of a given set of locations, and 2) multiple imputation is used to impute the outcome at the non-observed locations.

Methods: The study was based on the simulation of missing outcomes in a complete data set. The range of influence was estimated from a logistic regression model with a spatially structured random effect, modelled by a Gaussian field. Results were evaluated by comparing estimates obtained from complete, missing, and imputed data.
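The simulation of missing outcomes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that missingness is drawn from a logistic model whose slope is the log of the stated odds ratio, that the "driver" is a constant (MCAR), a binary covariate (MAR), or the outcome itself (MNAR), and that the intercept is tuned to reach a target missing fraction. The function name `missing_mask`, the covariate `x`, and all numeric values are hypothetical, and the spatial random effect is left out.

```python
import numpy as np

def missing_mask(driver, odds_ratio, target_frac, rng):
    """Return a boolean mask (True = outcome set to missing).

    Assumed model: logit P(missing) = a + log(odds_ratio) * driver,
    with the intercept a found by bisection so that the mean
    missingness probability equals target_frac.
    """
    driver = np.asarray(driver, dtype=float)
    slope = np.log(odds_ratio)
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        p = 1.0 / (1.0 + np.exp(-(mid + slope * driver)))
        if p.mean() < target_frac:
            lo = mid
        else:
            hi = mid
    intercept = 0.5 * (lo + hi)
    p = 1.0 / (1.0 + np.exp(-(intercept + slope * driver)))
    return rng.random(driver.size) < p

# Hypothetical complete data: binary covariate x and binary outcome y.
rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 2, size=n)
y = rng.binomial(1, 0.2 + 0.2 * x)

mcar = missing_mask(np.zeros(n), 1.0, 0.25, rng)  # unrelated to the data
mar = missing_mask(x, 3.0, 0.25, rng)             # depends on observed covariate
mnar = missing_mask(y, 3.0, 0.25, rng)            # depends on the outcome itself

# Outcome as it would be seen under the MNAR scenario.
y_obs = np.where(mnar, np.nan, y.astype(float))
```

Tuning the intercept keeps the missing fraction comparable across mechanisms, so that differences between scenarios reflect the mechanism rather than the amount of missing data.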

Results: In most simulation scenarios, the range estimates were consistent with ≤25% missing data. In some scenarios, however, the range estimate was affected by even a moderate number of missing observations. Multiple imputation provided a potential improvement in the range estimate with ≥50% missing data, but also increased the uncertainty of the estimate.
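One way to see why imputation can increase the uncertainty of an estimate is Rubin's rules, the standard way of combining results from multiply imputed data sets. The abstract does not state how the imputed-data estimates were combined, so the sketch below is an assumption for illustration only; the function name `pool_rubin` and the numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Combine m imputed-data estimates using Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled = q.mean()
    within = u.mean()
    between = q.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total

# Hypothetical range estimates (km) and variances from three imputed data sets.
pooled, total = pool_rubin([10.0, 12.0, 14.0], [4.0, 4.0, 4.0])
```

The between-imputation term adds to the within-imputation variance, so the pooled variance is never smaller than the average variance of the individual imputed-data estimates.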

Conclusions: The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.

Figure 2: Range of influence in multiple imputed data. Estimated range of influence in complete data (solid line) and after multiple imputation of simulated missing data. Simulation scenarios were: A: MCAR, B: MAR0 OR=1/3, C: MAR0 OR=3, D: MAR1 OR=1/3, E: MAR1 OR=3, F: MNAR OR=1/3, G: MNAR OR=3.

Mentions: Multiple imputation did not remove the bias of the regression parameter estimates introduced by the missing observations (Table 2). This was as expected, since only the outcome was missing. In that case, it is well known that imputation will not remedy any bias of regression parameter estimates, e.g. von Hippel [5].

The median of the estimated range of influence within each simulation scenario (Figure 2) ranged from 10.5 km (SD 5.2) (MAR1 OR=1/3, 75%) to 16.1 km (SD 20.3) (MAR0 OR=3, 75%). In general, the estimated range tended to be larger than the range obtained from the missing data, and hence also larger than the range obtained from the complete data set. Overall, the standard deviation of each range estimate increased after multiple imputation. With multiple imputation of less than 50% missing observations, the RMeSE tended to be slightly larger than the results obtained from the missing data. With multiple imputation of ≥ 50% missing observations, the RMeSE was slightly smaller. Therefore, considering estimation of the range of influence, at least 50% missing observations were required to potentially benefit from multiple imputation, and this was at the expense of an increased standard deviation. It should be noted, however, that the results with imputation of ≥ 50% missing observations were based on the informative prior distribution, which in case of missing data was only used with 75% missing observations.
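The excerpt does not expand the abbreviation RMeSE. Assuming it stands for root median squared error, measured against the range obtained from the complete data set, it could be computed as in the sketch below. The function name `rmese` and the example values are hypothetical and are not results from the study.

```python
import numpy as np

def rmese(estimates, reference):
    """Root median squared error of the estimates around a reference value."""
    errors = np.asarray(estimates, dtype=float) - reference
    return float(np.sqrt(np.median(errors ** 2)))

# Hypothetical range estimates (km) from three simulated data sets,
# compared with a hypothetical complete-data range of 12 km.
estimates = [10.0, 12.0, 30.0]
value = rmese(estimates, 12.0)

# Root mean squared error on the same values, for comparison.
rmse = float(np.sqrt(np.mean((np.asarray(estimates) - 12.0) ** 2)))
```

Because it uses the median rather than the mean, this measure is less sensitive than the root mean squared error to a few very large range estimates, which is relevant when standard deviations are as large as those reported above.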

