Estimating range of influence in case of missing spatial data: a simulation study on binary data.

Bihrmann K, Ersbøll AK - Int J Health Geogr (2015)

Bottom Line: The study was based on the simulation of missing outcomes in a complete data set. The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.


Affiliation: Faculty of Medical and Health Sciences, University of Copenhagen, Grønnegårdsvej 8, DK-1870 Frederiksberg C, Denmark. krbi@sund.ku.dk.

ABSTRACT

Background: The range of influence refers to the average distance between locations at which the observed outcome is no longer correlated. In many studies, missing data occur and a popular tool for handling missing data is multiple imputation. The objective of this study was to investigate how the estimated range of influence is affected when 1) the outcome is only observed at some of a given set of locations, and 2) multiple imputation is used to impute the outcome at the non-observed locations.

Methods: The study was based on the simulation of missing outcomes in a complete data set. The range of influence was estimated from a logistic regression model with a spatially structured random effect, modelled by a Gaussian field. Results were evaluated by comparing estimates obtained from complete, missing, and imputed data.

Results: In most simulation scenarios, the range estimates were consistent with ≤25% missing data. In some scenarios, however, the range estimate was affected by even a moderate number of missing observations. Multiple imputation provided a potential improvement in the range estimate with ≥50% missing data, but also increased the uncertainty of the estimate.

Conclusions: The effect of missing observations on the estimated range of influence depended to some extent on the missing data mechanism. In general, the overall effect of missing observations was small compared to the uncertainty of the range estimate.
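
The simulation design described in the Methods (deleting outcomes from a complete data set and comparing estimates) can be illustrated with a minimal sketch. The abstract does not state which missing data mechanisms were simulated, so the sketch assumes the simplest case, missing completely at random; the function name `simulate_missing` and its parameters are hypothetical and not taken from the study.

```python
import random

def simulate_missing(outcomes, fraction, seed=None):
    """Return a copy of `outcomes` with a given fraction set to None.

    Assumption: outcomes are deleted completely at random (MCAR).
    The study's actual missing data mechanisms may differ.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    n_missing = round(fraction * n)
    # Choose which locations lose their outcome, without replacement.
    missing_idx = set(rng.sample(range(n), n_missing))
    return [None if i in missing_idx else y for i, y in enumerate(outcomes)]

# Hypothetical binary herd statuses (1 = positive, 0 = negative).
complete = [0, 1] * 50
with_gaps = simulate_missing(complete, fraction=0.25, seed=1)
```

A model would then be fitted to `complete`, to `with_gaps`, and to imputed versions of `with_gaps`, and the resulting range estimates compared.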


Figure 3: Descriptive maps. Denmark divided into 8 geographic regions (a), including NJS (southern part of Northern Jutland) with Salmonella Dublin status of all cattle herds (b), total number of cattle within herds (c), and number of herds within a 5 km radius (d).

Mentions: The complete data set included all Danish cattle herds from the beginning of 2003 to the end of 2009. For all herds, information from the Danish Cattle Database (hosted by Knowledge Centre for Agriculture, Aarhus N, Denmark) included unique herd ID number, geographical coordinates in UTM format, geographical region (Figure 3a), herd size (total number of cattle), Salmonella Dublin ELISA measurements on bulk-tank milk or blood samples, and date of bulk-tank milk or blood sampling. Based on this, the number of herds per km² within a 5 km radius of each herd was calculated (herd density), and all herds had a Salmonella Dublin classification status (positive/negative) assigned for each quarter of the year. For details on the definition of herd infection status, please refer to [6].
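
The herd density calculation described above can be sketched as follows. This is an illustration only: the function name `herd_density` is hypothetical, coordinates are assumed to be UTM eastings and northings in metres, and the focal herd is assumed to be excluded from its own count, which the text does not specify.

```python
import math

def herd_density(coords, radius_m=5000.0, include_self=False):
    """Herds per km^2 within `radius_m` of each herd.

    coords: list of (easting, northing) pairs in metres (assumed UTM).
    include_self: whether the focal herd counts toward its own density
                  (an assumption; the source does not say).
    """
    area_km2 = math.pi * (radius_m / 1000.0) ** 2
    densities = []
    for i, (xi, yi) in enumerate(coords):
        count = 0
        for j, (xj, yj) in enumerate(coords):
            if i == j and not include_self:
                continue
            # Euclidean distance is adequate for projected UTM coordinates.
            if math.hypot(xi - xj, yi - yj) <= radius_m:
                count += 1
        densities.append(count / area_km2)
    return densities

# Three made-up herd locations, in metres.
example = [(0.0, 0.0), (3000.0, 4000.0), (10000.0, 0.0)]
densities = herd_density(example)
```

The pairwise loop is quadratic in the number of herds; for a national register a spatial index would be the more practical choice, but the loop keeps the definition explicit.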

