Estimating the accuracy of geographical imputation.

Henry KA, Boscoe FP - Int J Health Geogr (2008)

Bottom Line: To reduce the number of non-geocoded cases researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids.

Affiliation: New Jersey Department of Health & Senior Services, Cancer Epidemiology Services, New Jersey State Cancer Registry, Trenton, New Jersey, USA. kevin.henry@doh.state.nj.us

ABSTRACT

Background: To reduce the number of non-geocoded cases researchers and organizations sometimes include cases geocoded to postal code centroids along with cases geocoded with the greater precision of a full street address. Some analysts then use the postal code to assign information to the cases from finer-level geographies such as a census tract. Assignment is commonly completed using either a postal centroid or by a geographical imputation method which assigns a location by using both the demographic characteristics of the case and the population characteristics of the postal delivery area. To date no systematic evaluation of geographical imputation methods ("geo-imputation") has been completed. The objective of this study was to determine the accuracy of census tract assignment using geo-imputation.

Methods: Using a large dataset of breast, prostate and colorectal cancer cases reported to the New Jersey Cancer Registry, we determined how often cases were assigned to the correct census tract using alternate strategies of demographic-based geo-imputation, and using assignments obtained from postal code centroids. Assignment accuracy was measured by comparing the tract assigned with the tract originally identified from the full street address.
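
The comparison described in the Methods can be illustrated with a minimal Python sketch. It assumes that geo-imputation is carried out as a random draw of a tract, weighted by the share of the case's race/ethnicity group living in each tract of the postal code; the abstract does not spell out the exact procedure, so this is one plausible reading rather than the authors' implementation. The function names `impute_tract` and `match_rate`, the tract 1812.00, and the share values are hypothetical.

```python
import random


def impute_tract(shares, rng):
    """Draw one tract at random, weighted by population share.

    `shares` maps tract id -> share of the case's demographic group
    (within the case's postal code) living in that tract.
    Assumption: a weighted random draw stands in for geo-imputation.
    """
    tracts = list(shares)
    weights = [shares[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]


def match_rate(assigned, truth):
    """Fraction of cases whose assigned tract equals the street-address tract."""
    if len(assigned) != len(truth):
        raise ValueError("assigned and truth must have the same length")
    if not truth:
        raise ValueError("no cases to compare")
    hits = sum(a == t for a, t in zip(assigned, truth))
    return hits / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical shares for one demographic group in one postal code.
    shares = {"1811.00": 0.7, "1812.00": 0.3}
    truth = ["1811.00", "1811.00", "1812.00", "1811.00"]
    imputed = [impute_tract(shares, rng) for _ in truth]
    # Centroid method: every case gets the tract containing the centroid
    # (hypothetically tract 1812.00 here).
    centroid = ["1812.00"] * len(truth)
    print("geo-imputation match rate:", match_rate(imputed, truth))
    print("centroid match rate:", match_rate(centroid, truth))
```

Because the draw is random, match rates from a sketch like this vary from run to run unless the seed is fixed; averaging over repeated draws would give a more stable estimate.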

Results: Assigning cases to census tracts using the race/ethnicity population distribution within a postal code resulted in more correctly assigned cases than when using postal code centroids. The addition of age characteristics increased the match rates even further. Match rates were highly dependent on both the geographic distribution of race/ethnicity groups and population density.

Conclusion: Geo-imputation appears to offer some advantages and no serious drawbacks as compared with the alternative of assigning cases to census tracts based on postal code centroids. For a specific analysis, researchers will still need to consider the potential impact of geocoding quality on their results and evaluate the possibility that it might introduce geographical bias.

Figure 1: Census block centroid populations are used to calculate the proportion of census tract populations which fall within the boundaries of ZIP codes. For example, the portion of census tract 1811.00 within ZIP code 07524 receives only 3,101 individuals of the total census tract population of 6,774.

Mentions: Because some census tracts overlap postal ZIP code boundaries, we used census block centroid locations to recalculate the portion of the tract population falling within each ZIP code [36]. We used a point-in-polygon operation in ArcGIS 9.1 software to assign each census block a ZIP code and then aggregated the data based on unique county, census tract, and ZIP code identifiers (Figure 1) [32]. Figure 1 provides an example of how populations were assigned to census tracts using census block centroids. This figure illustrates how, within ZIP code 07524, tract 1811.00 receives only a portion of the total census tract population based on the census blocks falling within its boundary. Using data from the 2000 U.S. Census Short form (SF1), for each ZIP code we calculated the percent of the population in each tract by race/ethnicity alone and by race/ethnicity for the four defined age groups. The following race/ethnicity and age census populations were included: total population (P001001), non-Hispanic White (P004005, PCT12I), non-Hispanic Asian (P004008, PCT12L), non-Hispanic Black (P004006, PCT12J), and Hispanic (P004002, PCT12H) [33].
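
The aggregation step can be sketched in a few lines of Python. The sketch assumes each census block has already been given a ZIP code by the point-in-polygon step, and it only sums block populations by county, tract, and ZIP code before converting them to within-ZIP shares. The block-level records below are invented for illustration: only the totals for tract 1811.00 (3,101 of 6,774 inside ZIP code 07524) come from Figure 1, while the county code, ZIP code 07522, tract 1812.00, and the individual block counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical block records: (county, tract, zip_code, population).
# The ZIP code is assumed to come from a prior point-in-polygon step
# on the block centroids; block-level counts are made up.
BLOCKS = [
    ("031", "1811.00", "07524", 2000),
    ("031", "1811.00", "07524", 1101),
    ("031", "1811.00", "07522", 3673),
    ("031", "1812.00", "07524", 4000),
]


def aggregate_blocks(blocks):
    """Sum block populations by (county, tract, ZIP code)."""
    totals = defaultdict(int)
    for county, tract, zip_code, population in blocks:
        totals[(county, tract, zip_code)] += population
    return dict(totals)


def shares_within_zip(totals):
    """Convert (county, tract, ZIP) totals to shares of each ZIP's population."""
    zip_totals = defaultdict(int)
    for (_, _, zip_code), population in totals.items():
        zip_totals[zip_code] += population
    return {
        key: population / zip_totals[key[2]]
        for key, population in totals.items()
        if zip_totals[key[2]] > 0  # skip ZIP codes with no residents
    }


if __name__ == "__main__":
    totals = aggregate_blocks(BLOCKS)
    for key, share in sorted(shares_within_zip(totals).items()):
        print(key, totals[key], round(share, 3))
```

Running the same aggregation once per race/ethnicity (or race/ethnicity by age group) count, instead of the total population, would yield the group-specific percentages described above.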

