A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996-2003.

Wheeler DC - Int J Health Geogr (2007)

Bottom Line: Numerous studies in the literature have focused on childhood leukemia because of its relatively high incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. We found some evidence, although inconclusive, of significant local clusters of childhood leukemia in Ohio, but no significant overall clustering. The findings are consistent across the different tests of global clustering: none of the techniques demonstrates significant clustering when cases of all ages are considered together.

Affiliation: Department of Biostatistics, Emory University, Atlanta, GA, USA. dcwheel@sph.emory.edu

ABSTRACT

Background: Spatial cluster detection is an important tool in cancer surveillance for identifying areas of elevated risk and generating hypotheses about cancer etiology. Many cluster detection methods are used in spatial epidemiology to investigate suspicious groupings of cancer occurrences in regional count data and in case-control data, where controls are sampled from the at-risk population. Numerous studies in the literature have focused on childhood leukemia because of its relatively high incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. The main focus of this paper is an analysis of the spatial distribution of leukemia incidence among children aged 0 to 14 years in Ohio from 1996–2003, using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS). Specifically, we explore whether there is statistically significant global clustering and whether there are statistically significant local clusters of individual leukemia cases in Ohio, using several published methods of spatial cluster detection, including spatial point process summary methods, a nearest neighbor method, and a local rate scanning method. We use the K function, Cuzick and Edwards' method, and the kernel intensity function to test for significant global clustering, and the kernel intensity function and Kulldorff's spatial scan statistic in SaTScan to test for significant local clusters.
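
The K function test for global clustering named above can be illustrated with a minimal sketch. It assumes the common case-control approach of comparing the K functions of cases and controls, D(h) = K_cases(h) − K_controls(h), and judging significance by randomly relabelling cases and controls; the excerpt does not state the paper's exact estimator, edge correction, or distance range. The function names, the study-area value `area`, the toy points, and the number of simulations are hypothetical, and no edge correction is applied.

```python
import math
import random

def k_hat(points, h, area):
    """Naive estimate of Ripley's K at distance h, with no edge correction.

    Counts ordered pairs (i, j), i != j, no farther apart than h and scales
    the count by area / (n * (n - 1)).
    """
    n = len(points)
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

def k_difference_test(cases, controls, h, area, n_sim=99, seed=1):
    """D(h) = K_cases(h) - K_controls(h) with a random-labelling Monte Carlo p-value.

    Under random labelling the cases are a random subset of the pooled points,
    so a large positive D(h) suggests cases cluster beyond the population pattern.
    """
    rng = random.Random(seed)
    observed = k_hat(cases, h, area) - k_hat(controls, h, area)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    at_least_as_large = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)  # relabel: the first n_cases points play the role of cases
        simulated = k_hat(pooled[:n_cases], h, area) - k_hat(pooled[n_cases:], h, area)
        if simulated >= observed:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_sim + 1)
    return observed, p_value

# Hypothetical toy data (coordinates in arbitrary units).
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
controls = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0), (3.0, 8.0)]
d_obs, p = k_difference_test(cases, controls, h=1.0, area=100.0)
```

In practice D(h) would be examined over a range of distances h rather than the single distance used in this toy example.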

Results: We found some evidence, although inconclusive, of significant local clusters of childhood leukemia in Ohio, but no significant overall clustering. The findings from the local cluster detection analyses are inconsistent across techniques: the spatial scan method in SaTScan does not find statistically significant local clusters, whereas the kernel intensity function method suggests statistically significant clusters in areas of central, southern, and eastern Ohio. The findings are consistent across the different tests of global clustering: none of the techniques demonstrates significant clustering when cases of all ages are considered together.
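
The kernel intensity function approach mentioned above can be illustrated by comparing smoothed case and control intensities. The sketch below is a minimal version, assuming a Gaussian kernel with a fixed bandwidth and no edge correction; it estimates the log ratio of case density to control density at a location. Under a constant-risk (no clustering) hypothesis this log relative risk is near zero everywhere, and areas where it is markedly positive are candidate clusters. The excerpt does not state the paper's kernel, bandwidth, or significance procedure, so the function names, bandwidth, and toy points are hypothetical.

```python
import math

def kernel_intensity(points, x, y, bandwidth):
    """Isotropic Gaussian kernel estimate of point intensity at (x, y).

    No edge correction is applied.
    """
    total = 0.0
    for px, py in points:
        sq_dist = (px - x) ** 2 + (py - y) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (2.0 * math.pi * bandwidth ** 2)

def log_relative_risk(cases, controls, x, y, bandwidth):
    """log(case density / control density) at (x, y).

    Dividing each intensity by its point count turns it into a density, so the
    overall control-case ratio cancels out. Assumes the control density at
    (x, y) is not numerically zero.
    """
    case_density = kernel_intensity(cases, x, y, bandwidth) / len(cases)
    control_density = kernel_intensity(controls, x, y, bandwidth) / len(controls)
    return math.log(case_density / control_density)

# Hypothetical toy data: both cases sit near the origin, controls are spread out.
cases = [(0.0, 0.0), (0.1, 0.0)]
controls = [(0.0, 0.0), (5.0, 5.0)]
print(log_relative_risk(cases, controls, 0.0, 0.0, bandwidth=1.0))  # positive: excess cases near the origin
```

Evaluating this function over a grid of locations would give a log relative risk surface; deciding which peaks are statistically significant requires an additional step, such as Monte Carlo relabelling, that this sketch omits.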

Conclusion: This comparative study of childhood leukemia clustering and clusters in Ohio revealed several research issues in practical spatial cluster detection. Among them, the flexibility of methods in detecting clusters of different shapes deserves consideration.

Figure 1: Childhood leukemia cases (unfilled circles) and controls (filled circles) for years 1996–2003.

In the subsequent analysis, we use 738 individual OCISS cases diagnosed between 1996 and 2003, geocoded to the street level using geographic information system (GIS) software from ESRI [36]. The use of the cancer data in this study was approved by the Ohio Department of Health Institutional Review Board. The childhood (ages 0–14) leukemia rate for Ohio in 1996–2003 was 4.2 per 100,000 persons, compared with the SEER rate of 4.8 per 100,000 persons [37]. The completeness of OCISS incidence data varies by year; for example, completeness was 85% in 1996, 92% in 1998, and 95% in 1999 [38]. We excluded from the analysis 86 cases that could not be address-matched to the street level and were matched only to ZIP Code centroids, to avoid inducing spurious clustering at those centroids. A map of these excluded cases showed an essentially random pattern across Ohio, not confined to either urban or rural areas, and this lack of pattern or concentration helped to justify removing them from the study. As stated earlier, this paper focuses on a spatial case-control study, which requires controls sampled from the population at risk: children born during the same period as the reported cases who did not develop leukemia. We drew controls from Ohio Vital Statistics (OVS) birth records for the years with digital files available, 1989–2003, a period that covers most of the possible birth years of the cases (1982–2003). Specifically, we began with 21,906 randomly sampled OVS birth records that were geocoded to the street level, ordered them by longitude and latitude, and systematically sampled every third record, yielding 7,302 controls.
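
The systematic sampling step described above can be sketched in a few lines. This is a minimal illustration, assuming records are (longitude, latitude) pairs sorted lexicographically (longitude first, latitude breaking ties) and that selection starts at the first record; the source does not specify the tie-breaking rule or the starting offset. The birth records below are hypothetical random points in an approximate bounding box around Ohio, not the OVS data. The arithmetic matches the text: 21,906 / 3 = 7,302 controls, about 9.9 controls per case for 738 cases.

```python
import random

def systematic_sample(records, step=3, start=0):
    """Order (longitude, latitude) records and keep every `step`-th one.

    Sorting is lexicographic: by longitude first, with latitude breaking ties.
    `start` is a hypothetical offset; the source does not say which record
    the selection began with.
    """
    ordered = sorted(records, key=lambda rec: (rec[0], rec[1]))
    return ordered[start::step]

# Hypothetical stand-in for the 21,906 geocoded birth records:
# uniformly random points in an approximate bounding box around Ohio.
rng = random.Random(42)
births = [(rng.uniform(-84.8, -80.5), rng.uniform(38.4, 42.0)) for _ in range(21906)]

controls = systematic_sample(births, step=3)
n_cases = 738
print(len(controls))                       # 7302, since 21,906 / 3 = 7,302
print(round(len(controls) / n_cases, 2))   # 9.89, i.e. roughly 10 to 1
```

Because neighbouring records in the sorted list tend to be geographically close, taking every third one spreads the sample across the state rather than leaving it to chance, which is the stated motivation for the scheme.
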
Any rural bias in the failure to geocode addresses would presumably affect both cases and controls, so its impact on the analysis presented here is likely slight. The systematic sampling scheme was used to obtain a geographically representative sample of the at-risk population and resulted in a control-case ratio of approximately 10 to 1. Visual comparison of the controls with the larger set of birth records suggested that the controls were a spatially representative sample. The control-case ratio was a compromise between using as many controls as possible and the computational demands of certain methods. The idea of using as many controls as possible draws from Peter Diggle's written discussion of Cuzick and Edwards' paper [39] introducing their nearest neighbor test for clustering. In a preliminary analysis with the Cuzick and Edwards method, we used a control-case ratio of 3 to 1 to align with traditional case-control studies in epidemiology, but found significant clustering at small distances that appeared to be due to too few controls in some rural areas; a map of the controls at this ratio suggested that they underrepresented the at-risk population there. The ideal number of controls relative to the number of observed cases and the underlying population structure is an important issue left for future research. A map of the controls sampled at a 10 to 1 ratio appears to better approximate the general distribution of population in Ohio. Figure 1 displays the sampled controls as filled circles and the cases as open circles; points have been uniformly randomly shifted from their true locations for data confidentiality [40].
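
The confidentiality step used for Figure 1 can be sketched as a simple random displacement. This is a minimal sketch, assuming each coordinate is shifted independently by a uniform offset, so a point lands somewhere in a square centred on its true location; the source says only that points were "uniformly randomly shifted" and does not give the maximum displacement, so `max_shift`, the function name, and the coordinates below are hypothetical.

```python
import random

def uniform_shift(points, max_shift, seed=None):
    """Displace each point by independent uniform offsets in x and y.

    Each coordinate moves by a value drawn from [-max_shift, max_shift].
    The value of max_shift is hypothetical; the source does not report it.
    """
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-max_shift, max_shift), y + rng.uniform(-max_shift, max_shift))
        for x, y in points
    ]

true_locations = [(-83.0, 40.0), (-81.7, 41.5)]  # hypothetical (longitude, latitude)
masked = uniform_shift(true_locations, max_shift=0.01, seed=7)
```

A larger `max_shift` gives stronger protection at the cost of positional accuracy on the map.
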
Based on the figure, there appears to be no clear overall clustering of cases and no obvious clusters of cases after visually accounting for the distribution of population, as represented by the controls. However, the map can be misleading because, at this scale, many cases may lie at nearly the same location, so a statistical analysis is needed to formally test for clustering and for the presence and location of clusters. To investigate potential clustering and local clusters, we assume that the controls are a realization of a heterogeneous Poisson point process and the cases a realization of a second such process, under a constant risk hypothesis in which more cases are expected where the population at risk is larger. To test for spatial heterogeneity in leukemia risk among groups, we perform four sub-analyses with the Cuzick and Edwards method and the scan statistic in SaTScan: one for cases of acute lymphocytic leukemia (ALL), the dominant subtype of leukemia among children, and three for the mutually exclusive age groups 0–4, 5–9, and 10–14.
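
The Cuzick and Edwards method referred to above counts, for each case, how many of its k nearest neighbours (among all cases and controls) are also cases, and sums these counts into a statistic T_k. Under the constant risk hypothesis, the case labels are exchangeable among all points, so T_k can be compared with its distribution under random relabelling. The sketch below takes that Monte Carlo route as an assumption; an asymptotic normal approximation is also available, and the excerpt does not state which inference, or which values of k, the paper used. Function names and the toy points are hypothetical.

```python
import math
import random

def cuzick_edwards_tk(points, is_case, k):
    """T_k: sum over cases of how many of their k nearest neighbours are cases.

    Neighbours are searched among all other points (cases and controls);
    distance ties are broken by point index.
    """
    t_k = 0
    for i, (xi, yi) in enumerate(points):
        if not is_case[i]:
            continue
        neighbours = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        t_k += sum(1 for _, j in neighbours[:k] if is_case[j])
    return t_k

def cuzick_edwards_test(points, is_case, k, n_sim=99, seed=1):
    """Observed T_k and a Monte Carlo p-value from random relabelling of cases."""
    rng = random.Random(seed)
    observed = cuzick_edwards_tk(points, is_case, k)
    labels = list(is_case)
    at_least_as_large = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # keep the number of cases fixed, permute which points are cases
        if cuzick_edwards_tk(points, labels, k) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_sim + 1)

# Hypothetical toy data: two nearby cases and four controls in two far-off pairs.
points = [(0.0, 0.0), (0.0, 0.1), (10.0, 10.0), (10.0, 10.1), (20.0, 20.0), (20.0, 20.1)]
is_case = [True, True, False, False, False, False]
t1, p = cuzick_edwards_test(points, is_case, k=1)
print(t1)  # 2: each case's single nearest neighbour is the other case
```

Because the statistic depends only on neighbour ranks, a sparse control set can make distant cases look like each other's nearest neighbours, which is consistent with the spurious small-distance clustering the text reports at a 3 to 1 control-case ratio.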

