A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996-2003.

Wheeler DC - Int J Health Geogr (2007)

Bottom Line: Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings are consistent for the different tests of global clustering, where no significant clustering is demonstrated with any of the techniques when all age cases are considered together.


Affiliation: Department of Biostatistics, Emory University, Atlanta, GA, USA. dcwheel@sph.emory.edu

ABSTRACT

Background: Spatial cluster detection is an important tool in cancer surveillance to identify areas of elevated risk and to generate hypotheses about cancer etiology. There are many cluster detection methods used in spatial epidemiology to investigate suspicious groupings of cancer occurrences in regional count data and case-control data, where controls are sampled from the at-risk population. Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. The main focus of this paper is an analysis of the spatial distribution of leukemia incidence among children from 0 to 14 years of age in Ohio from 1996-2003 using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS). Specifically, we explore whether there is statistically significant global clustering and whether there are statistically significant local clusters of individual leukemia cases in Ohio using numerous published methods of spatial cluster detection, including spatial point process summary methods, a nearest neighbor method, and a local rate scanning method. We use the K function, Cuzick and Edwards' method, and the kernel intensity function to test for significant global clustering, and the kernel intensity function and Kulldorff's spatial scan statistic in SaTScan to test for significant local clusters.

Results: We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings from the local cluster detection analyses are not consistent across the different cluster detection techniques: the spatial scan method in SaTScan does not find statistically significant local clusters, while the kernel intensity function method suggests statistically significant clusters in areas of central, southern, and eastern Ohio. The findings are consistent across the different tests of global clustering: no significant clustering is demonstrated with any of the techniques when cases of all ages are considered together.

Conclusion: This comparative study for childhood leukemia clustering and clusters in Ohio revealed several research issues in practical spatial cluster detection. Among them, flexibility in cluster shape detection should be an issue for consideration.


Figure 4: Contours of estimated kernel density functions for cases and controls with UTM coordinates.

Mentions: While the K function is designed to test for clustering, the kernel intensity function introduced by Kelsall and Diggle [45] can be used to test both for clustering and for the presence and location of local clusters. It is the only test in this comparison that can explicitly evaluate both conditions. Using a kernel function, the kernel intensity function calculates the number of events expected in an area at location s (intensity) or the probability of an event occurring at location s (density). The intensity and density functions are proportional and are often used interchangeably in practice [19]. The kernel function requires a bandwidth that determines the size of the kernel and the overall smoothness of the resulting estimate. In the Gaussian kernel used in this study, the bandwidth corresponds to the standard deviation, and larger bandwidths result in smoother kernel intensity functions. We use Scott's rule [46] for optimal bandwidth selection in a Gaussian kernel; the rule considers the number of events and the spatial variance of events in a point pattern when calculating the bandwidth. The two-dimensional Gaussian kernel we use has a separate bandwidth in each of the u and v directions, where the map coordinates are in the form (u, v). Applying Scott's rule to the Ohio data results in bandwidths of 34,627 meters in the u direction and 30,882 meters in the v direction for cases and bandwidths of 23,753 meters in the u direction and meters units in the v direction for controls. The kernel function uses the distance between a location s and all other points as input to calculate an intensity function at s. We evaluate the kernel function at each point on a 40 × 40 grid that completely contains the study area, where the distance between adjacent grid points is approximately 11,619 meters. Figure 4 contains contour plots of the kernel density function for cases and controls separately. The plots show similar patterns in the probability of an event occurring at a given point in the study area, where the probabilities are highest in the three largest metropolitan areas of Cincinnati, Columbus, and Cleveland. While the plots are somewhat informative, a formal test of difference in the patterns would be helpful.
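
The kernel density calculation described above can be illustrated with a minimal sketch. It is not the code used in the study. It assumes the commonly cited form of Scott's rule, in which the bandwidth in each direction is the sample standard deviation multiplied by n^(-1/(d+4)) with d = 2, and it uses a product Gaussian kernel with separate bandwidths in u and v. The function names `scott_bandwidths` and `gaussian_kde_grid` and the synthetic coordinates are hypothetical and stand in for the OCISS case locations, which are not reproduced here.

```python
import numpy as np


def scott_bandwidths(points):
    """Per-axis bandwidths under the assumed form of Scott's rule:
    h_j = sd_j * n ** (-1 / (d + 4)), with d the number of dimensions."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    return points.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))


def gaussian_kde_grid(points, grid_u, grid_v, bandwidths=None):
    """Evaluate a 2-D Gaussian kernel density estimate on a regular grid.

    Returns an array of shape (len(grid_u), len(grid_v)). Multiplying the
    result by the number of points gives the corresponding intensity.
    """
    points = np.asarray(points, dtype=float)
    if bandwidths is None:
        bandwidths = scott_bandwidths(points)
    hu, hv = bandwidths
    uu, vv = np.meshgrid(grid_u, grid_v, indexing="ij")
    # Standardized offsets between every grid point and every event,
    # broadcast to shape (len(grid_u), len(grid_v), n).
    zu = (uu[..., None] - points[:, 0]) / hu
    zv = (vv[..., None] - points[:, 1]) / hv
    kernel = np.exp(-0.5 * (zu ** 2 + zv ** 2)) / (2.0 * np.pi * hu * hv)
    # Average the kernel contributions of all events at each grid point.
    return kernel.mean(axis=-1)


# Synthetic (u, v) coordinates in meters; hypothetical, not the Ohio data.
rng = np.random.default_rng(0)
cases = rng.normal(loc=[300_000.0, 4_400_000.0],
                   scale=[50_000.0, 40_000.0], size=(200, 2))

# A 40 x 40 grid that comfortably contains the synthetic point pattern.
grid_u = np.linspace(0.0, 600_000.0, 40)
grid_v = np.linspace(4_160_000.0, 4_640_000.0, 40)
du = grid_u[1] - grid_u[0]
dv = grid_v[1] - grid_v[0]

density = gaussian_kde_grid(cases, grid_u, grid_v)
```

Running the same function on the control locations would give the second surface for a side-by-side contour plot of the kind shown in Figure 4. Because the grid fully contains the point pattern, the density values multiplied by the cell area `du * dv` should sum to approximately one, which is a useful sanity check on the bandwidths and the grid extent.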

