A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996-2003.

Wheeler DC - Int J Health Geogr (2007)

Bottom Line: Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings are consistent for the different tests of global clustering, where no significant clustering is demonstrated with any of the techniques when all age cases are considered together.


Affiliation: Department of Biostatistics, Emory University, Atlanta, GA, USA. dcwheel@sph.emory.edu

ABSTRACT

Background: Spatial cluster detection is an important tool in cancer surveillance to identify areas of elevated risk and to generate hypotheses about cancer etiology. There are many cluster detection methods used in spatial epidemiology to investigate suspicious groupings of cancer occurrences in regional count data and case-control data, where controls are sampled from the at-risk population. Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. The main focus of this paper is an analysis of the spatial distribution of leukemia incidence among children from 0 to 14 years of age in Ohio from 1996 to 2003 using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS). Specifically, we explore whether there is statistically significant global clustering and whether there are statistically significant local clusters of individual leukemia cases in Ohio using numerous published methods of spatial cluster detection, including spatial point process summary methods, a nearest neighbor method, and a local rate scanning method. We use the K function, Cuzick and Edwards' method, and the kernel intensity function to test for significant global clustering and the kernel intensity function and Kulldorff's spatial scan statistic in SaTScan to test for significant local clusters.

Results: We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings from the local cluster detection analyses are not consistent for the different cluster detection techniques, where the spatial scan method in SaTScan does not find statistically significant local clusters, while the kernel intensity function method suggests statistically significant clusters in areas of central, southern, and eastern Ohio. The findings are consistent for the different tests of global clustering, where no significant clustering is demonstrated with any of the techniques when all age cases are considered together.

Conclusion: This comparative study for childhood leukemia clustering and clusters in Ohio revealed several research issues in practical spatial cluster detection. Among them, flexibility in cluster shape detection should be an issue for consideration.


Figure 6: Simulated values for the test of global clustering using kernel density functions (p-value = 0.27).

Mentions: Conveniently, one can calculate a log ratio of kernel intensity functions for cases and controls to get a log relative risk at a location on the grid. Doing so at all grid points that cover the study area yields a log relative risk surface. To calculate this surface, we first redefine the kernel bandwidth, because the ratio benefits from using the same bandwidth for cases and controls so that the numerator and denominator cover an equal spatial extent. For the kernel bandwidth in both dimensions, we initially choose the mean of the control optimal bandwidths calculated previously, which is 22,647 distance units. We favor the controls in this bandwidth selection because there are many more of them than cases and they should in theory reflect the underlying population distribution. This bandwidth yields a smaller kernel than the optimal case bandwidth would, so the estimated kernel intensity function reveals more detail but is also more variable. With the kernel intensity function ratio, one can again use Monte Carlo randomization of the case labels to detect significant local differences in case and control intensities. Figure 5 shows the log relative risk surface using the log ratio of kernel density functions for cases and controls and also shows the significant areas of log relative risk using the 2.5% lower and 97.5% upper tolerance limits from 999 Monte Carlo randomizations of the case labels. The hatched ("+") areas in the plot on the right indicate significant local clusters of elevated log relative risk and correspond to the higher points on the log relative risk surface map. The areas with "-" symbols have significantly low disease incidence. The contour lines on this plot correspond to smoothed levels of log relative risk.
The contour lines with value 0.5 indicate relative risks of approximately 1.6 and the contour line of 1 signifies relative risks of approximately 2.7. The highest log relative risk is just over 1.5 (relative risk of 4.5) and is found in the hatched area in eastern Ohio. The plots suggest that there are areas of higher disease incidence in central, southern, and eastern Ohio, and also an area of lower incidence southeast of Cincinnati. As mentioned earlier, one can also test for overall clustering with the kernel intensity function method using a mathematical summary of the local function ratios. The test statistic is a sum of squared log ratios of kernel intensity functions across the study area. Monte Carlo randomization is used to assess significance of the test statistic for clustering. Figure 6 is a histogram of the values of the test statistic from the Monte Carlo randomizations of the case labels, along with the test statistic for the original data plotted on the histogram as a vertical line. The p-value of 0.27 indicates that there is no significant global clustering in the cancer cases, considering the distribution of the at-risk population. To explore the sensitivity of the results to the selected kernel bandwidth, we next choose a compromise kernel bandwidth in both kernel dimensions as the mean of the optimal case and control bandwidths calculated previously, which results in a bandwidth of 27,701 distance units. The log relative risk surface and significant risk areas with this new kernel bandwidth are plotted in Figure 7. The bandwidth used to generate Figure 7 is larger than the one used to generate Figure 5, and the new resulting risk surface is slightly smoother than the one in Figure 5. The areas of significant elevated risk visible in Figure 5 are also present in Figure 7, but the larger bandwidth extends the significant cluster areas and now two north-south swaths are clearly apparent. 
The largest log relative risk with this bandwidth is approximately 1.4 (relative risk 4.4) and is located in the same hatched area in eastern Ohio as in Figure 5. As was the case with the first bandwidth, the test for overall clustering using the summary of the local log ratios of kernel functions with the second bandwidth is not significant, but the p-value decreases to 0.08 in this case.
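
The procedure described above (a log relative risk surface, pointwise tolerance limits from randomized case labels, and a global test based on the sum of squared log ratios) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation. It assumes Gaussian kernels with no edge correction, normalizes each kernel estimate by its sample size so that the log ratio is near zero under random labeling, evaluates the statistic on a finite grid, and uses the common (1 + count) / (n_sim + 1) convention for the Monte Carlo p-value. All function names, parameters, and the synthetic coordinates are hypothetical.

```python
import math
import random


def kernel_density(points, grid, bandwidth):
    """Gaussian kernel density estimate at each grid location (no edge correction)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    values = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-0.5 * d2 / bandwidth ** 2)
        values.append(norm * total)
    return values


def log_relative_risk(cases, controls, grid, bandwidth, eps=1e-12):
    """Log ratio of case and control densities with one shared bandwidth.

    eps is an assumed guard against log(0) where both densities underflow.
    """
    f = kernel_density(cases, grid, bandwidth)
    g = kernel_density(controls, grid, bandwidth)
    return [math.log((fi + eps) / (gi + eps)) for fi, gi in zip(f, g)]


def monte_carlo_kernel_test(cases, controls, grid, bandwidth, n_sim=999, seed=0):
    """Randomize case labels to get tolerance limits and a global p-value."""
    rng = random.Random(seed)
    observed = log_relative_risk(cases, controls, grid, bandwidth)
    t_obs = sum(r * r for r in observed)  # sum of squared log ratios

    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    surfaces, t_sims = [], []
    for _ in range(n_sim):
        rng.shuffle(pooled)  # random relabeling of cases and controls
        surf = log_relative_risk(pooled[:n_cases], pooled[n_cases:], grid, bandwidth)
        surfaces.append(surf)
        t_sims.append(sum(r * r for r in surf))

    p_value = (1 + sum(t >= t_obs for t in t_sims)) / (n_sim + 1)

    # Pointwise 2.5% and 97.5% tolerance limits at each grid location.
    lo_idx = int(0.025 * n_sim)
    hi_idx = n_sim - 1 - lo_idx
    lower, upper, flags = [], [], []
    for j, obs in enumerate(observed):
        sims_j = sorted(s[j] for s in surfaces)
        lower.append(sims_j[lo_idx])
        upper.append(sims_j[hi_idx])
        flags.append("+" if obs > sims_j[hi_idx] else "-" if obs < sims_j[lo_idx] else "")

    return {"observed": observed, "lower": lower, "upper": upper,
            "flags": flags, "statistic": t_obs, "p_value": p_value}


if __name__ == "__main__":
    # Synthetic points on a unit square, for illustration only.
    demo_rng = random.Random(42)
    demo_cases = [(demo_rng.random(), demo_rng.random()) for _ in range(20)]
    demo_controls = [(demo_rng.random(), demo_rng.random()) for _ in range(60)]
    demo_grid = [(x / 5, y / 5) for x in range(6) for y in range(6)]
    result = monte_carlo_kernel_test(demo_cases, demo_controls, demo_grid,
                                     bandwidth=0.2, n_sim=99)
    print("global p-value:", result["p_value"])
    print("flagged grid points:", sum(1 for f in result["flags"] if f))
```

Rerunning such a sketch with a second, larger bandwidth mirrors the sensitivity check described in the text. A log relative risk on the surface converts to a relative risk by exponentiation, for example exp(0.5) ≈ 1.6 and exp(1) ≈ 2.7 for the contour values quoted above.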

