Early detection of tuberculosis outbreaks among the San Francisco homeless: trade-offs between spatial resolution and temporal scale.

Higgs BW, Mohtashemi M, Grinsdale J, Kawamura LM - PLoS ONE (2007)

Bottom Line: We examine the effect of varying the spatial resolution in the TB data within the San Francisco homeless population on detection sensitivity, timeliness, and the amount of historical data needed to achieve better performance measures. Systematic characterization of the spatio-temporal distribution of TB cases can widely benefit real time surveillance and guide public health investigations of TB outbreaks as to what level of spatial resolution results in improved detection sensitivity and timeliness. This study is a step forward in this direction.


Affiliation: MITRE Corporation, McLean, Virginia, United States of America. bhiggs100@yahoo.com

ABSTRACT

Background: San Francisco has the highest rate of tuberculosis (TB) in the U.S. with recurrent outbreaks among the homeless and marginally housed. It has been shown for syndromic data that when exact geographic coordinates of individual patients are used as the spatial base for outbreak detection, higher detection rates and accuracy are achieved compared to when data are aggregated into administrative regions such as zip codes and census tracts. We examine the effect of varying the spatial resolution in the TB data within the San Francisco homeless population on detection sensitivity, timeliness, and the amount of historical data needed to achieve better performance measures.

Methods and findings: We apply a variation of the space-time permutation scan statistic to the TB data, in which a patient's location is represented either by its exact coordinates or by the centroid of its census tract. We show that the detection sensitivity and timeliness of the method generally improve when exact locations are used to identify real TB outbreaks. When outbreaks are simulated, the detection timeliness is consistently improved when exact coordinates are used, whereas the detection sensitivity varies depending on the size of the spatial scanning window and the number of tracts in which cases are simulated. Finally, we show that when exact locations are used, a smaller amount of historical data is required for training the model.

Conclusion: Systematic characterization of the spatio-temporal distribution of TB cases can widely benefit real time surveillance and guide public health investigations of TB outbreaks as to what level of spatial resolution results in improved detection sensitivity and timeliness. Trading higher spatial resolution for better performance is ultimately a tradeoff between maintaining patient confidentiality and improving public health when sharing data. Understanding such tradeoffs is critical to managing the complex interplay between public policy and public health. This study is a step forward in this direction.


Figure 2 (pone-0001284-g002): An illustration of partitioning the map of San Francisco using the overlapping square grid approach. Three overlapping squares, representing three spatial scanning windows in the detection method, are overlaid upon the northwest quadrant of San Francisco. The map is partitioned by census tracts. A single square size and only a few census tract centroids (blue ‘x’ symbols) are represented in this example for illustrative purposes.

Mentions: A variation of the space-time permutation scan statistic introduced by Kulldorff et al. [10], using the square grid approach of Neill et al. [17], is implemented here for space-time investigation of TB outbreaks in the San Francisco homeless population. Briefly, instead of using circles of multiple radii as the spatial bases of the scanning cylinders [10], the method uses squares [17]. It iterates over square sizes ranging from 0.02 km to 1 km in width, akin to the varied circle radii in [10]. In every iteration of the algorithm, overlapping grids containing p squares, each of area $r^2$, are placed over the entire region to represent the spatial domain, with adjacent squares overlapping by half their width. That is, within a row of squares of a given size, each square overlaps its neighbor by half the width of both squares. In the next row the same procedure is applied, so that each square overlaps the adjacent square in its own row by half its width and the square in the preceding row by half its height. Figure 2 illustrates an example of such a geographic partitioning approach using three overlapping squares overlaid on a map of the northwest quadrant of San Francisco. The time domain, as in Kulldorff et al. [10], is represented by the height of these (square) cylinders. For each square, the expected number of cases, conditioned on the observed marginals, is denoted by μ, where μ is defined as the sum of the expected numbers of cases in a cylinder, given by $\mu = \sum_{s,t} \mu_{st}$, where s is the spatial cluster and t is the time span used, and $\mu_{st} = \frac{1}{N}\left(\sum_{s} n_{st}\right)\left(\sum_{t} n_{st}\right)$, where N is the total number of cases and $n_{st}$ is the number of cases in either the space or time window (according to the summation term). The observed number of cases for the same cylinder is denoted by n.
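The grid construction and the expected count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: case locations are taken to be planar coordinates in kilometres, a square includes its lower and left edges but not its upper and right ones, and the expected count uses the standard space-time permutation form, in which summing the per-cell expectations over a cylinder reduces to (cases in the square) × (cases in the time window) / N. The names `overlapping_squares` and `observed_and_expected` are hypothetical.

```python
import numpy as np


def overlapping_squares(xmin, ymin, xmax, ymax, width):
    """Lower-left corners of squares of the given width covering a bounding box.

    Corners advance by half a width, so each square overlaps its row
    neighbour by half its width and the previous row by half its height.
    """
    step = width / 2.0
    xs = np.arange(xmin, xmax, step)
    ys = np.arange(ymin, ymax, step)
    return [(float(x), float(y)) for y in ys for x in xs]


def observed_and_expected(case_xy, case_day, corner, width, t_start, t_end):
    """Observed count n and expected count mu for one square cylinder.

    case_xy  : (N, 2) array of case coordinates (assumed to be in km)
    case_day : (N,) array of case dates expressed as day numbers
    """
    x0, y0 = corner
    in_space = (
        (case_xy[:, 0] >= x0) & (case_xy[:, 0] < x0 + width)
        & (case_xy[:, 1] >= y0) & (case_xy[:, 1] < y0 + width)
    )
    in_time = (case_day >= t_start) & (case_day <= t_end)
    total = len(case_day)
    n = int(np.sum(in_space & in_time))
    # Expected count conditioned on the spatial and temporal marginals only;
    # no population-at-risk denominator is needed.
    mu = float(in_space.sum()) * float(in_time.sum()) / total
    return n, mu


if __name__ == "__main__":
    # Hypothetical toy data: four cases on a 1 km x 1 km region.
    xy = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.9], [0.8, 0.1]])
    day = np.array([1, 2, 9, 10])
    for corner in overlapping_squares(0.0, 0.0, 1.0, 1.0, width=0.5):
        print(corner, observed_and_expected(xy, day, corner, 0.5, 1, 5))
```

To scan at several resolutions, the same two functions would be called once per square width in the 0.02 km to 1 km range; using census tract centroids instead of exact locations only changes the contents of `case_xy`.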
It is important to note that because we do not have an accurate count of the population at risk (e.g. the size of the homeless population), the expected values are derived from the TB case counts. The Poisson generalized likelihood ratio (GLR), which is used as a measure of a potential outbreak in the current cylinder, is then given by $\mathrm{GLR} = \left(\frac{n}{\mu}\right)^{n}\left(\frac{N-n}{N-\mu}\right)^{N-n}$ [18]. To assign a degree of significance to the GLR value of each cylinder, Monte Carlo hypothesis testing [19] is conducted: the observed cases are randomly shuffled while the spatial and temporal marginals are kept unchanged, and the GLR value is recalculated for each square [10]. This random shuffling is repeated over 999 trials and the resulting random GLR values are ranked. A p-value for the original GLR is then assigned by its relative ranking within the random GLR values.
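The significance test for a single cylinder can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' code: it uses the standard Poisson GLR form associated with the space-time permutation scan statistic, treats cylinders with no excess of cases (n ≤ μ) as GLR = 1, and shuffles the case dates across the fixed case locations, which leaves both the spatial and the temporal marginals unchanged. The function names, the boolean `in_space` mask, and the `seed` parameter are hypothetical.

```python
import numpy as np


def poisson_glr(n, mu, total):
    """Poisson GLR for one cylinder (assumed standard form).

    Returns 1.0 when the observed count does not exceed the expected count.
    """
    if n <= mu:
        return 1.0
    glr = (n / mu) ** n
    if total > n:
        glr *= ((total - n) / (total - mu)) ** (total - n)
    return float(glr)


def cylinder_glr(in_space, case_day, t_start, t_end):
    """GLR of the cylinder defined by a spatial mask and a time window."""
    total = len(case_day)
    in_time = (case_day >= t_start) & (case_day <= t_end)
    n = int(np.sum(in_space & in_time))
    mu = float(in_space.sum()) * float(in_time.sum()) / total
    return poisson_glr(n, mu, total)


def monte_carlo_p_value(in_space, case_day, t_start, t_end, trials=999, seed=0):
    """Rank the observed GLR among GLRs from randomly shuffled case dates."""
    rng = np.random.default_rng(seed)
    observed = cylinder_glr(in_space, case_day, t_start, t_end)
    exceed = 0
    for _ in range(trials):
        # Permuting dates over fixed locations preserves both marginals.
        shuffled = rng.permutation(case_day)
        if cylinder_glr(in_space, shuffled, t_start, t_end) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical toy data: five cases inside the square, all in days 1-5.
    day = np.array([1, 2, 3, 4, 5, 20, 21, 22, 23, 24])
    mask = np.array([True] * 5 + [False] * 5)
    print(monte_carlo_p_value(mask, day, 1, 5))
```

With 999 trials the smallest attainable p-value is 1/1000, because the observed GLR is counted as one of the ranked values.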

