A novel framework for validating and applying standardized small area measurement strategies.

Srebotnjak T, Mokdad AH, Murray CJ - Popul Health Metr (2010)

Bottom Line: Small area estimation of important health outcomes and risk factors can be improved using a systematic modeling and validation framework, which consistently outperformed single-year direct survey estimates and demonstrated the potential leverage of including relevant domain-specific covariates compared to pure measurement models. The proposed validation strategy can be applied to other disease outcomes and risk factors in the US as well as to resource-scarce situations, including low-income countries. These estimates are needed by public health officials to identify at-risk groups, to design targeted prevention and intervention programs, and to monitor and evaluate results over time.


Affiliation: Institute for Health Metrics and Evaluation, University of Washington, 2301 5th Ave, Suite 600, Seattle, WA 98121, USA. cjlm@uw.edu.

ABSTRACT

Background: Local measurements of health behaviors, diseases, and use of health services are critical inputs into local, state, and national decision-making. Small area measurement methods can deliver more precise and accurate local-level information than direct estimates from surveys or administrative records, where sample sizes are often too small to yield acceptable standard errors. However, small area measurement requires careful validation using approaches other than conventional statistical methods such as in-sample validation or cross-validation, because those conventional methods do not solve the problem of validating estimates in data-sparse domains.

Methods: A new general framework for small area estimation and validation is developed and applied to estimate Type 2 diabetes prevalence in US counties using data from the Behavioral Risk Factor Surveillance System (BRFSS). The framework combines the three conventional approaches to small area measurement: (1) pooling data across time by combining multiple survey years; (2) exploiting spatial correlation by including a spatial component; and (3) utilizing structured relationships between the outcome variable and domain-specific covariates to define four increasingly complex model types - coined the Naive, Geospatial, Covariate, and Full models. The validation framework uses direct estimates of prevalence in large domains as the gold standard and compares model estimates against it using (i) all available observations for the large domains and (ii) systematically reduced sample sizes obtained through random sampling with replacement. At each sampling level, the model is rerun repeatedly, and the validity of the model estimates from the four model types is then determined by calculating the (average) concordance correlation coefficient (CCC) and (average) root mean squared error (RMSE) against the gold standard. The CCC is closely related to the intraclass correlation coefficient and can be used when the units are organized in groups and when it is of interest to measure the agreement between units in the same group (e.g., counties). The RMSE is often used to measure the differences between values predicted by a model or an estimator and the actually observed values. It is a useful measure to capture the precision of the model or estimator.
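The validation loop described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`ccc`, `rmse`, `validate`), the dictionary layout of the survey responses, and the default number of repetitions are all hypothetical, and the `estimator` argument is a stand-in for whichever of the four model types is being tested (the example call uses a plain subsample mean, since the models themselves are not specified in the abstract). The CCC is computed with Lin's formula, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), using population (1/n) moments.

```python
import math
import random


def ccc(estimates, gold):
    """Lin's concordance correlation coefficient (population moments).

    Undefined (division by zero) if both inputs are the same constant vector.
    """
    n = len(estimates)
    mx = sum(estimates) / n
    my = sum(gold) / n
    var_x = sum((x - mx) ** 2 for x in estimates) / n
    var_y = sum((y - my) ** 2 for y in gold) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(estimates, gold)) / n
    return 2.0 * cov / (var_x + var_y + (mx - my) ** 2)


def rmse(estimates, gold):
    """Root mean squared error of the estimates against the gold standard."""
    n = len(estimates)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(estimates, gold)) / n)


def validate(responses_by_county, estimator, sample_sizes, n_reps=20, seed=0):
    """Average CCC and RMSE of `estimator` at each reduced sample size.

    responses_by_county: hypothetical layout, {county: [0/1 responses]},
        restricted to large counties whose direct estimate is trusted.
    estimator: callable mapping a list of responses to a prevalence;
        a placeholder for a fitted small area model.
    """
    rng = random.Random(seed)
    counties = sorted(responses_by_county)
    # Gold standard: direct prevalence estimate from all observations.
    gold = [sum(responses_by_county[c]) / len(responses_by_county[c])
            for c in counties]
    results = {}
    for n in sample_sizes:
        ccc_values, rmse_values = [], []
        for _ in range(n_reps):
            # Draw n observations per county, with replacement, and re-estimate.
            est = [estimator(rng.choices(responses_by_county[c], k=n))
                   for c in counties]
            ccc_values.append(ccc(est, gold))
            rmse_values.append(rmse(est, gold))
        results[n] = (sum(ccc_values) / n_reps, sum(rmse_values) / n_reps)
    return results


def subsample_mean(sample):
    """Stand-in estimator: the direct prevalence in the subsample."""
    return sum(sample) / len(sample)
```

Calling `validate(data, subsample_mean, [10, 50, 100])` returns a dictionary that maps each sample size to an (average CCC, average RMSE) pair. Replacing `subsample_mean` with a function that fits a model would let the same loop compare model types at each sampling level.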

Results: All model types have substantially higher CCC and lower RMSE than the direct, single-year BRFSS estimates. In addition, the inclusion of relevant domain-specific covariates generally improves predictive validity, especially at small sample sizes, and their leverage can be equivalent to a five- to tenfold increase in sample size.

Conclusions: Small area estimation of important health outcomes and risk factors can be improved using a systematic modeling and validation framework, which consistently outperformed single-year direct survey estimates and demonstrated the potential leverage of including relevant domain-specific covariates compared to pure measurement models. The proposed validation strategy can be applied to other disease outcomes and risk factors in the US as well as to resource-scarce situations, including low-income countries. These estimates are needed by public health officials to identify at-risk groups, to design targeted prevention and intervention programs, and to monitor and evaluate results over time.



Figure 6: County estimates and 95% confidence intervals for estimates of Type 2 diabetes prevalence in 2008 for men aged 30 years and older by county. Note: Intervals are colored according to sample size, with green corresponding to counties with more than 900 observations in 2000-2008, yellow for counties with more than 100 observations, and red for counties with 100 or fewer observations per county-year. The solid black line indicates the national average for men, and the dashed lines represent a standard deviation from the national average for the validation set for men. The correlation is between the estimated diabetes prevalence and the US Department of Agriculture's Urban-Rural Continuum code for 2003 with categorical values ranging from 1 for most urban to 9 for most rural. See http://www.ers.usda.gov/briefing/rurality/ruralurbcon/ for exact definitions.

Mentions: The pattern is very similar for men, as shown in Figure 6, except for a slightly higher national prevalence of 8.8% and wider confidence intervals due to smaller sample size.

