Random-effects, fixed-effects and the within-between specification for clustered data in observational health studies: a simulation study.

Dieleman JL, Templin T - PLoS ONE (2014)

Bottom Line: Estimator preference is determined by lowest mean squared error of the estimated marginal effect and root mean squared error of fitted values. The WB approach has been underutilized, particularly for inference on marginal effects in small samples. Blindly applying any estimator can lead to bias, inefficiency, and flawed inference.


Affiliation: Institute for Health Metrics and Evaluation, University of Washington, Seattle, Washington, United States of America.

ABSTRACT

Background: When unaccounted-for group-level characteristics affect an outcome variable, traditional linear regression is inefficient and can be biased. The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods that address these problems. While each estimator controls for otherwise unaccounted-for effects, the two estimators require different assumptions. Health researchers tend to favor RE estimation, while researchers from some other disciplines tend to favor FE estimation. In addition to RE and FE, an alternative method called within-between (WB) was suggested by Mundlak in 1978, although it is used infrequently.
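
As a rough illustration of how the FE and WB specifications relate, the sketch below builds a small clustered data set and fits both. It assumes the common within-between form y_ij = b0 + b_W (x_ij - xbar_j) + b_B xbar_j + u_j + e_ij, where xbar_j is the group mean of x. The data-generating values, the variable names, and the choice to fit the WB model by ordinary least squares (rather than with a random-intercept model) are all assumptions made for illustration, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clustered data: 30 groups of 10 observations each.
n_groups, group_size = 30, 10
group = np.repeat(np.arange(n_groups), group_size)

# The group effect u_j is correlated with x through a group-level component,
# which is the setting where the RE assumptions are violated.
u = rng.normal(size=n_groups)
x = 0.8 * u[group] + rng.normal(size=group.size)
y = 1.0 + 2.0 * x + u[group] + rng.normal(size=group.size)


def group_means(values, group, n_groups):
    """Return, for each observation, the mean of `values` in its group."""
    sums = np.bincount(group, weights=values, minlength=n_groups)
    counts = np.bincount(group, minlength=n_groups)
    return (sums / counts)[group]


x_bar = group_means(x, group, n_groups)
y_bar = group_means(y, group, n_groups)
x_within = x - x_bar

# FE (within) estimator: regress group-demeaned y on group-demeaned x.
beta_fe = np.sum(x_within * (y - y_bar)) / np.sum(x_within ** 2)

# WB specification (fit by OLS here): intercept, within deviation, group mean.
design = np.column_stack([np.ones_like(x), x_within, x_bar])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_within, beta_between = coef[1], coef[2]

print(f"FE estimate:         {beta_fe:.4f}")
print(f"WB within estimate:  {beta_within:.4f}")
print(f"WB between estimate: {beta_between:.4f}")
```

In this OLS version the within coefficient matches the FE estimate, because the within deviations sum to zero in every group and are therefore orthogonal to the intercept and the group-mean column. The separate between coefficient is what the WB specification adds over FE.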

Methods: We conduct a simulation study to compare RE, FE, and WB estimation across 16,200 scenarios. The scenarios vary in the number of groups, the size of the groups, within-group variation, goodness-of-fit of the model, and the degree to which the model is correctly specified. Estimator preference is determined by lowest mean squared error of the estimated marginal effect and root mean squared error of fitted values.
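
The two comparison criteria named above can be sketched as small helper functions. The function names, the toy estimates, and the `reference` argument (the vector the fitted values are compared against, which the abstract does not specify) are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np


def mse_of_marginal_effect(estimates, true_effect):
    """Mean squared error of estimated marginal effects across replications."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_effect) ** 2))


def rmse_of_fitted_values(fitted, reference):
    """Root mean squared error between fitted values and a reference vector."""
    fitted = np.asarray(fitted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((fitted - reference) ** 2)))


# Toy replications for two hypothetical estimators, true effect = 2.0.
true_effect = 2.0
estimates_by_method = {
    "A": [1.9, 2.1, 2.0],  # centered on the truth
    "B": [2.4, 2.6, 2.5],  # shifted upward
}

mse = {
    name: mse_of_marginal_effect(values, true_effect)
    for name, values in estimates_by_method.items()
}

# The preferred estimator under this criterion is the one with the lowest MSE.
preferred = min(mse, key=mse.get)
print(mse, preferred)
```

Because MSE combines squared bias and variance, an estimator that is unbiased but noisy can still lose to a slightly biased but more stable one, which is why the comparison is made on MSE rather than on bias alone.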

Results: Although there are scenarios when each estimator is most appropriate, the cases in which traditional RE estimation is preferred are less common. In finite samples, the WB approach outperforms both traditional estimators. The Hausman test guides the practitioner to the estimator with the smallest absolute error only 61% of the time, and for many sample sizes simply applying the WB approach produces smaller absolute errors than following the suggestion of the test.

Conclusions: Specification and estimation should be carefully considered and ultimately guided by the objective of the analysis and characteristics of the data. The WB approach has been underutilized, particularly for inference on marginal effects in small samples. Blindly applying any estimator can lead to bias, inefficiency, and flawed inference.


Figure 8: Significant within-group variation relative to between-group variation. Row 1 (interpreted like Figure 3) shows the distribution of the errors in marginal effects estimates from the RE estimation (red), FE estimation (blue), and WB approach (green). Row 2 (interpreted like Figure 4) shows MSE associated with the RE estimation (red), FE estimation (blue), and WB approach (green) errors. Row 3 (interpreted like Figure 6) shows the distribution of the RMSE from the fitted values estimated using RE estimation (red), FE estimation (blue), and WB approach (green). The within-group variation is set to 0.75, while the between-group variation is 0.25. All other simulation input parameters are set to baseline.

Mentions: Figure 8 illustrates the baseline setup adjusted so that 75% of the variation of the explanatory variable is found within groups, while only 25% of the variation is between groups. This type of data is characteristic of health outcomes grouped by facility, national mortality rates grouped by region, or HIV prevalence over time. In these settings there is a great deal of heterogeneity within the group. Figure 8 shows that except for the smallest samples with ρ at or very near 0, FE estimation and the WB approach perform as well as or better than RE estimation.
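
To make the 75%/25% split concrete, the sketch below draws an explanatory variable whose variance is mostly within groups and then recovers the shares from a sum-of-squares decomposition. Reading "variation" as a variance share, along with the sample sizes, seed, and variable names, is an assumption for illustration and not the authors' simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical design: 200 groups of 50 observations, unit total variance.
n_groups, group_size = 200, 50
within_share_target = 0.75

group = np.repeat(np.arange(n_groups), group_size)

# Group-level component carries 25% of the variance, the rest is within-group.
between_part = np.sqrt(1.0 - within_share_target) * rng.normal(size=n_groups)[group]
within_part = np.sqrt(within_share_target) * rng.normal(size=group.size)
x = between_part + within_part

# Group mean of x, repeated for each observation in the group.
counts = np.bincount(group, minlength=n_groups)
x_bar = (np.bincount(group, weights=x, minlength=n_groups) / counts)[group]

# Total sum of squares splits exactly into within and between parts.
total_ss = np.sum((x - x.mean()) ** 2)
within_ss = np.sum((x - x_bar) ** 2)
between_ss = np.sum((x_bar - x.mean()) ** 2)

within_share = within_ss / total_ss
print(f"within share:  {within_share:.3f}")
print(f"between share: {between_ss / total_ss:.3f}")
```

The realized within share should land near the target rather than exactly on it, since group means absorb a little within-group noise and the group-level draws are themselves a finite sample. A large within share is what gives FE and the within part of WB plenty of information to work with.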

