The use of bootstrapping when using propensity-score matching without replacement: a simulation study.

Austin PC, Small DS - Stat Med (2014)

Bottom Line: An important issue when using propensity-score matching is how to estimate the standard error of the estimated treatment effect. The second method involved drawing bootstrap samples from the original sample, estimating the propensity score separately in each bootstrap sample, and creating a matched sample within each of these bootstrap samples. The former approach was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects.
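The two bootstrap variants compared in the paper can be sketched as follows. This is a minimal illustration, not the authors' code: a single simulated covariate `x` stands in for a full propensity-score model, greedy nearest-neighbour matching stands in for the paper's matching algorithms, and all function names (`greedy_match`, `simple_bootstrap_se`, `complex_bootstrap_se`) are illustrative.

```python
import math
import random
import statistics

def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching without replacement on a
    scalar score (here one covariate stands in for the propensity score)."""
    pairs = []
    available = list(controls)
    for t in treated:
        if not available:
            break
        c = min(available, key=lambda u: abs(u[0] - t[0]))
        available.remove(c)
        pairs.append((t, c))
    return pairs

def matched_effect(sample):
    """Mean within-pair difference in outcome y; None if no pairs formed."""
    treated = [(x, y) for x, y, z in sample if z == 1]
    controls = [(x, y) for x, y, z in sample if z == 0]
    pairs = greedy_match(treated, controls)
    if not pairs:
        return None
    return statistics.mean(t[1] - c[1] for t, c in pairs)

def simple_bootstrap_se(sample, n_boot=200, seed=0):
    """'Simple' bootstrap: match once, then resample the matched pairs."""
    treated = [(x, y) for x, y, z in sample if z == 1]
    controls = [(x, y) for x, y, z in sample if z == 0]
    pairs = greedy_match(treated, controls)
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        boot_pairs = [rng.choice(pairs) for _ in pairs]
        effects.append(statistics.mean(t[1] - c[1] for t, c in boot_pairs))
    return statistics.stdev(effects)

def complex_bootstrap_se(sample, n_boot=200, seed=0):
    """'Complex' bootstrap: resample subjects from the original sample and
    redo the matching within every bootstrap sample."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        boot = [rng.choice(sample) for _ in sample]
        effect = matched_effect(boot)
        if effect is not None:
            effects.append(effect)
    return statistics.stdev(effects)

# Demo on simulated data: x is a confounder, z is treatment (roughly 20%
# prevalence), y is a continuous outcome with a true treatment effect of 1.0.
rng = random.Random(42)
sample = []
for _ in range(200):
    x = rng.gauss(0.0, 1.0)
    z = 1 if rng.random() < 1.0 / (1.0 + math.exp(-(x - 1.5))) else 0
    y = x + 1.0 * z + rng.gauss(0.0, 1.0)
    sample.append((x, y, z))

se_simple = simple_bootstrap_se(sample)
se_complex = complex_bootstrap_se(sample)
```

The design difference is the point of the paper: the simple bootstrap treats the matched pairs as fixed and resamples them, while the complex bootstrap repeats the entire estimation-and-matching pipeline inside each bootstrap sample.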


Affiliation: Institute for Clinical Evaluative Sciences, Toronto, Canada; Institute of Health Management, Policy and Evaluation, University of Toronto, Toronto, Canada; Schulich Heart Research Program, Sunnybrook Research Institute, Toronto, Canada.



Figure 1 (fig01): Empirical versus estimated standard errors (estimated propensity score).

Mentions: The mean estimated standard errors and the empirical estimates of the standard deviation of the estimated treatment effects are reported in Figure 1. There is one panel for each of the three types of outcomes (continuous, binary, and time-to-event). Each panel consists of a series of dot charts, with one row for each of the 12 combinations of matching method and prevalence of treatment (3 matching algorithms × 4 treatment prevalences). When outcomes were continuous, several observations merit comment. First, all four methods of estimating the standard error of the difference in means overestimated the standard deviation of the sampling distribution of the difference in means. Across the 12 combinations of prevalence of treatment (5%, 10%, 20%, and 25%) and the three matching methods, the mean ratio of the estimated standard error to the empirical standard error was 1.08 for the naïve matched estimator, 1.04 for the matched estimator that accounted for the matched nature of the sample, 1.04 for the simple bootstrap estimator of the standard error, and 1.07 for the complex bootstrap estimator of the standard error. Second, the naïve parametric estimator tended to overestimate the variability of the sampling distribution the most. Third, the matched parametric estimator and the naïve bootstrap estimator tended to produce estimates that most closely reflected the empirical sampling variability of the estimated difference in means; in most settings, these two methods yielded very similar estimates of the standard error. Fourth, the complex bootstrap tended to perform worse than the simple bootstrap method. Fifth, results for the three different matching algorithms tended to be similar to one another.
Sixth, differences between the methods for estimating the variability of the sampling distribution of the difference in means tended to diminish as the prevalence of treatment increased. Seventh, differences between the standard deviation of the empirical sampling distribution of the estimated difference in means and the mean estimated standard error for the four methods tended to diminish as the prevalence of treatment increased; thus, the estimates of the standard error were most accurate when a higher proportion of subjects were treated. Similar results were observed when outcomes were binary and the risk difference was used as the measure of treatment effect: across the 12 combinations of prevalence of treatment and the three matching methods, the mean ratio of the estimated standard error to the empirical standard error was 1.04 for the naïve matched estimator, 1.00 for the matched estimator that accounted for the matched nature of the sample, 1.00 for the simple bootstrap estimator, and 1.03 for the complex bootstrap estimator. With time-to-event outcomes, similar results were observed, with one primary exception: the estimate of sampling variability obtained from the naïve parametric estimator was substantially larger than those obtained using the other three estimators. For time-to-event outcomes, across the 12 combinations of prevalence of treatment and the three matching methods, the mean ratio of the estimated standard error to the empirical standard error was 1.30 for the naïve matched estimator, 1.09 for the matched estimator that accounted for the matched nature of the sample, 1.09 for the simple bootstrap estimator, and 1.10 for the complex bootstrap estimator.
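The performance metric used throughout these comparisons, the ratio of the mean estimated standard error to the empirical standard error, is straightforward to compute from any simulation. Below is a hypothetical `se_ratio` helper with a demo on the sample mean (a textbook case where the estimator is nearly unbiased), not a reproduction of the paper's matching scenario.

```python
import random
import statistics

def se_ratio(estimates, estimated_ses):
    """Ratio of the mean estimated SE to the empirical SE (the standard
    deviation of the point estimates across simulation replicates).
    Values above 1 mean the estimator overstates sampling variability."""
    empirical_se = statistics.stdev(estimates)
    return statistics.mean(estimated_ses) / empirical_se

# Demo with a known case: the sample mean of n draws from N(0, 1).
# Its true SE is 1/sqrt(n); the usual estimator is s/sqrt(n).
rng = random.Random(1)
n, reps = 25, 2000
estimates, ses = [], []
for _ in range(reps):
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(statistics.mean(draws))
    ses.append(statistics.stdev(draws) / n ** 0.5)

ratio = se_ratio(estimates, ses)  # close to 1 for this unbiased case
```

In the paper's terms, a ratio of 1.08 (the naïve matched estimator with continuous outcomes) means the estimated standard errors averaged 8% above the true sampling variability.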


