Validation of prediction models based on lasso regression with multiply imputed data.

Musoro JZ, Zwinderman AH, Puhan MA, ter Riet G, Geskus RB - BMC Med Res Methodol (2014)

Bottom Line: Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.

Affiliation: Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 Amsterdam, the Netherlands. z.j.musoro@amc.nl.

ABSTRACT

Background: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data.

Method: The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing four approaches to handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first three approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI.
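The fourth approach (resample the incomplete data, then impute within each bootstrap step) can be sketched as follows. This is a minimal illustration, not the authors' implementation. Everything named here is an assumption made for the example: the toy data, the function names, the one-predictor lasso, and a simple hot-deck draw standing in for a proper MI procedure. Optimism is taken as the average difference between the error of a bootstrap-fitted model on the completed original data and its error on its own bootstrap sample.

```python
import random
from statistics import fmean


def fit_lasso_1d(rows, lam):
    """One-predictor lasso: soft-threshold the x-y covariance (closed form)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    xbar, ybar = fmean(xs), fmean(ys)
    cov = fmean([(x - xbar) * (y - ybar) for x, y in rows])
    var = fmean([(x - xbar) ** 2 for x in xs])
    shrunk = max(abs(cov) - lam, 0.0) * (1.0 if cov > 0 else -1.0)
    slope = shrunk / var if var > 0 else 0.0
    return ybar - slope * xbar, slope  # (intercept, slope)


def mse(model, rows):
    intercept, slope = model
    return fmean([(y - (intercept + slope * x)) ** 2 for x, y in rows])


def impute_hot_deck(rows, rng):
    """Hypothetical stand-in for MI: fill a missing x with a random observed x."""
    observed = [x for x, _ in rows if x is not None]
    return [(x if x is not None else rng.choice(observed), y) for x, y in rows]


def optimism_resample_then_impute(incomplete, lam, n_boot=200, m=5, seed=0):
    rng = random.Random(seed)
    # Complete the original data m times: used for apparent error and as test data.
    completed = [impute_hot_deck(incomplete, rng) for _ in range(m)]
    apparent = fmean([mse(fit_lasso_1d(d, lam), d) for d in completed])
    diffs = []
    for _ in range(n_boot):
        # Resample the INCOMPLETE rows, so imputation is repeated per bootstrap step.
        boot = [rng.choice(incomplete) for _ in incomplete]
        for _ in range(m):
            boot_completed = impute_hot_deck(boot, rng)
            model = fit_lasso_1d(boot_completed, lam)
            train_err = mse(model, boot_completed)
            test_err = fmean([mse(model, d) for d in completed])
            diffs.append(test_err - train_err)
    optimism = fmean(diffs)
    return apparent, optimism, apparent + optimism


# Toy data (made up): y depends on x, and about 20% of x values are missing.
toy_rng = random.Random(1)
data = []
for _ in range(60):
    x = toy_rng.gauss(0, 1)
    y = 0.5 * x + toy_rng.gauss(0, 1)
    data.append((None if toy_rng.random() < 0.2 else x, y))

apparent, optimism, corrected = optimism_resample_then_impute(data, lam=0.05)
print(f"apparent MSE {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

Because MSE is an error measure, the optimism is added to the apparent MSE to obtain the corrected value. The first three approaches would instead draw bootstrap samples from the already completed data sets, which skips the imputation step inside the loop.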

Results: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large.

Conclusion: Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.

Figure 3: Performance profile to determine the optimal lasso penalty tuning parameter (on one imputed data set) for a grid of 40 penalty values based on 100 bootstrap samples. The optimal penalty value corresponding to the best model is that which generated the smallest average MSE over the bootstrap samples. A tolerance model can be estimated as that with MSE within 3% of the optimum in the direction of the stronger penalties.

Mentions: Figure 3 summarizes the parameter tuning procedure, showing the bootstrap performance of all 40 penalty values on one imputed data set. This illustrates that an optimal λ value was identifiable. The optimal λ varied between 0.063 and 0.082 over the imputed data sets for the best model, and between 0.064 and 0.166 for the tolerance model. In Table 1 we report averaged coefficients of the best and tolerance models, and the number of times each variable was retained across the imputed data sets. In total, 19 and 10 covariates were retained at least once across the imputed data sets for the best and tolerance models, respectively. The estimated optimism, calculated according to the four approaches described above, along with the apparent and optimism-corrected MSEs, is presented in Table 2. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap procedure. Estimates from approaches 1, 3 and 4 suggested that there was substantial optimism in the apparent performance. Larger values of optimism were observed with approach 4. On the other hand, approach 2 suggested there was very little or no optimism.
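The selection rule in the Figure 3 caption (best model at the minimum average MSE, tolerance model within 3% of that minimum toward stronger penalties) can be sketched as below. The function name and the MSE profile are hypothetical, and the sketch assumes one reading of the rule: starting from the best penalty, move to stronger penalties for as long as the average MSE stays within the tolerance.

```python
def pick_penalties(lambdas, mean_mse, tolerance=0.03):
    """Return (best, tolerant) penalty values from a bootstrap MSE profile."""
    # Positions of the grid sorted from weakest to strongest penalty.
    order = sorted(range(len(lambdas)), key=lambda i: lambdas[i])
    # Best model: smallest average MSE over the bootstrap samples.
    best_pos = min(range(len(order)), key=lambda p: mean_mse[order[p]])
    limit = mean_mse[order[best_pos]] * (1 + tolerance)
    # Tolerance model: walk toward stronger penalties while MSE stays within the limit.
    tol_pos = best_pos
    while tol_pos + 1 < len(order) and mean_mse[order[tol_pos + 1]] <= limit:
        tol_pos += 1
    return lambdas[order[best_pos]], lambdas[order[tol_pos]]


# Made-up profile for illustration only; not values from the study.
grid = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12]
profile = [1.10, 1.04, 1.00, 1.01, 1.02, 1.08]
best, tolerant = pick_penalties(grid, profile)
print(best, tolerant)
```

A stronger penalty sets more coefficients to zero, which is consistent with the tolerance model retaining fewer covariates than the best model.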

