Quantitative assessment of tissue biomarkers and construction of a model to predict outcome in breast cancer using multiple imputation.

Emerson JW, Dolled-Filhart M, Harris L, Rimm DL, Tuck DP - Cancer Inform (2008)

Bottom Line: The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.


Affiliation: Department of Statistics, Yale University, New Haven, Connecticut 06520, USA.

ABSTRACT
Missing data pose one of the greatest challenges in the rigorous evaluation of biomarkers. The limited availability of specimens with complete clinical annotation and quality biomaterial often leads to underpowered studies. Tissue microarray studies, for example, may be further handicapped by the loss of data points because of unevaluable staining, core loss, or the lack of tumor in the histospot. This paper presents a novel approach to these common problems in the context of a tissue protein biomarker analysis in a cohort of patients with breast cancer. Our analysis develops techniques based on multiple imputation to address the missing value problem. We first select markers using a training cohort, identifying a small subset of protein expression levels that are most useful in predicting patient survival. The best model is obtained by including both the protein markers (COX6C, GATA3, NAT1, and ESR1) and lymph node status. The use of either lymph node status or the four protein expression levels provides similar improvements in goodness-of-fit, with both significantly better than a baseline clinical model. Using the same multiple imputation strategy, we then validate the results out-of-sample on a larger independent cohort. Our approach of integrating multiple imputation with each stage of the analysis serves as an example that may be replicated or adapted in future studies with missing values.
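
As an illustration of how results from several imputed datasets are commonly combined, the sketch below pools a single model coefficient using Rubin's rules. The abstract does not state which pooling rule the authors used, so this is an assumption made for illustration only; the function name `pool_rubin` and the numbers in the usage note are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the per-imputation estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)

    # Total variance adds the between-imputation spread, inflated by 1 + 1/m.
    total = within + (1 + 1 / m) * between
    return pooled, total
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` averages the three made-up estimates and widens the variance to reflect their disagreement across imputations.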


Figure 2 (f2-cin-07-29): Out-of-sample validation.

Mentions: Candidate models were identified and fit using only the training cohort; out-of-sample validation tests the goodness-of-fit and compares the models using the validation cohort. This provides an objective and rigorous means of validating the results of the study. Once again, missing values required specialized statistical analysis. Figure 2 outlines the validation procedure used to compare three models to a baseline clinical model including age at diagnosis, nuclear grade, and tumor size. The first model included positive nodes, the second included the four selected markers, and the full model added both positive nodes and the protein markers to the baseline model.
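
The model hierarchy described in this passage can be written down as nested covariate sets. The sketch below is only a bookkeeping aid, not the authors' code: the column names (`age_at_diagnosis`, `positive_nodes`, and so on) are hypothetical, and it assumes the four selected markers are the ones named in the abstract.

```python
# Hypothetical column names; the paper's actual variable names are not given here.
BASELINE = ["age_at_diagnosis", "nuclear_grade", "tumor_size"]
NODES = ["positive_nodes"]
MARKERS = ["COX6C", "GATA3", "NAT1", "ESR1"]  # assumed to be the four selected markers

MODELS = {
    "baseline": BASELINE,
    "nodes": BASELINE + NODES,             # first model: baseline + positive nodes
    "markers": BASELINE + MARKERS,         # second model: baseline + four markers
    "full": BASELINE + NODES + MARKERS,    # full model: baseline + both
}


def added_terms(model, reference="baseline"):
    """Return the covariates `model` adds to `reference`.

    Raises ValueError when `reference` is not nested inside `model`,
    because a nested comparison is then not meaningful.
    """
    ref, terms = MODELS[reference], MODELS[model]
    missing = [c for c in ref if c not in terms]
    if missing:
        raise ValueError(f"{reference} is not nested in {model}: {missing}")
    return [c for c in terms if c not in ref]


if __name__ == "__main__":
    for name in ("nodes", "markers", "full"):
        print(name, "adds", added_terms(name))
```

Each of the three models is nested over the baseline, while the nodes-only and markers-only models are not nested in each other, which is why the sketch refuses that comparison.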

