Cross-study projections of genomic biomarkers: an evaluation in cancer genomics.

Lucas JE, Carvalho CM, Chen JL, Chi JT, West M - PLoS ONE (2009)

Bottom Line: We address this with a framework and methods to dissect, enhance and extend the in vivo utility of in vitro derived gene expression signatures. These factors retain their relationship to the original, one-dimensional in vitro signature but better describe the diversity of in vivo biology. In a breast cancer analysis, we show that factors can reflect fundamentally different biological processes linked to molecular and clinical features of human cancers, and that in combination they can improve prediction of clinical outcomes.


Affiliation: Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America. joe@stat.duke.edu

ABSTRACT
Human disease studies using DNA microarrays, in both clinical/observational and experimental/controlled settings, are having an increasing impact on our understanding of the complexity of human diseases. A fundamental concept is the use of gene expression as a "common currency" that links the results of in vitro controlled experiments to in vivo observational human studies. Many studies, in cancer and other diseases, have shown promise in using in vitro cell manipulations to improve understanding of in vivo biology, but such experiments often fail to reflect the enormous phenotypic variation seen in human diseases. We address this with a framework and methods to dissect, enhance and extend the in vivo utility of in vitro derived gene expression signatures. From an experimentally defined gene expression signature we use statistical factor analysis to generate multiple quantitative factors in human cancer gene expression data. These factors retain their relationship to the original, one-dimensional in vitro signature but better describe the diversity of in vivo biology. In a breast cancer analysis, we show that factors can reflect fundamentally different biological processes linked to molecular and clinical features of human cancers, and that in combination they can improve prediction of clinical outcomes.

Figure 4 (pone-0004523-g004). Predicting survival and drug response. (a) Predicted survival times from an average of Weibull survival models were used to split the 251 samples from [21] according to whether their predictions fell above or below the median, and the resulting empirical (Kaplan-Meier) survival curves are shown. The red/blue stratification of patients comes from the analysis using subsets of the 67 factors (red: high-risk 50%; blue: low-risk 50%); the grey curves come from the same analysis using all five of the original signatures (so there is no compensation for over-fitting here). In each plot, the upper (black) p-value corresponds to stratification by the factor analysis and the lower (grey) p-value to stratification by the signatures. The survival models were identified using the data from [21], so this plot shows fitted values. The four additional plots show predictions in the four other breast tumor data sets, based on analysis of the training data alone. The predictive relevance of the factors is evident and consistent across studies, and the factors consistently improve on the prediction achieved with the signatures alone. (b) The first Lactic Acidosis factor predicts survival in patients treated with Tamoxifen (left half) but shows no predictive value in patients who did not receive the drug (right half). In all panels, p-values represent significance in a Cox proportional hazards model.
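
The stratification in panel (a) can be illustrated with a minimal sketch. It is not the authors' code: it assumes per-sample predicted survival times are already available, places samples whose prediction equals the median in the low-risk group (a tie-breaking rule the caption does not specify), and computes the standard product-limit (Kaplan-Meier) estimate for each group. The helper names `median_split` and `kaplan_meier` are hypothetical.

```python
from statistics import median


def median_split(predicted_times):
    """Hypothetical helper: label a sample 'high' risk if its predicted
    survival time is below the median prediction, otherwise 'low'."""
    cut = median(predicted_times)
    return ["high" if t < cut else "low" for t in predicted_times]


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    Returns a list of (t, S(t)) pairs in increasing order of t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored cases at t both leave the risk set afterwards.
        at_risk -= removed
    return curve
```

For example, `median_split([5.0, 12.0, 3.0, 20.0, 8.0, 15.0])` labels the first, third and fifth samples as high risk; the observed times and event indicators of each group are then passed separately to `kaplan_meier` and plotted as step functions. The Cox proportional hazards p-values quoted in the figure would come from a separate model fit, which this sketch does not cover.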

Subsets of the 67 factors were evaluated in Weibull survival regression models using the SSS method to identify and score models predicting survival. Each model in the resulting set of high-scoring models produces fitted survival curves and can also be used to predict survival for new samples. Bayesian analysis mandates averaging predictions over such a set of models, and this averaging produced Figure 4a. The figure shows fitted survival curves for the training data set [21], together with out-of-sample predictions in four of the other data sets for which survival information exists. These data sets come from distinct and diverse studies, so a model fitted to one data set is being assessed on four challenging out-of-sample validation data sets. Though not described further here, the BFRM statistical model analysis used by the SFPA also addresses gene-, sample- and study-specific effects, and corrects enough of the idiosyncrasies and biases inherent in microarray assays to retain predictive accuracy [19], [31]. The results demonstrate that the factor profiles of these in vitro environmental signatures can significantly improve survival prediction across several test data sets. Similar results are obtained for the prediction of metastasis-free survival.
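
The model-averaging step can be sketched for a single sample. This is an illustration under stated assumptions, not the paper's implementation: each candidate model is taken to be a Weibull accelerated failure time regression on a subset of factors, the model weights are assumed to be posterior model probabilities (for example, derived from the SSS model scores), and the averaged prediction is summarised by the median of the model-averaged survival curve. The class `WeibullModel` and the functions `averaged_survival` and `predicted_median_time` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WeibullModel:
    """Hypothetical candidate model: Weibull AFT regression on a factor subset."""
    intercept: float                                        # log-scale intercept
    coefs: dict[int, float] = field(default_factory=dict)   # factor index -> log-time coefficient
    shape: float = 1.0                                      # Weibull shape parameter (> 0)
    weight: float = 1.0                                     # assumed posterior model weight

    def survival(self, t: float, x: list[float]) -> float:
        # AFT parametrisation (assumption): log(scale) = intercept + sum_j coef_j * x_j
        scale = math.exp(self.intercept + sum(b * x[j] for j, b in self.coefs.items()))
        # Weibull survival function S(t) = exp(-(t / scale) ** shape)
        return math.exp(-((t / scale) ** self.shape))


def averaged_survival(models, t, x):
    """Model-averaged survival: weight-normalised mixture of the models' S(t)."""
    total = sum(m.weight for m in models)
    return sum(m.weight * m.survival(t, x) for m in models) / total


def predicted_median_time(models, x, upper=1e6, iters=200):
    """Median of the averaged survival curve, found by bisection.

    The averaged S(t) is non-increasing in t, so bisection on S(t) = 0.5 works;
    if S(upper) is still above 0.5 the result is capped at `upper`.
    """
    lo, hi = 0.0, upper
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if averaged_survival(models, mid, x) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Averaging survival curves, rather than averaging each model's own median time, keeps the prediction a proper mixture over models. Computing `predicted_median_time` for every sample and dividing the samples at the median of these values gives a high-risk/low-risk split of the kind shown in the figure.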

