Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data.

Annest A, Bumgarner RE, Raftery AE, Yeung KY - BMC Bioinformatics (2009)

Bottom Line: Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.


Affiliation: Institute of Technology/Computing and Software Systems, University of Washington, Tacoma, WA 98402, USA. amanu@u.washington.edu

ABSTRACT

Background: Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
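
The weighted average described above corresponds to the standard Bayesian Model Averaging identity. As a sketch, with notation assumed here rather than taken from the abstract ($\Delta$ is the quantity of interest, such as a patient's risk, $D$ is the training data, and $M_1, \dots, M_K$ are the contending models):

```latex
\Pr(\Delta \mid D) = \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D)\,\Pr(M_k \mid D)
```

Each model contributes in proportion to its posterior probability $\Pr(M_k \mid D)$, which is how the averaging accounts for model uncertainty.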

Results: We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139).
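
The risk-group assignment described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene names, coefficients, and posterior probabilities are invented, and treating the cut point as a percentile of the training-set risk scores is an assumption made for illustration.

```python
# Hypothetical illustration only: gene names, coefficients, posterior
# probabilities, and expression values are made up, not taken from the study.

def bma_risk_score(expression, models):
    """Posterior-weighted sum of each model's linear predictor.

    expression: dict mapping gene -> expression level for one patient
    models: list of (posterior_probability, {gene: coefficient}) pairs
    """
    score = 0.0
    for posterior, coefs in models:
        linear_predictor = sum(beta * expression[g] for g, beta in coefs.items())
        score += posterior * linear_predictor
    return score


def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def assign_risk_groups(train_scores, test_scores, cut_point=60):
    """Label test patients 'high' or 'low' risk.

    Assumption: cut_point is a percentile of the training risk scores.
    """
    threshold = percentile(train_scores, cut_point)
    return ["high" if s > threshold else "low" for s in test_scores]


# Two hypothetical models whose posterior probabilities sum to 1.
models = [
    (0.6, {"geneA": 0.8, "geneB": -0.5}),
    (0.4, {"geneA": 1.0}),
]
train_patients = [
    {"geneA": 0.2, "geneB": 1.0},
    {"geneA": 1.5, "geneB": 0.3},
    {"geneA": 0.9, "geneB": 0.9},
]
test_patients = [{"geneA": 2.0, "geneB": 0.1}, {"geneA": 0.1, "geneB": 1.2}]

train_scores = [bma_risk_score(p, models) for p in train_patients]
test_scores = [bma_risk_score(p, models) for p in test_patients]
print(assign_risk_groups(train_scores, test_scores))
```

Only the training data determine the threshold, so the validation patients are scored and grouped without using their own outcomes.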

Conclusion: The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.

Figure 4: 5-gene Breast cancer data, n = 234: Kaplan-Meier survival analysis curve as a nonparametric estimator of the difference between risk groups. In this analysis, p = 5, nbest = 50, maxNvar = 15, and cutPoint = 60. Validation set risk scores were predicted using 5 top-ranked genes across 2 selected models. Survival time is given in years, p-value = 9.06e-06, and chi-square = 19.699.

Mentions: In order to provide a more direct performance comparison between the iterative BMA method and these alternative procedures, we made some modifications. First, we applied the previously selected 15 genes and 84 models to the full van de Vijver et al. validation set of 295 samples. Figure 3 displays the resulting Kaplan-Meier survival analysis curve (p-value = 3.382e-10, chi-square = 39.441). Second, we predicted the risk scores for the validation set and calculated the difference between the risk groups using the top 5 genes with posterior probabilities of 100% from Table 3. Figure 4 shows the Kaplan-Meier survival analysis curve using these 5 genes for n = 234 (p-value = 9.063e-06, chi-square = 19.699), and Figure 5 provides the same information for n = 295 (p-value = 1.143e-10, chi-square = 41.559). The exclusion of the bottom-ranked 10 genes did not undermine predictive accuracy; in fact, the results are slightly better than those obtained from using all 15 genes originally selected by the algorithm.
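
As a consistency check on the statistics quoted above, the reported p-values can be recomputed from the chi-square values. The sketch below assumes the log-rank test compares two risk groups, so the statistic has 1 degree of freedom; in that case the upper-tail probability has a closed form in terms of the complementary error function. The function name is illustrative.

```python
import math


def logrank_p_value(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# (chi-square, reported p-value) pairs quoted in the text above.
reported = {
    "15 genes, n = 295": (39.441, 3.382e-10),
    "5 genes, n = 234": (19.699, 9.063e-06),
    "5 genes, n = 295": (41.559, 1.143e-10),
}

for label, (chi_square, p_reported) in reported.items():
    p_computed = logrank_p_value(chi_square)
    print(f"{label}: computed {p_computed:.3e}, reported {p_reported:.3e}")
```

The recomputed values should agree with the reported ones up to rounding of the chi-square statistics.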
