A Bayesian approach for inducing sparsity in generalized linear models with multi-category response.

Madahian B, Roy S, Bowman D, Deng LY, Homayouni R - BMC Bioinformatics (2015)

Bottom Line: Several approaches exist to reduce the number of variables given the small sample sizes. Importantly, using the Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4, compared to 0.007 to 0.131 for the other methods. Using the GDP prior in a Bayesian GLM applied to cancer progression data results in better subclass prediction.


ABSTRACT

Background: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables given the small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes.

Results: A hierarchical Sparse Bayesian GLM using the GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than that of Support Vector Machine, Random Forest, or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperformed the other three methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using the Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4, compared to 0.007 to 0.131 for the other methods.

Conclusions: Using the GDP prior in a Bayesian GLM applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets.

Figure 1: Flow chart of the Gibbs sampling procedure for SBGG. Here j = 1, 2, ..., p; r = 1, 2, ..., n; and s = 2, 3, ..., k, where n is the number of samples, p is the number of covariates in the model, and k is the number of categories of the response variable.

Mentions: Defining the parameters as above, the hierarchical representation of the model is as follows, and we put a non-informative uniform prior on the remaining model parameters. Using the above mixture representation for the parameters and the prior distributions so defined, we obtain conditional posteriors that lead to a straightforward Gibbs sampling algorithm, as outlined in Figure 1.
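
A minimal sketch of one such hierarchy, assuming the scale-mixture form of the GDP prior described by Armagan, Dunson and Lee and an ordinal-probit latent-variable formulation consistent with the indices in Figure 1 (j over covariates, r over samples, s over category cutpoints), is the following; the latent variables z_r, cutpoints delta_s, local scales tau_j, rates lambda_j, and hyperparameters alpha and eta are notation assumed for this sketch, and the placement of the uniform prior on the cutpoints is likewise an assumption rather than a statement taken from the text above:

    \begin{aligned}
      y_r = s \;&\Longleftrightarrow\; \delta_{s-1} < z_r \le \delta_s, && r = 1,\dots,n,\; s = 1,\dots,k,\\
      z_r \mid \boldsymbol{\beta} \;&\sim\; \mathcal{N}\!\left(\mathbf{x}_r^{\top}\boldsymbol{\beta},\, 1\right), && \text{(latent ordinal-probit variable)}\\
      \beta_j \mid \tau_j \;&\sim\; \mathcal{N}(0,\, \tau_j), && j = 1,\dots,p,\\
      \tau_j \mid \lambda_j \;&\sim\; \operatorname{Exp}\!\left(\lambda_j^{2}/2\right), &&\\
      \lambda_j \;&\sim\; \operatorname{Gamma}(\alpha,\, \eta), && \text{(marginally } \beta_j \sim \operatorname{GDP}(\xi = \eta/\alpha,\ \alpha)\text{)}\\
      p(\delta_2,\dots,\delta_{k-1}) \;&\propto\; 1. && \text{(flat prior on the interior cutpoints)}
    \end{aligned}

Integrating tau_j out of the two middle lines gives a double-exponential prior with rate lambda_j, and integrating lambda_j out in turn gives the GDP density; this is why every full conditional stays in standard form and the sampler reduces to the straightforward Gibbs scheme of Figure 1.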

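Under the same assumptions, a minimal Python sketch of the Gibbs loop summarized in Figure 1 might look as follows. This is not the authors' implementation: the function name gibbs_sbgg, the update order (a partially collapsed step for lambda_j before tau_j), the hyperparameter defaults, and the convention of fixing the first cutpoint at 0 are choices of this sketch, not details confirmed by the text.

    # Sketch of the Gibbs procedure outlined in Figure 1 (assumptions noted above).
    import numpy as np
    from scipy.stats import truncnorm

    def gibbs_sbgg(X, y, k, n_iter=5000, alpha=1.0, eta=1.0, seed=None):
        """X: (n, p) design matrix; y: ordered class labels in {1, ..., k}.
        Assumes every class appears at least once in y."""
        rng = np.random.default_rng(seed)
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        n, p = X.shape
        beta = np.zeros(p)                 # regression coefficients
        tau = np.ones(p)                   # local scales of the GDP scale mixture
        lam = np.ones(p)                   # per-coefficient rate parameters
        # Cutpoints: delta[0] = -inf < delta[1] = 0 (fixed) < ... < delta[k] = +inf
        delta = np.concatenate(([-np.inf], np.arange(k - 1, dtype=float), [np.inf]))
        draws = np.empty((n_iter, p))

        for it in range(n_iter):
            # 1) Latent responses: z_r ~ N(x_r' beta, 1) truncated to (delta[y_r - 1], delta[y_r]]
            mu = X @ beta
            z = mu + truncnorm.rvs(delta[y - 1] - mu, delta[y] - mu, random_state=rng)
            # 2) Coefficients: beta | rest ~ N(A^{-1} X'z, A^{-1}) with A = X'X + diag(1/tau)
            A = X.T @ X + np.diag(1.0 / tau)
            L = np.linalg.cholesky(A)
            beta = np.linalg.solve(A, X.T @ z) + np.linalg.solve(L.T, rng.standard_normal(p))
            # 3) Rates (tau collapsed out): lambda_j | beta_j ~ Gamma(alpha + 1, rate = |beta_j| + eta)
            lam = rng.gamma(alpha + 1.0, 1.0 / (np.abs(beta) + eta))
            # 4) Local scales: 1/tau_j | beta_j, lambda_j ~ Inverse-Gaussian(lambda_j / |beta_j|, lambda_j^2)
            tau = 1.0 / rng.wald(lam / np.maximum(np.abs(beta), 1e-10), lam ** 2)
            # 5) Cutpoints (flat prior): uniform between the largest z in class s and the
            #    smallest z in class s + 1, bounded by the neighbouring cutpoints
            for s in range(2, k):
                lo = max(np.max(z[y == s]), delta[s - 1])
                hi = min(np.min(z[y == s + 1]), delta[s + 1])
                delta[s] = rng.uniform(lo, hi)
            draws[it] = beta
        return draws

For example, gibbs_sbgg(X, y, k=4) would return posterior draws of the coefficients, whose magnitudes can then be used to rank genes; the hyperparameter values and any gene pre-filtering here are placeholders rather than the settings used in the study.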
