Network-constrained group lasso for high-dimensional multinomial classification with application to cancer subtype prediction.

Tian X, Wang X, Chen J - Cancer Inform (2015)

Bottom Line: Efficient use of the network information is important for improving classification performance as well as biological interpretability. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real TCGA (The Cancer Genome Atlas) gene expression data. The network-constrained model outperformed the traditional ones in both cases.


Affiliation: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA.

ABSTRACT
The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to a small number of predictors and does not take into account relationships among variables. It therefore has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network, and efficient use of this network information is important for improving both classification performance and biological interpretability. We proposed a multinomial logit model that addresses both the high dimensionality of the predictors and the underlying network information. A group lasso penalty was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information, both in simulations and in a cancer subtype prediction problem using real gene expression data from TCGA (The Cancer Genome Atlas). The network-constrained model outperformed the traditional models in both cases.
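The abstract names the ingredients (multinomial logit loss, group lasso, network constraint, proximal gradient) but not the exact objective, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a p x K coefficient matrix B with one group per feature (that feature's K class coefficients), no intercept, an unweighted edge list, and a network penalty lam2 * sum over edges (j, l) and classes k of (B[j][k] - B[l][k])^2, which is the graph-Laplacian quadratic form. The smooth part (loss plus network penalty) gets a gradient step, and the non-smooth group lasso term is handled by its proximal operator, group soft-thresholding. All function names, the fixed step size, and the iteration count are hypothetical; a fixed step only converges when it is small enough relative to the Lipschitz constant of the smooth gradient. The adaptive variant (NGL-MLMa) is not covered.

```python
import math

# Hypothetical sketch: network-constrained group lasso multinomial logit,
# fitted by proximal gradient with an assumed fixed step size (no line search).
# B is a p x K coefficient matrix: one row (group) per feature, one column per class.
# Assumptions: no intercept, unweighted edges given as (j, l) feature pairs.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def smooth_grad(X, y, B, edges, lam2):
    """Gradient of the smooth part: mean multinomial negative log-likelihood
    plus lam2 * sum over edges (j, l) and classes k of (B[j][k] - B[l][k])**2."""
    n, p, K = len(X), len(B), len(B[0])
    G = [[0.0] * K for _ in range(p)]
    for xi, yi in zip(X, y):
        z = [sum(xi[j] * B[j][k] for j in range(p)) for k in range(K)]
        prob = softmax(z)
        for k in range(K):
            r = (prob[k] - (1.0 if k == yi else 0.0)) / n
            for j in range(p):
                G[j][k] += xi[j] * r
    for j, l in edges:
        for k in range(K):
            d = 2.0 * lam2 * (B[j][k] - B[l][k])
            G[j][k] += d
            G[l][k] -= d
    return G


def group_soft_threshold(row, thresh):
    """Proximal operator of thresh * ||row||_2: shrinks the whole group toward zero."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= thresh:
        return [0.0] * len(row)
    scale = 1.0 - thresh / norm
    return [scale * v for v in row]


def prox_grad_fit(X, y, K, edges, lam1, lam2, step=0.1, iters=500):
    """A zeroed row removes that feature from every class at once."""
    p = len(X[0])
    B = [[0.0] * K for _ in range(p)]
    for _ in range(iters):
        G = smooth_grad(X, y, B, edges, lam2)
        # Gradient step on the smooth part, then the group lasso prox per feature.
        B = [group_soft_threshold([B[j][k] - step * G[j][k] for k in range(K)],
                                  step * lam1)
             for j in range(p)]
    return B


def predict(X, B):
    """Assign each sample to the class with the largest linear score."""
    p, K = len(B), len(B[0])
    out = []
    for xi in X:
        z = [sum(xi[j] * B[j][k] for j in range(p)) for k in range(K)]
        out.append(max(range(K), key=lambda k: z[k]))
    return out


# Toy usage: feature 0 marks class 0, feature 1 marks class 1.
X = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
y = [0, 0, 1, 1]
B = prox_grad_fit(X, y, K=2, edges=[], lam1=0.01, lam2=0.0)
print(predict(X, B))  # [0, 0, 1, 1]
```

Grouping each feature's coefficients across classes is what lets the penalty select or drop a gene for all subtypes jointly; passing a non-empty `edges` list with `lam2 > 0` pulls the coefficients of connected features toward each other.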


Figure 1. MSE of parameter estimation under ideal structure information for small and large models with ideal, similar, and random coefficients.

We first simulate an ideal network structure; that is, all the relevant variables come from a fully connected subnetwork. Figure 1 shows the estimation performance of the various models. As expected, the structure information improves estimation significantly, especially for large models, the setting most relevant to real applications. The adaptive method (NGL-MLMa) substantially outperforms the others in estimation. In the case of random coefficients, where the prior network provides no useful information, the proposed model is comparable to, and sometimes even better than, the models that do not use the network information (L-MLM, GL-MLM). Figure 2 shows that the prediction accuracy of the proposed model is also higher in almost all scenarios. The Brier score (Fig. 3) follows a similar trend: the network-constrained model always performs better when ideal or similar coefficients are simulated, and is comparable to the traditional models without structure information when the coefficients are random.
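The comparison above rests on three metrics: MSE of the coefficient estimates, prediction accuracy, and the Brier score. The excerpt does not give their exact normalizations, so the helpers below are a hedged sketch: the Brier score uses a common multiclass form (mean over samples of the squared distance between the predicted probability vector and the one-hot label; some texts also divide by the number of classes), and the MSE averages over all entries of the p x K coefficient matrix. The function names are hypothetical.

```python
# Hypothetical evaluation helpers; the normalizations are assumptions,
# since the excerpt does not state the exact definitions used in the paper.


def multiclass_brier(probs, y):
    """Mean over samples of sum_k (p_ik - 1{y_i == k})**2 (lower is better)."""
    total = 0.0
    for p_i, y_i in zip(probs, y):
        total += sum((p_ik - (1.0 if k == y_i else 0.0)) ** 2
                     for k, p_ik in enumerate(p_i))
    return total / len(probs)


def coef_mse(B_hat, B_true):
    """Mean squared error over all entries of the p x K coefficient matrix."""
    sq = [(bh - bt) ** 2
          for row_h, row_t in zip(B_hat, B_true)
          for bh, bt in zip(row_h, row_t)]
    return sum(sq) / len(sq)


def accuracy(pred, y):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(1 for a, b in zip(pred, y) if a == b) / len(y)


print(multiclass_brier([[1.0, 0.0], [0.5, 0.5]], [0, 1]))  # 0.25
print(coef_mse([[1.0, 2.0]], [[0.0, 0.0]]))  # 2.5
```

Unlike accuracy, the Brier score also rewards well-calibrated probabilities, which is why it can separate models whose hard predictions agree.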

