Joint network and node selection for pathway-based genomic data analysis.

Zhe S, Naqvi SA, Yang Y, Qi Y - Bioinformatics (2013)

Bottom Line: Simulation results demonstrate improved predictive performance and selection accuracy of our method over alternative methods. In addition to pathway analysis, our method is expected to have a wide range of applications in selecting relevant groups of correlated high-dimensional biomarkers. The code can be downloaded at www.cs.purdue.edu/homes/szhe/software.html. Contact: alanqi@purdue.edu.


Affiliation: Department of Computer Science, Department of Biology, and Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.

ABSTRACT

Motivation: By capturing various biochemical interactions, biological pathways provide insight into underlying biological processes. Given high-dimensional microarray or RNA-sequencing data, a critical challenge is how to integrate them with rich information from pathway databases to jointly select relevant pathways and genes for phenotype prediction or disease prognosis. Addressing this challenge can help us deepen biological understanding of phenotypes and diseases from a systems perspective.

Results: In this article, we propose a novel sparse Bayesian model for joint network and node selection. This model integrates information from networks (e.g. pathways) and nodes (e.g. genes) by a hybrid of conditional and generative components. For the conditional component, we propose a sparse prior based on graph Laplacian matrices, each of which encodes detailed correlation structures between network nodes. For the generative component, we use a spike-and-slab prior over network nodes. The integration of these two components, coupled with efficient variational inference, enables the selection of networks as well as correlated network nodes in the selected networks. Simulation results demonstrate improved predictive performance and selection accuracy of our method over alternative methods. Based on three expression datasets from cancer studies and the KEGG pathway database, we selected relevant genes and pathways, many of which are supported by biological literature. In addition to pathway analysis, our method is expected to have a wide range of applications in selecting relevant groups of correlated high-dimensional biomarkers.
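
As a rough illustration of the two ingredients named above, the following sketch builds a graph Laplacian for a small network and draws weights from a spike-and-slab distribution. It is not the authors' implementation: the unnormalized Laplacian L = D - A, the point-mass spike at zero, the toy 4-node pathway and all function names are assumptions made for illustration, and the paper's actual prior and its variational inference are not reproduced here.

```python
import random


def graph_laplacian(n_nodes, edges):
    """Unnormalized graph Laplacian L = D - A, returned as a list of rows."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        lap[i][j] -= 1.0  # off-diagonal entries hold the negated adjacency
        lap[j][i] -= 1.0
        lap[i][i] += 1.0  # diagonal entries hold the node degrees
        lap[j][j] += 1.0
    return lap


def laplacian_penalty(lap, w):
    """Quadratic form w^T L w, which equals the sum over edges of (w_i - w_j)^2."""
    n = len(w)
    return sum(w[i] * lap[i][j] * w[j] for i in range(n) for j in range(n))


def sample_spike_and_slab(n, inclusion_prob, slab_sd, rng):
    """Draw n weights: exactly zero (spike) or Gaussian with sd slab_sd (slab)."""
    return [
        rng.gauss(0.0, slab_sd) if rng.random() < inclusion_prob else 0.0
        for _ in range(n)
    ]


# Hypothetical 4-gene pathway whose genes form a chain 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
lap = graph_laplacian(4, edges)

# Weights that agree across connected genes incur no penalty;
# weights that flip sign across every edge are penalized heavily.
penalty_smooth = laplacian_penalty(lap, [1.0, 1.0, 1.0, 1.0])
penalty_rough = laplacian_penalty(lap, [1.0, -1.0, 1.0, -1.0])

# Sparse weights: each gene is kept with probability 0.3 (assumed value).
weights = sample_spike_and_slab(4, inclusion_prob=0.3, slab_sd=1.0, rng=random.Random(0))
```

The quadratic form is small when connected nodes carry similar weights, which is how a Laplacian-based prior can favor correlated nodes within a network, while the spike component sets irrelevant nodes to exactly zero.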

Availability: The code can be downloaded at www.cs.purdue.edu/homes/szhe/software.html.

Contact: alanqi@purdue.edu.



Figure 5 (btt335-F5): scores for pathway selection. ‘EXP’ stands for ‘Experiment’ and ‘D’ stands for ‘Data model’.

Mentions: All the results are summarized in Figure 2, in which the error bars represent the standard errors. For all the settings, NaNOS gives smaller errors and higher scores for gene selection than the other methods, except that, for classification of the samples from the second data model, NaNOS and group lasso obtain comparable scores. All the improvements are significant under the two-sample t-test (P < 0.05). We also show the accuracy of group lasso, GSEA and NaNOS for pathway selection in Figure 5. Again, NaNOS achieves significantly higher selection accuracy. Because the LL approach was developed for regression, we have no classification results for it. While the LL approach uses the topological information of all the pathways, it merges them into a single global network for regularization. In contrast, using a sparse prior over individual pathways, NaNOS can explicitly select pathways relevant to the response, guiding the gene selection. This may contribute to its improved performance.
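
The two statistics mentioned above, the standard error behind the error bars and the two-sample t-test, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the excerpt does not say whether a pooled-variance or Welch test was used, so the pooled (Student) form is an assumption, and the numbers are made-up placeholders rather than results from the paper.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def two_sample_t(a, b):
    """Pooled-variance (Student) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Placeholder per-run errors for two hypothetical methods (not data from the paper).
errors_method_a = [1.0, 2.0, 3.0]
errors_method_b = [4.0, 5.0, 6.0]

se_a = standard_error(errors_method_a)
t_stat, dof = two_sample_t(errors_method_a, errors_method_b)
```

Turning the statistic into a P value requires the t distribution with `dof` degrees of freedom, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind`) computes the statistic and the P value together.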

