Prior knowledge driven Granger causality analysis on gene regulatory network discovery.

Yao S, Yoo S, Yu D - BMC Bioinformatics (2015)

Bottom Line: The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n > T. In our simulation experiments, the proposed CGC-2SPR method showed a significant improvement in accuracy over other widely used GC models (PGC, Ridge and Lasso) and MI-based methods (MRNET and ARACNE). We also noticed a "1+1>2" effect when we combined prior knowledge and gene expression data to discover regulatory networks.


Affiliation: Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, 11790, NY, USA. yaoshun88@gmail.com.

ABSTRACT

Background: Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, in biological datasets the number of available time points (T) is usually much smaller than the number of target genes (n). The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n > T.
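To make the n > T problem concrete: in a conditional (multivariate) GC model with one lag, each target gene is regressed on the previous time point of all n genes, so the design matrix X has only T - 1 rows. The Gram matrix X^T X then has rank at most T - 1 and cannot be inverted when n > T, whereas adding a ridge term λI with λ > 0 always makes it full rank. The sketch below checks this with exact rational arithmetic; the lag-1 model, the function names and the toy data are assumptions for illustration, not the authors' code.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a nonzero pivot in this column among the rows not yet used.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def lagged_gram(data, ridge=0):
    """X^T X + ridge * I for the lag-1 design matrix X of a conditional GC model.

    data[t][g] is the expression of gene g at time t. X holds time points 0..T-2,
    so it has T - 1 rows and n columns (one per candidate regulator).
    """
    X = data[:-1]
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0)
             for j in range(n)] for i in range(n)]


# Toy example with n > T: 6 genes observed at only 4 time points.
rng = random.Random(0)
T, n = 4, 6
data = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(T)]
print(matrix_rank(lagged_gram(data)))           # at most T - 1 = 3, so singular
print(matrix_rank(lagged_gram(data, ridge=1)))  # 6: the ridge term restores full rank
```

The rank bound holds for any data, which is why unregularized conditional GC breaks down in the n > T regime and some form of regularization (or prior information) is needed.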

Results: In this study, we propose a new method, CGC-2SPR (CGC using two-step prior Ridge regularization), which resolves this problem by incorporating prior biological knowledge about the target gene set. In our simulation experiments, CGC-2SPR showed a significant improvement in accuracy over other widely used GC models (PGC, Ridge and Lasso) and MI-based methods (MRNET and ARACNE). In addition, we applied CGC-2SPR to a real biological dataset, the yeast metabolic cycle, and it discovered more true positive edges than the other existing methods did.
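The abstract does not spell out the two steps of CGC-2SPR, so the following is only a hypothetical single-step illustration of how prior knowledge can enter a Ridge-regularized conditional GC regression: coefficients for regulators that appear in a prior graph receive a small penalty and all others a large one (a generalized ridge). The names prior_ridge_coefficients, lam_prior and lam_other, the lag-1 model without intercept, and the penalty values are all assumptions, not the published method.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def prior_ridge_coefficients(data, target, prior_sources, lam_prior=0.1, lam_other=10.0):
    """Lag-1 conditional GC regression of one target gene on all genes.

    Minimizes ||y - X b||^2 + sum_j lam_j * b_j^2, where lam_j is lam_prior if gene j
    is a prior regulator of the target and lam_other otherwise; the closed form is
    b = (X^T X + D)^{-1} X^T y with D = diag(lam). No intercept (data assumed centered).
    """
    X = data[:-1]                          # all genes at time t
    y = [row[target] for row in data[1:]]  # target gene at time t + 1
    n = len(X[0])
    lam = [lam_prior if j in prior_sources else lam_other for j in range(n)]
    A = [[sum(row[i] * row[j] for row in X) + (lam[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(n)]
    return solve(A, rhs)


# Simulated example: gene 0 drives gene 1 with a one-step delay; gene 2 is noise.
rng = random.Random(42)
T = 200
g0 = [rng.gauss(0, 1) for _ in range(T)]
g2 = [rng.gauss(0, 1) for _ in range(T)]
g1 = [0.0] + [0.8 * g0[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, T)]
data = [[g0[t], g1[t], g2[t]] for t in range(T)]
print(prior_ridge_coefficients(data, target=1, prior_sources={0}))
```

Because the penalty matrix D is positive definite, the system stays solvable even when there are more genes than time points; the prior only decides which coefficients are shrunk hardest.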

Conclusions: In our research, we noticed a "1+1>2" effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on the causality networks, we predicted that the Abm1 gene, whose function was previously unknown, might be involved in the yeast's response to different glucose levels. Our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction of systems biology. Furthermore, we propose a Monte Carlo significance estimation (MCSE) method to calculate edge significance, which gives the discovered causality networks a statistical interpretation. All of our data and source code will be available at https://bitbucket.org/dtyu/granger-causality/wiki/Home.
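The abstract does not describe how MCSE computes edge significance. One generic Monte Carlo approach, shown here as a hypothetical sketch rather than the authors' procedure, is a permutation test: shuffle the source gene's time series to destroy its temporal relation to the target, recompute the edge score many times, and report the fraction of shuffled scores at least as large as the observed one. The lag-1 slope score, the function names and the number of draws are assumptions.

```python
import random


def lag1_slope(source, target):
    """Least-squares slope of target[t + 1] on source[t] (no intercept), a simple GC-style score."""
    xs, ys = source[:-1], target[1:]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def monte_carlo_edge_pvalue(source, target, n_draws=199, seed=0):
    """Empirical p-value for the edge source -> target.

    Each draw shuffles the source series and recomputes the score;
    p = (1 + #{|null score| >= |observed score|}) / (1 + n_draws).
    """
    rng = random.Random(seed)
    observed = abs(lag1_slope(source, target))
    exceed = 0
    shuffled = list(source)
    for _ in range(n_draws):
        rng.shuffle(shuffled)
        if abs(lag1_slope(shuffled, target)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_draws)


# Usage: y follows x with a one-step delay, so the edge x -> y should look significant.
rng = random.Random(7)
T = 60
x = [rng.gauss(0, 1) for _ in range(T)]
y = [0.0] + [x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, T)]
print(monte_carlo_edge_pvalue(x, y))
```

Adding 1 to both numerator and denominator keeps the estimate strictly positive, so the smallest reportable p-value is 1 / (1 + n_draws).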

Fig. 6: The prior knowledge structure used by CGC-2SPR on the simulation dataset. It consists of a 13-variable clique motif, repeated for each basic module, which provides group information for analyzing the gene expression data.

Mentions: The prior knowledge graph needs to be carefully selected to confer group information on the expression data analysis. In this study, the clique graph structure in each subgroup is used as prior knowledge to represent group information. Essentially, it is a bidirectional clique graph covering each 1→3→9 regulatory unit. The basic structure of the prior knowledge graph is shown in Fig. 6. Note that the prior knowledge graph does not include the random cross-module links that were added to the ground-truth regulatory network. Because of the aforementioned filtering process, some of the clique regulatory relationships are also not included in this prior knowledge graph.
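The prior graph described above can be written down directly. The sketch assumes genes are numbered consecutively within each 13-gene unit (gene 0 the hub, genes 1 to 3 its direct targets, genes 4 to 12 the second-level targets) and reads "bidirectional clique" as every ordered pair of distinct genes in a unit; the numbering and function names are hypothetical. It does not model the random cross-module links or the filtering step, which the text describes only in general terms.

```python
from itertools import permutations

MODULE_SIZE = 13  # 1 hub + 3 intermediate regulators + 9 targets


def unit_tree_edges(offset):
    """Directed 1 -> 3 -> 9 regulatory unit: the hub regulates 3 genes, each of which regulates 3."""
    hub = offset
    mids = [offset + 1 + i for i in range(3)]
    edges = [(hub, m) for m in mids]
    for k, m in enumerate(mids):
        edges += [(m, offset + 4 + 3 * k + i) for i in range(3)]
    return edges


def prior_clique_edges(n_modules):
    """Prior knowledge graph: a bidirectional clique over the 13 genes of every unit."""
    edges = set()
    for mod in range(n_modules):
        genes = range(mod * MODULE_SIZE, (mod + 1) * MODULE_SIZE)
        edges.update(permutations(genes, 2))  # every ordered pair (i, j) with i != j
    return edges


# Usage: two units; every within-unit regulatory edge is covered by the prior,
# while a cross-module link such as (0, 13) is not.
prior = prior_clique_edges(2)
tree = unit_tree_edges(0) + unit_tree_edges(MODULE_SIZE)
print(len(prior), len(tree), set(tree) <= prior, (0, 13) in prior)
```

Each unit contributes 13 × 12 = 156 directed prior edges but only 12 true regulatory edges, so the prior conveys which genes belong together rather than the exact direction of regulation.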

