Prior knowledge driven Granger causality analysis on gene regulatory network discovery.

Yao S, Yoo S, Yu D - BMC Bioinformatics (2015)

Bottom Line: The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when the number of target genes (n) exceeds the number of available time points (T). In our simulation experiments, the proposed CGC-2SPR method showed a significant improvement in accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In our research, we noticed a "1+1>2" effect when we combined prior knowledge and gene expression data to discover regulatory networks.


Affiliation: Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, 11790, NY, USA. yaoshun88@gmail.com.

ABSTRACT

Background: Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, in biological datasets the number of available time points (T) is usually much smaller than the number of target genes (n). The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n > T.
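To make the pairwise GC model concrete, the following is a minimal sketch of the standard pairwise test, assuming a lag-1 autoregressive model fitted by ordinary least squares. The function name `pairwise_gc_f` and the simulated series are illustrative assumptions, not taken from the paper or its source code.

```python
import numpy as np

def pairwise_gc_f(x, y, lag=1):
    """Return the F statistic for the hypothesis 'x Granger-causes y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(y)
    target = y[lag:]
    # Column k holds the series delayed by k steps, aligned with target.
    y_lags = np.column_stack([y[lag - k:T - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:T - k] for k in range(1, lag + 1)])
    ones = np.ones((T - lag, 1))
    restricted = np.hstack([ones, y_lags])    # y's own past only
    full = np.hstack([ones, y_lags, x_lags])  # y's past plus x's past

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = (T - lag) - full.shape[1]  # residual degrees of freedom, must be > 0
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Simulated example (assumption): x drives y with a one-step delay.
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=T - 1)

f_xy = pairwise_gc_f(x, y)
f_yx = pairwise_gc_f(y, x)
print(f"x -> y: F = {f_xy:.1f}")
print(f"y -> x: F = {f_yx:.1f}")
```

The `dof` term hints at the n > T problem: a model that conditions on all n genes at once has 1 + n * lag columns, so with few time points no residual degrees of freedom remain, whereas the pairwise model stays fittable but ignores the other genes.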

Results: In this study, we propose a new method, CGC-2SPR (CGC using two-step prior Ridge regularization), which resolves this problem by incorporating prior biological knowledge about the target gene set. In our simulation experiments, CGC-2SPR showed a significant improvement in accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and mutual information (MI)-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, the yeast metabolic cycle, and discovered more true positive edges than with the other existing methods.
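The abstract does not spell out the CGC-2SPR objective, so the sketch below is not the authors' algorithm. It only illustrates the general idea of letting prior knowledge shape a ridge penalty: candidate regulators supported by prior knowledge receive a smaller penalty than the rest. The function name, the penalty values `lam_prior` and `lam_other`, and the simulated data are all assumptions.

```python
import numpy as np

def prior_weighted_ridge(X, y, prior_mask, lam_prior=0.1, lam_other=10.0):
    """Generalized ridge: minimize ||y - X b||^2 + sum_j penalty_j * b_j^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prior-supported regulators get the lighter penalty (assumed values).
    penalties = np.where(prior_mask, lam_prior, lam_other)
    # Closed-form solution; the positive diagonal keeps the system
    # solvable even when there are more genes than time points.
    gram = X.T @ X + np.diag(penalties)
    return np.linalg.solve(gram, X.T @ y)

# Simulated example (assumption): 30 genes, 10 time points, so n > T.
# Rows of X stand for lagged expression of all genes; y is the target gene.
rng = np.random.default_rng(1)
T, n = 10, 30
X = rng.normal(size=(T, n))
y = 1.5 * X[:, 3] + 0.1 * rng.normal(size=T)

prior_mask = np.zeros(n, dtype=bool)
prior_mask[3] = True  # prior knowledge says gene 3 regulates the target

b_prior = prior_weighted_ridge(X, y, prior_mask)
b_plain = prior_weighted_ridge(X, y, np.zeros(n, dtype=bool))  # uniform penalty
print(f"gene 3 coefficient with prior:    {b_prior[3]:.3f}")
print(f"gene 3 coefficient without prior: {b_plain[3]:.3f}")
```

Lowering the penalty on a single coefficient shrinks it less toward zero, so the prior-supported regulator keeps a larger weight than under a uniform ridge penalty.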

Conclusions: In our research, we noticed a "1+1>2" effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on the causality networks, we made a functional prediction that the Abm1 gene (whose function was previously unknown) might be related to the yeast's responses to different levels of glucose. Our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction of systems biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate edge significances, which give statistical meaning to the discovered causality networks. All of our data and source code will be available at https://bitbucket.org/dtyu/granger-causality/wiki/Home.
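The abstract does not describe how MCSE works, so the following is a generic Monte Carlo permutation test rather than the authors' procedure. It shows one common way to attach a significance value to an edge: compare the observed edge score with scores obtained after shuffling the candidate regulator. The edge score (absolute lag-1 correlation), the function names and the simulated data are assumptions.

```python
import numpy as np

def lagged_score(x, y):
    """Absolute correlation between x[t-1] and y[t] (hypothetical edge score)."""
    return abs(np.corrcoef(x[:-1], y[1:])[0, 1])

def monte_carlo_p_value(x, y, n_draws=200, seed=0):
    """Empirical p-value of the edge x -> y under a shuffled-x null."""
    rng = np.random.default_rng(seed)
    observed = lagged_score(x, y)
    # Shuffling x destroys its temporal relation to y but keeps its values.
    null = np.array([lagged_score(rng.permutation(x), y) for _ in range(n_draws)])
    # Add-one correction so the estimate is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_draws)

# Simulated example (assumption): x drives y with a one-step delay.
rng = np.random.default_rng(2)
T = 50
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=T - 1)

p_edge = monte_carlo_p_value(x, y)
print(f"empirical p-value for x -> y: {p_edge:.4f}")
```

With `n_draws` shuffles the smallest attainable p-value is 1 / (n_draws + 1), so more draws are needed to resolve very small significance levels.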



Fig. 11: One of the discovered causality networks using CGC-2SPR. The edge significance values were estimated by our MCSE algorithm.

Mentions: We plotted one causality network result using Cytoscape [48], as shown in Fig. 11. The known functional annotations of these genes are taken from the Saccharomyces Genome Database [49]. MIG2 is a known zinc finger transcription repressor involved in the glucose-induced repression of many genes. ABM1 is a protein of unknown function that is required for normal microtubule organization. HXT8 is also a protein of unknown function, and its expression is affected by the level of glucose. ECM22 is the sterol regulatory element binding protein, which regulates the transcription of sterol biosynthetic genes. When glucose is at a high level, ECM22 activates the sterol biosynthetic process, which consumes glucose. HO, RDS1 and MCH2 are the downstream effector proteins that control different aspects of cell activities. In other words, the causality network shown here is involved in responding to different levels of glucose in yeast. Based on this information, we could infer that Abm1 might be a gene that responds to different glucose levels.

