Limits...
Granger causality vs. dynamic Bayesian network inference: a comparative study.

Zou C, Denby KJ, Feng J - BMC Bioinformatics (2009)

Bottom Line: For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa.We then test our results in experimental data of short length which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, University of Warwick, Coventry, UK. csrcbh@dcs.warwick.ac.uk

ABSTRACT

Background: In computational biology, one often faces the problem of deriving the causal relationship among different elements such as genes, proteins, metabolites, neurons and so on, based upon multi-dimensional temporal data. Currently, there are two common approaches used to explore the network structure among elements. One is the Granger causality approach, and the other is the dynamic Bayesian network inference approach. Both have at least a few thousand publications reported in the literature. A key issue is to choose which approach is used to tackle the data, in particular when they give rise to contradictory results.

Results: In this paper, we provide an answer by focusing on a systematic and computationally intensive comparison between the two approaches on both synthesized and experimental data. For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results in experimental data of short length which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.

Conclusion: When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.

Show MeSH

Related in: MedlinePlus

Granger causality and Bayesian network inference applied on a stochastic coefficients toy model. The parameters in polynomial equation are randomly generated in the interval [-1,1]. For each randomly generated coefficient vector, we applied the same approach as example 1: bootstrapping method and 95% confidence interval for Granger causality; 95% high confidence arcs are chosen from Bayesian network inference. (A) We applied both approaches on different sample size (from 20 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents negative) is plotted with the absolute value of corresponding coefficients. For visualization purpose, the figure for Granger causality is shifted upward. (C) Linear model fitting comparison for both Granger causality and Bayesian networks. Using a number of training data points to fit both linear models, one can calculate a corresponding predicted mean-square error by applying a set of test data. And we can find that Bayesian networks inference approach works much better than the Granger causality approach when the sample size is significant small (around 100). When the sample size is significant large, both approaches converge to the standard error which exactly fits the noise term in our toy model.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2691740&req=5

Figure 3: Granger causality and Bayesian network inference applied on a stochastic coefficients toy model. The parameters in polynomial equation are randomly generated in the interval [-1,1]. For each randomly generated coefficient vector, we applied the same approach as example 1: bootstrapping method and 95% confidence interval for Granger causality; 95% high confidence arcs are chosen from Bayesian network inference. (A) We applied both approaches on different sample size (from 20 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents negative) is plotted with the absolute value of corresponding coefficients. For visualization purpose, the figure for Granger causality is shifted upward. (C) Linear model fitting comparison for both Granger causality and Bayesian networks. Using a number of training data points to fit both linear models, one can calculate a corresponding predicted mean-square error by applying a set of test data. And we can find that Bayesian networks inference approach works much better than the Granger causality approach when the sample size is significant small (around 100). When the sample size is significant large, both approaches converge to the standard error which exactly fits the noise term in our toy model.

Mentions: Figure 3Aa shows the comparison result of the percentage of true positive connections derived from these two methods. In general, the Granger causality approach can infer slightly more true positive causalities compared to the Bayesian network inference approach when the data length is long. It is interesting to see that there is a critical point at around 30 in Figure 3Aa. If the sample size is larger than 30, then the Bayesian network recovers less positive connections. However, if the sample size is smaller than 30, the Bayesian network performs better. From Figure 3Ab, we see that computing time for the Bayesian network inference is much larger than the Granger causality.


Granger causality vs. dynamic Bayesian network inference: a comparative study.

Zou C, Denby KJ, Feng J - BMC Bioinformatics (2009)

Granger causality and Bayesian network inference applied on a stochastic coefficients toy model. The parameters in polynomial equation are randomly generated in the interval [-1,1]. For each randomly generated coefficient vector, we applied the same approach as example 1: bootstrapping method and 95% confidence interval for Granger causality; 95% high confidence arcs are chosen from Bayesian network inference. (A) We applied both approaches on different sample size (from 20 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents negative) is plotted with the absolute value of corresponding coefficients. For visualization purpose, the figure for Granger causality is shifted upward. (C) Linear model fitting comparison for both Granger causality and Bayesian networks. Using a number of training data points to fit both linear models, one can calculate a corresponding predicted mean-square error by applying a set of test data. And we can find that Bayesian networks inference approach works much better than the Granger causality approach when the sample size is significant small (around 100). When the sample size is significant large, both approaches converge to the standard error which exactly fits the noise term in our toy model.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2691740&req=5

Figure 3: Granger causality and Bayesian network inference applied on a stochastic coefficients toy model. The parameters in polynomial equation are randomly generated in the interval [-1,1]. For each randomly generated coefficient vector, we applied the same approach as example 1: bootstrapping method and 95% confidence interval for Granger causality; 95% high confidence arcs are chosen from Bayesian network inference. (A) We applied both approaches on different sample size (from 20 to 900). For each sample size, we generated 100 different coefficient vectors, so the total number of directed interactions for each sample size is 500. (a) The percentage of detected true positive causalities for both approaches. (b) Time cost for both approaches. (B) For sample size 900, the derived causality (1 represents positive causality and 0 represents negative) is plotted with the absolute value of corresponding coefficients. For visualization purpose, the figure for Granger causality is shifted upward. (C) Linear model fitting comparison for both Granger causality and Bayesian networks. Using a number of training data points to fit both linear models, one can calculate a corresponding predicted mean-square error by applying a set of test data. And we can find that Bayesian networks inference approach works much better than the Granger causality approach when the sample size is significant small (around 100). When the sample size is significant large, both approaches converge to the standard error which exactly fits the noise term in our toy model.
Mentions: Figure 3Aa shows the comparison result of the percentage of true positive connections derived from these two methods. In general, the Granger causality approach can infer slightly more true positive causalities compared to the Bayesian network inference approach when the data length is long. It is interesting to see that there is a critical point at around 30 in Figure 3Aa. If the sample size is larger than 30, then the Bayesian network recovers less positive connections. However, if the sample size is smaller than 30, the Bayesian network performs better. From Figure 3Ab, we see that computing time for the Bayesian network inference is much larger than the Granger causality.

Bottom Line: For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa.We then test our results in experimental data of short length which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Computer Science, University of Warwick, Coventry, UK. csrcbh@dcs.warwick.ac.uk

ABSTRACT

Background: In computational biology, one often faces the problem of deriving the causal relationship among different elements such as genes, proteins, metabolites, neurons and so on, based upon multi-dimensional temporal data. Currently, there are two common approaches used to explore the network structure among elements. One is the Granger causality approach, and the other is the dynamic Bayesian network inference approach. Both have at least a few thousand publications reported in the literature. A key issue is to choose which approach is used to tackle the data, in particular when they give rise to contradictory results.

Results: In this paper, we provide an answer by focusing on a systematic and computationally intensive comparison between the two approaches on both synthesized and experimental data. For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results in experimental data of short length which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.

Conclusion: When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.

Show MeSH
Related in: MedlinePlus