Granger causality vs. dynamic Bayesian network inference: a comparative study.
Bottom Line:
For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results in experimental data of short length, which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better. When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.
Affiliation: Department of Computer Science, University of Warwick, Coventry, UK. csrcbh@dcs.warwick.ac.uk
ABSTRACT
Background: In computational biology, one often faces the problem of deriving the causal relationship among different elements such as genes, proteins, metabolites, neurons and so on, based upon multi-dimensional temporal data. Currently, there are two common approaches used to explore the network structure among elements. One is the Granger causality approach, and the other is the dynamic Bayesian network inference approach. Both have at least a few thousand publications reported in the literature. A key issue is to choose which approach is used to tackle the data, in particular when they give rise to contradictory results.
Results: In this paper, we provide an answer by focusing on a systematic and computationally intensive comparison between the two approaches on both synthesized and experimental data. For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results in experimental data of short length, which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.
Conclusion: When the data size is short, the dynamic Bayesian network inference performs better than the Granger causality approach; otherwise the Granger causality approach is better.
Mentions: where n is the time, and [ε1, ε2, ε3, ε4, ε5] are independent Gaussian white noise processes with zero means and unit variances. From the equations, we see that X1(n) is a cause of X2(n), X3(n) and X4(n), and that X4(n) and X5(n) share a feedback loop with each other, as depicted in Figure 1B. Figure 1A shows an example time trace of the 5 time series. For the Granger causality approach, we simulated the fitted vector autoregressive (VAR) model to generate a data set of 100 realizations of 1000 time points, and applied the bootstrap approach to construct the 95% confidence intervals (Figure 1C). For Granger causality, we assume the causality value is Gaussian distributed, so the confidence intervals can be obtained by calculating the mean and standard deviation [21,22]. According to the confidence intervals, one can derive the network structure shown in Figure 1B, which correctly recovers the pattern of connectivity in our toy model. For the dynamic Bayesian network inference approach, we can infer a network structure (Figure 1Da) for each realization of 1000 time points. The final causal network model was built from the high-confidence causal arcs (arcs that occur more than 95% of the time in the whole population) between the variables [13]. This complex network contains the information of different time-lags for each variable. It fits exactly the pattern of connectivity in our VAR model. In order to compare it with the Granger causality approach, we can further simplify the network by hiding the time-lag information, and we then infer exactly the same structure as the Granger causality approach (Figure 1Dd). From this simple example, we see that both approaches can reveal the correct network structure for data with a large sample size (1000 here).
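The Granger causality part of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The toy model's equations are not reproduced in this passage, so the coefficients in `A` are hypothetical and only the connectivity pattern (X1 driving X2, X3 and X4, plus the X4/X5 feedback loop) follows the text. The model order `P = 2`, the random seed, the function names, and the rule "an edge is present when the lower 95% bound is above zero" are also assumptions. The sketch simulates realizations directly from the known model rather than from a fitted VAR.

```python
import numpy as np

P = 2  # VAR model order (assumed for this sketch)

# A[k, j, i] is the effect of X_{i+1}(n-k-1) on X_{j+1}(n).
# Hypothetical values: only the connectivity pattern follows the text.
A = np.zeros((P, 5, 5))
A[0, 0, 0] = 0.5    # X1 depends on its own past
A[0, 1, 0] = 0.5    # X1 -> X2
A[1, 2, 0] = -0.4   # X1 -> X3 (lag 2)
A[0, 3, 0] = -0.5   # X1 -> X4
A[0, 3, 3] = 0.3
A[0, 3, 4] = 0.3    # X5 -> X4
A[0, 4, 3] = -0.3   # X4 -> X5
A[0, 4, 4] = 0.3

TRUE_EDGES = {(1, 2), (1, 3), (1, 4), (4, 5), (5, 4)}


def simulate(n_points, rng, burn_in=200):
    """Simulate the VAR with unit-variance Gaussian white noise."""
    x = np.zeros((n_points + burn_in, A.shape[1]))
    noise = rng.standard_normal(x.shape)
    for n in range(P, len(x)):
        x[n] = noise[n] + sum(A[k] @ x[n - k - 1] for k in range(P))
    return x[burn_in:]  # drop the transient


def residual_variance(y, regressors):
    """Residual variance of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.var(y - design @ coef)


def granger_matrix(x, order=P):
    """Conditional Granger causality; F[i, j] is from X_{i+1} to X_{j+1}."""
    n_points, n_vars = x.shape
    # Column k * n_vars + i of `full` holds X_{i+1}(n - k - 1).
    full = np.column_stack(
        [x[order - k - 1 : n_points - k - 1] for k in range(order)]
    )
    targets = x[order:]
    F = np.zeros((n_vars, n_vars))
    for j in range(n_vars):
        var_full = residual_variance(targets[:, j], full)
        for i in range(n_vars):
            if i == j:
                continue
            # Restricted model: drop every lag of the candidate cause.
            keep = [c for c in range(full.shape[1]) if c % n_vars != i]
            var_restricted = residual_variance(targets[:, j], full[:, keep])
            F[i, j] = np.log(var_restricted / var_full)
    return F


rng = np.random.default_rng(0)
F_all = np.array([granger_matrix(simulate(1000, rng)) for _ in range(100)])

# Gaussian assumption: 95% interval is mean +/- 1.96 standard deviations.
mean = F_all.mean(axis=0)
std = F_all.std(axis=0, ddof=1)
lower = mean - 1.96 * std
detected = {(int(i) + 1, int(j) + 1) for i, j in np.argwhere(lower > 0)}

print("detected edges:", sorted(detected))
print("all true edges found:", TRUE_EDGES <= detected)
```

Each entry of `F_all` is the log ratio of the residual variance without and with the candidate cause's lags, so it is non-negative by construction. That is why the sketch tests the lower confidence bound against zero rather than the mean itself.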