Granger causality vs. dynamic Bayesian network inference: a comparative study.

Zou C, Denby KJ, Feng J - BMC Bioinformatics (2009)

Bottom Line: For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results on experimental data of short length, which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better. When the data length is short, dynamic Bayesian network inference performs better than the Granger causality approach; otherwise, the Granger causality approach is better.


Affiliation: Department of Computer Science, University of Warwick, Coventry, UK. csrcbh@dcs.warwick.ac.uk

ABSTRACT

Background: In computational biology, one often faces the problem of deriving causal relationships among different elements such as genes, proteins, metabolites and neurons, based upon multi-dimensional temporal data. Currently, there are two common approaches used to explore the network structure among elements: the Granger causality approach and the dynamic Bayesian network inference approach. Each has at least a few thousand publications reported in the literature. A key issue is choosing which approach to use for a given data set, in particular when the two give rise to contradictory results.

Results: In this paper, we provide an answer through a systematic and computationally intensive comparison between the two approaches on both synthesized and experimental data. For synthesized data, a critical point of the data length is found: the dynamic Bayesian network outperforms the Granger causality approach when the data length is short, and vice versa. We then test our results on experimental data of short length, which is a common scenario in current biological experiments: it is again confirmed that the dynamic Bayesian network works better.

Conclusion: When the data length is short, dynamic Bayesian network inference performs better than the Granger causality approach; otherwise, the Granger causality approach is better.


Figure 1: Granger causality and Bayesian network inference approaches applied to a simple linear toy model. A. Five time series are generated simultaneously; the length of each time series is 1000. X2, X3, X4 and X5 are shifted upward for visualization purposes. B. Granger causality results. (a) The network structure inferred from the Granger causality approach. (b) The 95% confidence intervals for all possible directed connections. (c) For visualization purposes, all directed edges (causalities) are sorted and enumerated in the table; the total number of edges is 20. C. Dynamic Bayesian network inference results. (a) The causal network structure learned from Bayesian network inference. (b) Each variable is represented by four nodes, one per time-lag, giving a total of 20 nodes; they are numbered and enumerated in the table. (c) The simplified network structure: since we only care about causality to the current time status, we remove all edges and nodes that have no connection to nodes 16 to 20 (the five variables at the current time). (d) A further simplified causal network structure.

Mentions: where n is the time, and [ε1, ε2, ε3, ε4, ε5] are independent Gaussian white noise processes with zero means and unit variances. From the equations, we see that X1(n) is a cause of X2(n), X3(n) and X4(n), and that X4(n) and X5(n) share a feedback loop with each other, as depicted in Figure 1B. Figure 1A shows an example of the time traces of the 5 time series. For the Granger causality approach, we simulated the fitted vector autoregressive (VAR) model to generate a data set of 100 realizations of 1000 time points, and applied the bootstrap approach to construct the 95% confidence intervals (Figure 1C). For Granger causality, we assume the causality value is Gaussian distributed; the confidence intervals can then be obtained from the mean and standard deviation values [21,22]. According to the confidence intervals, one can derive the network structure shown in Figure 1B, which correctly recovers the pattern of connectivity in our toy model. For the dynamic Bayesian network inference approach, we can infer a network structure (Figure 1Da) for each realization of 1000 time points. The final causal network model was inferred with high-confidence causal arcs (arcs that occur more than 95% of the time in the whole population) between the variables [13]. This complex network contains the information of different time-lags for each variable, and it fits exactly the pattern of connectivity in our VAR model. In order to compare it with the Granger causality approach, we can further simplify the network by hiding the time-lag information; we then infer exactly the same structure as with the Granger causality approach (Figure 1Dd). From this simple example, we find that both approaches can reveal the correct network structure when the data have a large sample size (1000 here).
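
To make the procedure above concrete, the following Python sketch (not the authors' code) simulates a hypothetical five-variable VAR toy model with the causal pattern described in the text (X1 driving X2, X3 and X4, and a feedback loop between X4 and X5), then estimates Granger causality with Gaussian 95% confidence intervals over 100 realizations of 1000 time points. The coefficients, lag order and edge threshold are illustrative assumptions rather than the paper's equations, and the causality measure is computed pairwise for simplicity rather than from the full fitted VAR.

import numpy as np

rng = np.random.default_rng(0)

def simulate_var(n=1000, burn=200):
    # One realization of a hypothetical 5-variable VAR(1) toy model with the
    # causal pattern described in the text: X1 -> X2, X3, X4; X4 <-> X5.
    x = np.zeros((n + burn, 5))
    for t in range(1, n + burn):
        e = rng.standard_normal(5)                               # unit-variance Gaussian noise
        x[t, 0] = 0.5 * x[t - 1, 0] + e[0]                       # X1: driven only by its own past
        x[t, 1] = 0.5 * x[t - 1, 0] + e[1]                       # X1 -> X2
        x[t, 2] = -0.4 * x[t - 1, 0] + e[2]                      # X1 -> X3
        x[t, 3] = 0.4 * x[t - 1, 0] + 0.3 * x[t - 1, 4] + e[3]   # X1 -> X4, X5 -> X4
        x[t, 4] = 0.3 * x[t - 1, 3] + e[4]                       # X4 -> X5 (feedback loop)
    return x[burn:]

def granger_value(x, i, j, p=1):
    # Geweke-style Granger causality from X_j to X_i with p lags: the log ratio
    # of the residual variance of the restricted model (past of X_i only) to
    # that of the full model (past of X_i and X_j).
    n = x.shape[0]
    y = x[p:, i]
    lags_i = np.column_stack([x[p - k - 1:n - k - 1, i] for k in range(p)])
    lags_j = np.column_stack([x[p - k - 1:n - k - 1, j] for k in range(p)])
    def resid_var(regressors):
        design = np.column_stack([np.ones(len(y)), regressors])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return np.var(y - design @ beta)
    return np.log(resid_var(lags_i) / resid_var(np.column_stack([lags_i, lags_j])))

values = []
for _ in range(100):                                             # 100 realizations, as in the text
    x = simulate_var()
    values.append([[granger_value(x, i, j) for j in range(5)] for i in range(5)])
values = np.array(values)                                        # shape (100, 5, 5); entry [r, i, j] is GC from X_j to X_i

# Gaussian 95% intervals from the mean and standard deviation across realizations.
mean, std = values.mean(axis=0), values.std(axis=0)
lo, hi = mean - 1.96 * std, mean + 1.96 * std
for i in range(5):
    for j in range(5):
        if i != j and lo[i, j] > 0.01:                           # illustrative edge threshold
            print(f"X{j + 1} -> X{i + 1}: 95% CI [{lo[i, j]:.3f}, {hi[i, j]:.3f}]")

With a sample size of 1000 the printed edges should broadly recover the connectivity of the toy model, in line with Figure 1B; for much shorter series the intervals widen and the Granger approach begins to miss edges, which is the regime the paper's comparison addresses.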

