Modelling nonstationary gene regulatory processes.

Grzegorczyk M, Husmeier D, Rahnenführer J - Adv Bioinformatics (2010)



Affiliation: Department of Statistics, TU Dortmund University, 44221 Dortmund, Germany.

ABSTRACT
An important objective in systems biology is to infer gene regulatory networks from postgenomic data, and dynamic Bayesian networks have been widely applied as a popular tool to this end. The standard approach for nondiscretised data is restricted to a linear model and a homogeneous Markov chain. Recently, various generalisations based on changepoint processes and free allocation mixture models have been proposed. The former aim to relax the homogeneity assumption, whereas the latter are more flexible and, in principle, more adequate for modelling nonlinear processes. In our paper, we compare both paradigms and discuss theoretical shortcomings of the latter approach. We show that a model based on the changepoint process yields systematically better results than the free allocation model when inferring nonstationary gene regulatory processes from simulated gene expression time series. We further cross-compare the performance of both models on three biological systems: macrophages challenged with viral infection, circadian regulation in Arabidopsis thaliana, and morphogenesis in Drosophila melanogaster.



Figure 10: Prior probability ratios between the heterogeneous and the homogeneous state for (i) varying time series length $m$ ((a) and (b)) and (ii) varying segment length proportions ((c) and (d)). (a) and (b): prior probability ratio $R$ between (i) the heterogeneous state that consists of two equally-spaced segments $t_2, \ldots, t_{\lfloor m/2 \rfloor + 1}$ and $t_{\lfloor m/2 \rfloor + 2}, \ldots, t_m$ and (ii) the homogeneous state consisting of one single segment $t_2, \ldots, t_m$. The prior probability ratios (vertical axis) are plotted against the time series length $m = 3, 5, 7, \ldots, 25$ (horizontal axis). The prior probability ratio was defined in (19). For clarity, the logarithmic prior probability ratios are plotted in (b). (c) and (d): prior probability ratio $R$ for a time series of length $m = 26$ between (i) the heterogeneous state with two segments $t_2, \ldots, t_j$ and $t_{j+1}, \ldots, t_m$ and (ii) the homogeneous state consisting of one single segment $t_2, \ldots, t_m$ only. The prior probability ratios (vertical axis) are plotted against the changepoint location (horizontal axis). For clarity, the logarithmic prior probability ratios are plotted in (d). See text for further details.

Mentions:

$$R_{\mathrm{BGMD}} = \frac{P(K=2)}{P(K=1)} \cdot \int_{j}^{j+1} \frac{6\,(m - b_1)(b_1 - 2)}{(m-2)^3}\, db_1 \tag{21}$$

In the second theoretical study we vary the length of the time series, $m = 3, 5, 7, \ldots, 25$, and consider a heterogeneous time series consisting of two equally-spaced segments $t_2, \ldots, t_{\lfloor m/2 \rfloor + 1}$ and $t_{\lfloor m/2 \rfloor + 2}, \ldots, t_m$. This corresponds to $m_1 = m_2 = 0.5 \cdot (m - 1)$ in the BGM model. For the BGMD model, setting $j = \lfloor m/2 \rfloor + 1$ means that the changepoint has to be located in the interval $b_1 \in [t_{\lfloor m/2 \rfloor + 1}, t_{\lfloor m/2 \rfloor + 2}]$. Figures 10(a) and 10(b) show the resulting (logarithmic) prior probability ratios as a function of $m$. The prior ratio $R$ for the BGM model is considerably lower than for the BGMD model. Moreover, the logarithmic plot in Figure 10(b) shows that the prior ratio of the BGM model decreases much more strongly with the sample size $m$ than that of the BGMD model. This suggests that the BGM model imposes a more severe penalty for complexity (non-stationarity), and that this penalty grows with the sample size $m$.

This tendency may explain the finding in [4] for the macrophage gene expression time series, which we have reproduced in the present study (Figure 3(c)): the BGM model does not infer a clear two-phase nature of the time series under simultaneous immune activation (with IFNγ) and viral infection (with CMV). A possible biological explanation was offered in [4]. However, the novel BGMD model does not support the hypothesis of a decreased probability for the two-phase nature (Figure 3(f)). Moreover, the preceding analysis has revealed that a strong penalty against the two-phase process is inherent in the BGM model. This suggests that the results reported in [4], which we have reproduced in our study, might be an artefact of the BGM model rather than of genuine biological origin.
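As an illustration of how the BGMD ratio in (21) behaves for the equally-spaced setting, the integral can be evaluated in closed form. The following is only a sketch under stated assumptions: the function names are hypothetical, and because this excerpt does not give the prior on the number of segments, the factor P(K=2)/P(K=1) is passed in as a placeholder argument (`k_prior_ratio`, default 1.0) rather than computed. Only the BGMD side is covered, since the BGM ratio of (19) is not reproduced here.

```python
def _antiderivative(b, m):
    """Antiderivative in b of the integrand 6 (m - b)(b - 2) / (m - 2)^3 from (21)."""
    # (m - b)(b - 2) = -b^2 + (m + 2) b - 2 m, integrated term by term.
    poly = -b**3 / 3.0 + (m + 2) * b**2 / 2.0 - 2.0 * m * b
    return 6.0 * poly / (m - 2) ** 3


def bgmd_changepoint_mass(m, j=None):
    """Integral in (21): prior mass of the changepoint b_1 lying in [j, j + 1].

    If j is omitted, the equally-spaced case j = floor(m / 2) + 1 is used.
    """
    if m < 3:
        raise ValueError("m must be at least 3")
    if j is None:
        j = m // 2 + 1
    return _antiderivative(j + 1, m) - _antiderivative(j, m)


def bgmd_prior_ratio(m, k_prior_ratio=1.0, j=None):
    """R_BGMD from (21); k_prior_ratio stands in for P(K=2)/P(K=1) (assumed, not from the source)."""
    return k_prior_ratio * bgmd_changepoint_mass(m, j)


if __name__ == "__main__":
    # Time series lengths used in the second theoretical study.
    for m in range(3, 26, 2):
        print(m, bgmd_prior_ratio(m))
```

With the placeholder ratio set to 1, the printed values show only how the changepoint-location term shrinks as m grows; reproducing the curves in Figure 10 would additionally require the actual prior on K.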

