Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.

Hasdemir D, Hoefsloot HC, Smilde AK - BMC Syst Biol (2015)

Bottom Line: The drawbacks of the hold-out validation approach are usually underestimated. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Stratified random cross-validation (SRCV) therefore proves to be a promising alternative to the standard hold-out validation strategy.


Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.

ABSTRACT

Background: Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is used as validation data and is therefore not used for estimating the parameters of the model. The model is assumed to be validated if its predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, the drawbacks associated with this approach are usually underestimated.

Results: We carried out simulations using a recently published model of the High Osmolarity Glycerol (HOG) pathway from S. cerevisiae to demonstrate these drawbacks. We show that how the data is partitioned, and which part of the data is used for validation, matters a great deal. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions depends heavily on the biology and on the unknown model parameters, which turns the problem into a paradox. This creates the need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we introduce a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.

Conclusions: SRCV leads to more stable decisions for both validation and selection, and these decisions are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.
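
The idea of a stratified random partition can be illustrated with a minimal sketch. The abstract does not specify how SRCV builds its partitions, so everything here is an assumption for illustration only: the function name `stratified_random_folds`, the choice of experimental condition as the stratum, and the number of data points and folds are all hypothetical. The sketch only shows the general principle that each fold draws random points from every stratum, so that no single condition is held out as a whole.

```python
import numpy as np

def stratified_random_folds(strata, n_folds, seed=0):
    """Assign each data point to a fold so that every stratum
    (assumed here to be an experimental condition) is spread
    across all folds. Hypothetical helper, not the authors' code."""
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    folds = np.empty(len(strata), dtype=int)
    for label in np.unique(strata):
        idx = np.flatnonzero(strata == label)
        rng.shuffle(idx)  # random order within the stratum
        # Deal the shuffled indices round-robin over the folds.
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

# Hypothetical data set: 3 conditions with 8 measurements each.
strata = np.repeat(["Sln1", "Sho1", "WT"], 8)
folds = stratified_random_folds(strata, n_folds=4)

for k in range(4):
    train_idx = np.flatnonzero(folds != k)
    test_idx = np.flatnonzero(folds == k)
    # Here one would fit the ODE model on train_idx and
    # score its predictions on test_idx.
```

By contrast, a hold-out scheme in this toy setting would put all points of one condition into the validation set, which is the kind of partitioning whose choice the abstract identifies as the source of unstable decisions.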


Fig. 10: Normalized bias and standard deviation of branch parameters. Bar graphs show the median of the normalized bias and standard deviation of parameters across all noise realizations. Only some of the branch parameters are shown in the figure. Parameters p4, p5 and p6 play a role in the Sln1 branch, and parameters p8, p9 and p10 are in the Sho1 branch. Blue, green and black refer to the Sln1, Sho1 and WT schemes, respectively. (a) Median of normalized bias in Sln1 branch parameters. (b) Median of normalized bias in Sho1 branch parameters. (c) Median of normalized standard deviation in Sln1 branch parameters. (d) Median of normalized standard deviation in Sho1 branch parameters.

Mentions: We measured the parameter estimation quality by using the normalized bias of each parameter. The median of this measure across all noise realizations shows how well the parameter was estimated in general in a given scheme. In Fig. 10b, we see that p8 and p9, the parameters related to the complex formation of the Sho1 and Pbs2 proteins and to this complex's phosphorylation (see the yellow region in Fig. 1), were estimated with very high bias in the Sln1 scheme. This means that when the Sln1 data is used for model training, the Sho1 branch parameters are estimated poorly, with median biases of 31 % and 33 %, respectively. The same reasoning also holds for two of the parameters related to the phosphorylation of the Pbs2 protein, p4 and p5 (see the gray region in Fig. 1). The median biases for these parameters were found to be 17 % and 14 %, respectively (see Fig. 10a). There is an interesting difference between the estimation quality of the parameters in the two branches, though. We could decrease the bias of the Sln1 branch parameters considerably when we used the WT data for training the model. However, the level of bias in the Sho1 branch parameters was still relatively high in the WT scheme compared to the Sho1 scheme. Similarly, the identifiability analysis (see Fig. 10c) shows that training the model on the WT data results in standard deviations of the Sln1 branch parameters similar to those obtained in the Sln1 scheme. However, the standard deviations of the Sho1 branch parameters are much lower in the Sho1 scheme than in the WT scheme (see Fig. 10d), which suggests improved identifiability in the Sho1 scheme. Therefore, the Sln1 data could be predicted well in the WT scheme, whereas the prediction of the Sho1 data was still problematic.
As a further investigation of the system dynamics, we varied one branch parameter at a time within the range bounded by the minimum and maximum of its estimated values. This allowed us to confirm the deteriorating effect of biased branch parameters on the predictions (data not shown).
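
The summary statistics discussed above can be sketched as follows. The text does not give the exact formulas, so this is only one plausible reading and every detail is an assumption: several parameter estimates are taken to be available per noise realization, the normalized bias is taken as the absolute deviation of their mean from the true value divided by the true value, and the normalized standard deviation as their sample standard deviation divided by the true value. The function name and the numbers are hypothetical.

```python
import numpy as np

def median_normalized_bias_and_std(estimates, true_value):
    """estimates: array of shape (n_realizations, n_estimates).

    Assumed definitions (not taken from the paper):
      normalized bias = |mean(estimates) - true| / |true|
      normalized std  = std(estimates) / |true|
    Both are computed per noise realization and then summarized
    by their median across realizations.
    """
    estimates = np.asarray(estimates, dtype=float)
    scale = abs(true_value)
    bias = np.abs(estimates.mean(axis=1) - true_value) / scale
    std = estimates.std(axis=1, ddof=1) / scale
    return np.median(bias), np.median(std)

# Hypothetical estimates of one parameter whose true value is 1.0:
# three noise realizations, two estimates each.
estimates = [[1.2, 1.4],
             [0.9, 1.1],
             [1.5, 1.7]]
med_bias, med_std = median_normalized_bias_and_std(estimates, true_value=1.0)
print(f"median normalized bias: {med_bias:.2%}")
print(f"median normalized std:  {med_std:.2%}")
```

Using the median rather than the mean across noise realizations keeps a single badly behaved realization from dominating the summary, which matches the way the bar graphs in Fig. 10 are described.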

