Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.

Hasdemir D, Hoefsloot HC, Smilde AK - BMC Syst Biol (2015)

Bottom Line: The drawbacks of hold-out validation are usually underestimated. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Stratified random cross-validation (SRCV) therefore proves to be a promising alternative to the standard hold-out validation strategy.


Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.

ABSTRACT

Background: Most ordinary differential equation (ODE) based modeling studies in systems biology include a hold-out validation step. In this framework, a pre-determined part of the data is set aside as validation data and is therefore not used for estimating the parameters of the model. The model is considered validated if its predictions on the validation dataset agree well with the data. Model selection between alternative model structures can be performed in the same setting, based on the predictive power of each model structure on the validation dataset. However, the drawbacks of this approach are usually underestimated.
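The hold-out procedure described above can be illustrated with a minimal sketch. It is not the authors' code: the one-parameter ODE dy/dt = -k*y, the function names `simulate`, `fit_k` and `holdout_error`, the noise level, and the two partitioning schemes are all assumptions chosen for illustration.

```python
import math
import random

def simulate(k, y0, times):
    """Analytic solution of the toy ODE dy/dt = -k * y (illustrative model only)."""
    return [y0 * math.exp(-k * t) for t in times]

def fit_k(times, values, y0):
    """Least-squares estimate of k from log(y / y0) = -k * t (line through the origin)."""
    num = sum(t * math.log(y / y0) for t, y in zip(times, values))
    den = sum(t * t for t in times)
    return -num / den

def holdout_error(times, values, y0, validation_idx):
    """Fit k on the training points only; return (k_hat, RMSE on the held-out points)."""
    held = set(validation_idx)
    pairs = list(zip(times, values))
    train = [p for i, p in enumerate(pairs) if i not in held]
    valid = [p for i, p in enumerate(pairs) if i in held]
    k_hat = fit_k([t for t, _ in train], [y for _, y in train], y0)
    pred = simulate(k_hat, y0, [t for t, _ in valid])
    rmse = math.sqrt(sum((p - y) ** 2 for p, (_, y) in zip(pred, valid)) / len(valid))
    return k_hat, rmse

# Synthetic data: true k = 0.8, multiplicative noise keeps all values positive.
rng = random.Random(0)
times = [0.5 * i for i in range(1, 11)]
clean = simulate(0.8, 1.0, times)
noisy = [y * math.exp(rng.gauss(0.0, 0.05)) for y in clean]

# Two hypothetical partitioning schemes: hold out the early or the late time points.
early = holdout_error(times, noisy, 1.0, validation_idx=[0, 1, 2])
late = holdout_error(times, noisy, 1.0, validation_idx=[7, 8, 9])
print("hold out early points:", early)
print("hold out late points: ", late)
```

Comparing the two printed results shows how the estimated parameter and the validation error both depend on which part of the data is held out, even for a model this simple.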

Results: We carried out simulations using a recently published model of the High Osmolarity Glycerol (HOG) pathway in S. cerevisiae to demonstrate these drawbacks. We show that how the data are partitioned, and which part is used for validation, matters a great deal. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions depends heavily on the biology and on unknown model parameters, which turns the problem into a paradox. This creates a need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we introduce a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.
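The abstract does not spell out how SRCV partitions the data, so the following is only a generic sketch of stratified random fold assignment, not the authors' procedure. It assumes that each data point carries a stratum label (here hypothetical experiment IDs such as `exp_A`) and that points within each stratum are randomly spread over the folds; the function name `stratified_random_folds` is likewise an assumption.

```python
import random
from collections import defaultdict

def stratified_random_folds(strata, n_folds, seed=0):
    """Assign each data point to a fold so that every stratum is spread across folds.

    strata: list of stratum labels, one per data point (hypothetical experiment IDs).
    Returns a list of fold indices aligned with `strata`.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)

    folds = [None] * len(strata)
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)               # random order within the stratum
        offset = rng.randrange(n_folds)    # random starting fold for this stratum
        for pos, idx in enumerate(members):
            folds[idx] = (pos + offset) % n_folds
    return folds

# Example: three hypothetical experiments with six measurements each, three folds.
labels = ["exp_A"] * 6 + ["exp_B"] * 6 + ["exp_C"] * 6
folds = stratified_random_folds(labels, n_folds=3, seed=1)
for fold in range(3):
    held_out = [i for i, f in enumerate(folds) if f == fold]
    print("fold", fold, "validation indices:", held_out)
```

Each fold serves once as the validation set while the remaining folds are used for parameter estimation, so no single pre-determined partition drives the decision.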

Conclusions: SRCV leads to more stable validation and selection decisions that are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.


Fig. 1: The pathway topology proposed in [25]. We used this model as our true model and generated data based on it. Black lines with small arrow tips depict transitions between species in the model, such as production, degradation or complex formation. Black lines with open circle tips depict phosphorylation by kinases. Lines with open triangle tips show activating regulatory interactions, whereas lines with blunt ends show deactivating regulatory interactions. The red double arrow denotes the post-translational regulation of glycerol production by the active, phosphorylated Hog1 protein; we did not consider this regulatory interaction in our simplified model. The dotted ellipses in the upper left-hand corner indicate the two upstream activation routes that are important in our study. Parts of the pathway whose parameters were affected by the choice of partitioning scheme are highlighted in yellow and gray; the changes in the parameters of those regions are explained in the Results section. (Figure adopted from [23])

Mentions: We used the high osmolarity glycerol pathway model in S. cerevisiae, which was presented as the best approximating model in [25] (see Fig. 1), to generate synthetic data. The model is available in the BioModels Database [26] under accession number MODEL1209110001. Readers are referred to the original paper for details of the model structure.

