Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.

Hasdemir D, Hoefsloot HC, Smilde AK - BMC Syst Biol (2015)

Bottom Line: However, drawbacks associated with this approach are usually underestimated. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.

Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.

ABSTRACT

Background: Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is used as validation data and is therefore not used for estimating the parameters of the model. The model is assumed to be validated if the model predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, drawbacks associated with this approach are usually underestimated.

Results: We have carried out simulations by using a recently published High Osmolarity Glycerol (HOG) pathway from S. cerevisiae to demonstrate these drawbacks. We have shown that how the data is partitioned, and which part of the data is used for validation purposes, is very important. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions is heavily dependent on the biology and on unknown model parameters, which turns the problem into a paradox. This creates a need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we have introduced a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.

Conclusions: SRCV leads to more stable decisions for both validation and selection, and these decisions are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.

Fig. 2: Experimental conditions under which the data were generated. Check marks indicate the measurements that were performed. Each row shows a different dose in a different cell type, whereas the columns correspond to the different biochemical species measured. The Hog1PP data consist of 18 subsets (6 different doses and 3 different cell types) and are the main source of variation between the different partitioning schemes that we evaluated.

Mentions: We mimicked the real experimental conditions used in [25] when generating the data. These include different cell types and different NaCl doses. The cell types were two deletion mutants, in which only the signaling branch through Sln1 activation or only the branch through Sho1 activation was active, and the wild type cell, in which both branches were active (Fig. 1). The NaCl shock levels ranged between 0.07 and 0.8 M (Fig. 2). The data consisted mainly of the ratio of the active phosphorylated Hog1 protein to the maximum Hog1 protein level observed in the wild type cell, expressed as a percentage. The Hog1 protein phosphorylation percentage data (Hog1PP data) from 3 cell types and 6 doses formed 18 different subsets of Hog1PP data. Within each data partitioning scheme, we used different subsets for parameter estimation and for model validation/selection; the schemes are explained in detail in the following section. Concentration data of other species in the model were essential for estimating the parameters downstream of the Hog1 protein. For this reason, measurements of mRNA, protein and glycerol levels at 0.5 M NaCl shock were always part of the training dataset. Therefore, the terms 'validation data' and 'training data' refer only to Hog1PP data throughout the text.
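
The partitioning described above can be illustrated with a minimal Python sketch. It enumerates the 18 Hog1PP subsets (3 cell types by 6 doses) and contrasts a fixed hold-out split with a stratified random split. Several things here are assumptions made for illustration and do not come from the text: the function names, the cell type labels, the placeholder dose labels (the individual dose values are not listed here), the choice to hold out a whole cell type, and the choice to stratify by cell type over whole subsets. The authors' actual SRCV scheme may stratify differently.

```python
import itertools
import random

# Labels are hypothetical; the text only says there are two single-branch
# deletion mutants and the wild type.
CELL_TYPES = ["wild_type", "sln1_branch_only", "sho1_branch_only"]
# Placeholder dose labels: the six dose values are not given in this excerpt.
DOSE_LEVELS = [f"dose_{i}" for i in range(1, 7)]


def hog1pp_subsets():
    """Return the 18 (cell type, dose) combinations as tuples."""
    return list(itertools.product(CELL_TYPES, DOSE_LEVELS))


def hold_out_by_cell_type(subsets, held_out_cell_type):
    """Fixed hold-out split (assumed scheme): one cell type is validation data."""
    training = [s for s in subsets if s[0] != held_out_cell_type]
    validation = [s for s in subsets if s[0] == held_out_cell_type]
    return training, validation


def stratified_random_folds(subsets, n_folds, seed=0):
    """Random folds stratified by cell type (assumed stratification).

    Subsets of each cell type are shuffled and dealt round-robin, so every
    fold receives subsets from every cell type when n_folds <= 6.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cell_type in CELL_TYPES:
        stratum = [s for s in subsets if s[0] == cell_type]
        rng.shuffle(stratum)
        for i, subset in enumerate(stratum):
            folds[i % n_folds].append(subset)
    return folds


if __name__ == "__main__":
    subsets = hog1pp_subsets()
    training, validation = hold_out_by_cell_type(subsets, "wild_type")
    print(len(training), "training subsets;", len(validation), "validation subsets")
    for k, fold in enumerate(stratified_random_folds(subsets, n_folds=3)):
        print("fold", k, sorted(fold))
```

In the hold-out split the validation set is fixed by one up-front choice, whereas in the stratified random split each fold serves once as validation data and always contains every cell type. Changing `seed` gives a different random partition of the same 18 subsets.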

