Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.
Bottom Line:
The drawbacks of hold-out validation are usually under-estimated. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Stratified random cross-validation (SRCV) therefore proves to be a promising alternative to the standard hold-out validation strategy.
Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.
ABSTRACT
Background: Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is used as validation data and is therefore not used for estimating the parameters of the model. The model is assumed to be validated if the model predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, drawbacks associated with this approach are usually under-estimated.

Results: We have carried out simulations by using a recently published High Osmolarity Glycerol (HOG) pathway model from S. cerevisiae to demonstrate these drawbacks. We have shown that it is very important how the data is partitioned and which part of the data is used for validation purposes. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions is heavily dependent on the biology and on unknown model parameters, which turns the problem into a paradox. This brings the need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we have introduced a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.

Conclusions: SRCV leads to more stable decisions for both validation and selection, which are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.
Mentions: The difference between the true and the simplified model predictions (ΔTS) can be calculated by using the trapezoidal rule as in Equation 3. With this method, the area between two curves is approximated as a series of trapezoids (see Fig. 7). The sum of the areas of the trapezoids provides a good approximation of the area between the curves when the number of trapezoids is sufficiently high. Here, the two curves are the profiles of Hog1PP predicted by the true and the simplified model structures. We normalize the calculated area with respect to the maximum of the Hog1PP data in the corresponding validation subset. Large areas between the two curves mean that the separation of the two model structures is easier. Therefore, when correct model selection decisions are made, model separation (ΔTS) can be used as an additional criterion of enhanced model selection.

$$ \begin{aligned} \Delta TS_{i} &= \frac{\sum_{k=1}^{K-1} \frac{\left| T\left(t_{k+1}\right) - S\left(t_{k+1}\right) \right| + \left| T\left(t_{k}\right) - S\left(t_{k}\right) \right|}{2} \left(t_{k+1} - t_{k}\right)}{\max\left(x_{ij}\right)} \\ \Delta TS &= \frac{\sum_{i=1}^{I} \Delta TS_{i}}{I} \end{aligned} \tag{3} $$

T: numerical values of the Hog1PP predictions by the true model structure
S: numerical values of the Hog1PP predictions by the simplified model structure
k = 1:K-1, index for trapezoids
i = 1:I, index for validation subsets of Hog1PP data
j = 1:15, index for time points
I = total number of validation subsets
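The trapezoidal calculation of ΔTS described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `delta_ts_subset` and `delta_ts` and the tuple layout of each validation subset are hypothetical, and the prediction grid is assumed to be given as plain lists of time points and predicted Hog1PP values (it may be finer than the measured time points).

```python
from typing import List, Sequence, Tuple

# One validation subset: (time grid, true-model predictions,
# simplified-model predictions, measured Hog1PP data).
# This layout is an assumption made for illustration only.
Subset = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def delta_ts_subset(
    t: Sequence[float],
    true_pred: Sequence[float],
    simp_pred: Sequence[float],
    data: Sequence[float],
) -> float:
    """Normalized trapezoidal area between two prediction curves."""
    if not (len(t) == len(true_pred) == len(simp_pred)):
        raise ValueError("time grid and predictions must have equal length")
    if len(t) < 2:
        raise ValueError("at least two time points are needed for one trapezoid")

    area = 0.0
    for k in range(len(t) - 1):
        left = abs(true_pred[k] - simp_pred[k])            # |T(t_k) - S(t_k)|
        right = abs(true_pred[k + 1] - simp_pred[k + 1])   # |T(t_{k+1}) - S(t_{k+1})|
        area += 0.5 * (left + right) * (t[k + 1] - t[k])   # area of one trapezoid

    # Normalize by the maximum of the measured data in this subset.
    return area / max(data)


def delta_ts(subsets: List[Subset]) -> float:
    """Average the normalized areas over all validation subsets."""
    values = [delta_ts_subset(*s) for s in subsets]
    return sum(values) / len(values)


# Toy numbers, invented purely to exercise the functions.
example_subsets: List[Subset] = [
    ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0], [0.0, 1.0, 2.0], [1.0, 2.0, 4.0]),
    ([0.0, 2.0], [1.0, 1.0], [0.0, 0.0], [2.0]),
]
example_delta_ts = delta_ts(example_subsets)
```

The absolute difference is taken at the grid points, as in Equation 3, so curves that cross between two grid points are only approximated; a finer time grid reduces that error.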