Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.

Hasdemir D, Hoefsloot HC, Smilde AK - BMC Syst Biol (2015)

Bottom Line: The drawbacks of hold-out validation are usually underestimated. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Stratified random cross-validation (SRCV) therefore proves to be a promising alternative to the standard hold-out validation strategy.


Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.

ABSTRACT

Background: Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is used as validation data and is therefore not used for estimating the parameters of the model. The model is assumed to be validated if the model predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, the drawbacks associated with this approach are usually underestimated.

Results: We have carried out simulations using a recently published High Osmolarity Glycerol (HOG) pathway from S. cerevisiae to demonstrate these drawbacks. We have shown that how the data is partitioned, and which part of the data is used for validation, is very important. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions is heavily dependent on the biology and on unknown model parameters, which turns the problem into a paradox. This creates the need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we have introduced a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.

Conclusions: SRCV leads to more stable decisions for both validation and selection, and these decisions are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.
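
The abstract does not spell out how SRCV partitions the data, so the following is only a minimal sketch of one common way to build stratified random folds, not the authors' procedure. It assumes each data item carries a stratum label (here, hypothetically, the cell type of the experiment), shuffles the items within each stratum, and deals them round-robin into k folds so that every fold contains data from every stratum. The function names, the toy labels and the choice of k are all illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_random_folds(strata, k, seed=0):
    """Assign item indices to k folds, spreading every stratum across the folds.

    strata: list of stratum labels, one per data item (hypothetical labels,
            e.g. the cell type an experiment was run on).
    Returns a list of k lists of item indices.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)  # random assignment within the stratum
        # Deal the shuffled members round-robin so each fold gets a similar share.
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


def train_validation_splits(folds):
    """Yield (training indices, validation indices), holding out each fold once."""
    for held_out in range(len(folds)):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield sorted(train), sorted(folds[held_out])


# Toy example: 12 hypothetical experiments, 4 per cell type.
labels = ["Sln1"] * 4 + ["Sho1"] * 4 + ["WT"] * 4
folds = stratified_random_folds(labels, k=4, seed=1)
for train, validation in train_validation_splits(folds):
    print("train:", train, "validate:", validation)
```

In contrast to a single fixed hold-out partition, every item is used for validation exactly once here, and no validation fold consists of a single cell type.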


Fig. 15: Percentage prediction errors (PE) and model separation (ΔTS) in the adapted scenarios. Each box plot shows the distribution of PE or ΔTS over 100 different realizations of the data obtained in a single scheme. The red dots indicate the outliers, which lie outside approximately 99.3% coverage if the data is normally distributed. Black, blue and green boxes in the first row of graphs refer to the Sln1/Sho1, Sln1/WT and Sho1/WT schemes. Cyan boxes refer to the stratified random cross-validation (SRCV) schemes. Gray and yellow boxes in the second row refer to the low doses and high doses schemes. The labels on the x-axis indicate the medians of the PE or ΔTS distributions summarized visually by the box plots. The axis labels in the ΔTS graphs also show the number of wrong decisions made in each scheme. In each graph, the ten realizations with the highest PE or ΔTS are located above the black dashed line. The region above this line is compressed for visual ease. (a) PE obtained in the adapted cell type scenario. (b) ΔTS obtained in the adapted cell type scenario. (c) PE obtained in the adapted dose scenario. (d) ΔTS obtained in the adapted dose scenario.
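
The caption does not say how the outlier threshold was chosen. The 99.3% figure is what the conventional box-plot rule gives, where whiskers extend 1.5 times the interquartile range beyond the quartiles; that the figure uses this rule is an assumption here, not something the source states. Under that assumption, the coverage for normally distributed data can be checked with a short calculation:

```python
from statistics import NormalDist


def whisker_coverage(whisker=1.5):
    """Probability mass of a normal distribution lying inside the box-plot fences.

    Assumes the conventional rule: fences at `whisker` * IQR beyond the quartiles.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    q1 = std_normal.inv_cdf(0.25)
    q3 = std_normal.inv_cdf(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return std_normal.cdf(upper) - std_normal.cdf(lower)


coverage = whisker_coverage()
print(f"{coverage:.1%}")  # prints 99.3%
```

The fences fall at roughly ±2.7 standard deviations, so about 0.7% of normally distributed points would be drawn as outliers.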

Mentions: Firstly, we compare the schemes in which the training set includes different cell types. The most important observation regarding these three schemes is the low predictive power in the Sln1/WT scheme, as can be seen in Fig. 15a. This shows that when the models are trained without the Sho1 data, validating them on Sho1 data is risky. In contrast, when the Sln1 data is missing from the training set, we do not observe such low predictive power. The reasons can be traced back to the asymmetrical branch structure explained in detail in the Scenario 1 section, so we do not discuss them again here.
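
To make the scheme names concrete, the sketch below enumerates the three cell-type hold-out schemes, reading each name as the cell types in the training set with the remaining cell type held out for validation (consistent with the Sln1/WT scheme being validated on Sho1 data). The function name and the dictionary layout are illustrative assumptions, not part of the original study.

```python
from itertools import combinations

cell_types = ["Sln1", "Sho1", "WT"]


def hold_out_schemes(groups, n_train=2):
    """Map each scheme name (training groups joined by '/') to its validation groups."""
    schemes = {}
    for train in combinations(groups, n_train):
        validation = [g for g in groups if g not in train]
        schemes["/".join(train)] = validation
    return schemes


schemes = hold_out_schemes(cell_types)
for name, validation in schemes.items():
    print(f"train on {name}, validate on {', '.join(validation)}")
```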

