Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions.

Hasdemir D, Hoefsloot HC, Smilde AK - BMC Syst Biol (2015)



Affiliation: Biosystems Data Analysis Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands. D.Hasdemir@uva.nl.

ABSTRACT

Background: Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework a pre-determined part of the data is set aside as validation data and is therefore not used for estimating the parameters of the model. The model is considered validated if its predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, the drawbacks associated with this approach are usually underestimated.

Results: We have carried out simulations using a recently published model of the High Osmolarity Glycerol (HOG) pathway in S. cerevisiae to demonstrate these drawbacks. We have shown that how the data is partitioned, and which part of the data is used for validation, matters greatly. The hold-out validation strategy leads to biased conclusions, since it can yield different validation and selection decisions when different partitioning schemes are used. Furthermore, finding a sensible partitioning scheme that would lead to reliable decisions depends heavily on the biology and on the unknown model parameters, which turns the problem into a paradox. This creates the need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we have introduced a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.
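The core idea behind stratified partitioning is to draw validation points at random, but separately within each experimental condition, so that no condition is entirely absent from the training data. A minimal sketch of such a partitioner is given below; this is a hypothetical illustration, not the authors' implementation, and the function name and the (stratum, value) data layout are assumptions:

```python
import random
from collections import defaultdict

def stratified_random_folds(points, n_folds, seed=0):
    """Split (stratum, measurement) pairs into n_folds folds such that
    every stratum (e.g. a dose/cell-type condition) is represented as
    evenly as possible in each fold."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for stratum, value in points:
        by_stratum[stratum].append((stratum, value))
    folds = [[] for _ in range(n_folds)]
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)  # random assignment within the stratum
        for i, member in enumerate(members):
            folds[i % n_folds].append(member)
    return folds
```

In a cross-validation loop, each fold would in turn serve as the validation set while the remaining folds are used for parameter estimation, so every condition contributes to both training and validation across the folds.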

Conclusions: SRCV leads to more stable validation and selection decisions that are not biased by the underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.


Fig. 13: Number of wrong decisions in scenario 2. Bars show the number of realizations in which the simplified model gave lower residuals than the true model structure and was therefore wrongly selected over it. Gray and yellow bars refer to the lowest and the highest dose schemes, respectively. The labels on the x-axis show the specific dose and the cell type of the data on which the validation was performed. Only the twelve validation subsets that could be used in both the lowest and the highest schemes are shown.

Mentions: When it comes to model selection, we face a different challenge. Figure 13 shows the number of realizations in which the simplified model structure was selected over the true model structure. For example, as can be seen in the upper left-hand corner of Fig. 13, the lowest dose scheme results in 22 wrong decisions whereas the highest dose scheme results in only 2 wrong decisions when the 0.1 M Sln1 dataset is used for validation. Only the results on validation sets that can be used in both schemes are shown, because our focus is on comparing the performance of the two schemes on shared validation sets. The most important observation from the figure is that the lowest scheme makes more wrong decisions than the highest dose scheme on the 0.1 M and 0.2 M Sln1 and WT data. The number of wrong decisions by the lowest scheme is especially high (22 and 30 on the Sln1 and WT validation data, respectively) at the 0.1 M dose, which is very close to the 0.07 M dose on which the models were trained. In addition, the highest scheme gives a slightly higher number of wrong decisions than the lowest dose scheme on the 0.6 M Sln1 and WT data. These observations suggest that model selection is problematic when the training and validation sets are too close to each other. We therefore examined the separation between the true and the simplified model structures in higher resolution (see Fig. 14).
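Model selection in this setting amounts to comparing the two candidate structures' residuals on the validation data: the structure with the lower sum of squared errors is selected, and a "wrong decision" occurs whenever noise makes the simplified structure look better than the true one. This can be sketched as follows; the helper names are hypothetical, not the paper's code:

```python
def sse(predictions, observations):
    """Sum of squared residuals between model predictions and data."""
    return sum((p - o) ** 2 for p, o in zip(predictions, observations))

def select_model(preds_true, preds_simplified, observations):
    """Select the structure with the lower validation residual.
    Returns 'simplified' when the simpler structure fits the
    validation data better, i.e. a wrong decision in this scenario."""
    if sse(preds_simplified, observations) < sse(preds_true, observations):
        return "simplified"
    return "true"
```

Counting how often `select_model` returns "simplified" over many noise realizations gives the bar heights shown in Fig. 13.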

