Identification of metabolic network models from incomplete high-throughput datasets.

Berthoumieux S, Brilli M, de Jong H, Kahn D, Cinquemani E - Bioinformatics (2011)

Bottom Line: We evaluate performance of our methods by comparison to existing approaches, and show that our EM method provides the best results over a variety of simulated scenarios. We then apply the EM algorithm to a real problem, the identification of a model for the Escherichia coli central carbon metabolism, based on challenging experimental data from the literature. This leads to promising results and allows us to highlight critical identification issues.


Affiliation: INRIA Grenoble-Rhône-Alpes, Montbonnot, France. sara.berthoumieux@inria.fr

ABSTRACT

Motivation: High-throughput measurement techniques for metabolism and gene expression provide a wealth of information for the identification of metabolic network models. Yet, missing observations scattered over the dataset restrict the number of effectively available datapoints and make classical regression techniques inaccurate or inapplicable. Thorough exploitation of the data by identification techniques that explicitly cope with missing observations is therefore of major importance.

Results: We develop a maximum-likelihood approach for the estimation of unknown parameters of metabolic network models that relies on the integration of statistical priors to compensate for the missing data. In the context of the linlog metabolic modeling framework, we implement the identification method by an Expectation-Maximization (EM) algorithm and by a simpler direct numerical optimization method. We evaluate performance of our methods by comparison to existing approaches, and show that our EM method provides the best results over a variety of simulated scenarios. We then apply the EM algorithm to a real problem, the identification of a model for the Escherichia coli central carbon metabolism, based on challenging experimental data from the literature. This leads to promising results and allows us to highlight critical identification issues.


Figure 2: Statistics of estimated parameter values for datasets with 75% of missing data and 20% noise. The graphical notations are the same as for Figure 1. (A–F) Boxplots for reactions 3, 13, 17, 22, 19 and 25 of the network, respectively.

Mentions: The most informative results from all identification methods are summarized by boxplots of the ratio of the estimated parameter values c over the reference parameter values cref used to simulate the data. The closer the ratio is to 1, the better the estimate. Ensemble statistics are drawn for all parameters corresponding to the same reaction. Figure 1 is dedicated to the scenario with 40% missing data and 10% noise, whereas Figure 2 reports on 75% missing data and 20% noise. Complete results for all reactions under all conditions can be found in Supplementary Section S3.
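
The ratio statistic described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `ratio_summary`, the variable names `c_est` and `c_ref`, and the numeric values are all hypothetical, chosen only to show how estimated-over-reference ratios for the parameters of one reaction might be pooled into the five numbers a boxplot displays. The quartile convention used here (linear interpolation, endpoints included) is also an assumption, since the source does not state one.

```python
import statistics


def ratio_summary(c_est, c_ref):
    """Five-number summary of the ratios c_est[i] / c_ref[i].

    A ratio near 1 means the estimate is close to the reference value
    that was used to simulate the data.
    """
    if len(c_est) != len(c_ref):
        raise ValueError("c_est and c_ref must have the same length")
    ratios = [est / ref for est, ref in zip(c_est, c_ref)]
    # Quartiles with linear interpolation, endpoints included (assumed convention).
    q1, median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return {
        "min": min(ratios),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(ratios),
    }


# Hypothetical reference and estimated parameters for a single reaction.
c_ref = [2.0, -1.0, 0.5, 4.0]
c_est = [2.2, -0.9, 0.5, 3.6]

summary = ratio_summary(c_est, c_ref)
print(summary)
```

Because each estimate is divided by its own reference value, parameters of different magnitude or sign become comparable on a single axis, which is what allows all parameters of one reaction to be pooled into one box.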

