Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.

Wu SG, Wang Y, Jiang W, Oyetunde T, Yao R, Zhang X, Shimizu K, Tang YJ, Bao FS - PLoS Comput. Biol. (2016)

Bottom Line: Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. Because most studies focus on model organisms grown on particular carbon sources, the fluxome data in the dataset are biased, which may limit the applicability of machine learning models. This problem can be resolved as more 13C-MFA papers are published for non-model species.


Affiliation: Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America.

ABSTRACT
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the sophisticated relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified their performance through 10-fold cross-validation. SVM yields the highest accuracy among the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because most studies focus on model organisms grown on particular carbon sources, the fluxome data in the dataset are biased, which may limit the applicability of machine learning models. This problem can be resolved as more 13C-MFA papers are published for non-model species.
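
The grid search with 10-fold cross-validation described in the abstract can be illustrated with a minimal sketch. It uses k-NN regression (one of the three methods named above) because it fits in a few lines of standard-library Python. The function names `knn_predict`, `cross_val_mae`, and `grid_search`, the two toy features, and the synthetic data are all assumptions made for illustration; they are not taken from MFlux, and the error metric (mean absolute error) is a stand-in for whatever the authors used.

```python
import random
from statistics import mean

def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return mean(y for _, y in dists[:k])

def cross_val_mae(X, y, k, n_folds=10, seed=0):
    """Mean absolute error of k-NN regression under n_folds-fold CV."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled indices into n_folds disjoint held-out sets.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            errors.append(abs(knn_predict(train_X, train_y, X[i], k) - y[i]))
    return mean(errors)

def grid_search(X, y, k_grid, n_folds=10):
    """Score every candidate k and return the one with the lowest CV error."""
    scores = {k: cross_val_mae(X, y, k, n_folds) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Synthetic toy data (an assumption, not from the paper):
# features = (growth rate, aerobic flag), target = one flux value.
rng = random.Random(42)
X = [(rng.uniform(0.1, 1.0), float(rng.random() < 0.5)) for _ in range(60)]
y = [100 * g * (0.5 + 0.5 * o) + rng.gauss(0, 2) for g, o in X]

best_k, scores = grid_search(X, y, k_grid=[1, 3, 5, 9])
print(best_k, scores)
```

The same loop structure applies to any model with tunable parameters: only the prediction function and the parameter grid change.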



pcbi.1004838.g006: A comparison between linear-kernel SVM and RBF-kernel SVM. The best cross-validation results for the linear kernel and the RBF kernel after grid searches on the WP dataset are very similar. The RBF kernel is employed in the final model for flux prediction.
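
For readers unfamiliar with the two kernels compared in this figure, the sketch below gives their standard textbook definitions: the linear kernel is a dot product, and the RBF kernel is exp(-gamma * ||x - z||^2). The `gamma` value and the sample vectors are arbitrary choices for illustration and are not parameters reported in the paper.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the dot product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: decays with the squared Euclidean distance between x and z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

a = (1.0, 0.0)
b = (0.0, 1.0)
print(linear_kernel(a, b))          # orthogonal vectors give 0.0
print(rbf_kernel(a, a))             # identical points give 1.0
print(rbf_kernel(a, b, gamma=0.5))  # squared distance 2, so exp(-1)
```

The RBF kernel depends only on distance, which lets an SVM fit nonlinear relationships between input factors and fluxes; the linear kernel cannot.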

Mentions: We expected the SVM models trained on the WT dataset to cross-validate better than those trained on the WP dataset, because the WT dataset does not include sophisticated genetic variations. However, the cross-validation results refuted this expectation: models trained on the WP dataset performed significantly better than those trained on the WT dataset (Fig 5). This suggests that training-set size is a major factor in model quality, especially when the training set is relatively small (the WT and WP datasets contain about 150 and 450 samples, respectively). We also compared SVM results using the linear kernel with those using the RBF kernel; the RBF kernel performed slightly better (Fig 6). The parameter set producing the most accurate cross-validation result was used to configure MFlux. Notably, predictions of v11 (the second step of the oxidative PP pathway) and v24 (the glyoxylate shunt) show relatively large variations. Two factors may contribute. First, both v11 and v24 have relatively narrow ranges (see Fig 1), so even small numerical deviations produce large relative errors for both fluxes. Second, genetic modifications may significantly influence both v11 (e.g., zwf knockout [40]) and v24 (e.g., ppc knockout [41]). For instance, knocking out zwf in E. coli causes a zero flux in v10 (the oxidative pentose phosphate pathway, OPP pathway) [42]. However, the lack of sufficient information on flux re-organization mechanisms in engineered microbes reduces ML predictability, because most fluxomics studies of engineered microbes focus on a few model species such as E. coli. To mitigate this problem, the MFlux platform allows users to manually set the boundaries of central fluxes to improve prediction quality (e.g., setting a zero flux through the OPP pathway for an E. coli zwf mutant).
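
The idea of adjusting a predicted flux profile so that it satisfies stoichiometric constraints, with a user-imposed zero flux for a knockout, can be sketched as a small least-squares problem: find the flux vector v closest to the prediction p subject to A v = b. This sketch is a simplification and not the MFlux implementation. It handles only equality constraints in closed form, v = p - A^T (A A^T)^{-1} (A p - b), whereas a general quadratic program would also accept inequality bounds. The three-flux network, the numbers, and the names `solve` and `adjust_fluxes` are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constraints are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def adjust_fluxes(pred, A, b):
    """Return the vector closest to pred (least squares) that satisfies A v = b."""
    m = len(A)
    AAt = [[sum(x * y for x, y in zip(A[i], A[j])) for j in range(m)]
           for i in range(m)]
    resid = [sum(a * p for a, p in zip(A[i], pred)) - b[i] for i in range(m)]
    lam = solve(AAt, resid)  # Lagrange multipliers
    return [p - sum(A[i][k] * lam[i] for i in range(m))
            for k, p in enumerate(pred)]

# Hypothetical toy network: uptake (v0) splits into glycolysis (v1) and OPP (v2).
predicted = [100.0, 70.0, 25.0]      # raw ML output, violates the mass balance
balance = [1.0, -1.0, -1.0]          # v0 - v1 - v2 = 0
pin_opp = [0.0, 0.0, 1.0]            # v2 = 0, as for a zwf knockout

wild_type = adjust_fluxes(predicted, [balance], [0.0])
knockout = adjust_fluxes(predicted, [balance, pin_opp], [0.0, 0.0])
print([round(v, 3) for v in wild_type])
print([round(v, 3) for v in knockout])
```

Pinning a flux to zero is the special case of a bound whose lower and upper limits coincide, which is why it fits the equality-only form used here; ranges such as 0 <= v <= 10 would require a proper QP solver.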

