Rapid Prediction of Bacterial Heterotrophic Fluxomics Using Machine Learning and Constraint Programming.

Wu SG, Wang Y, Jiang W, Oyetunde T, Yao R, Zhang X, Shimizu K, Tang YJ, Bao FS - PLoS Comput. Biol. (2016)

Bottom Line: Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. Because published studies favor model organisms grown on particular carbon sources, the fluxome data are biased, which may limit the applicability of machine learning models. This problem may be resolved as more 13C-MFA papers on non-model species are published.


Affiliation: Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America.

ABSTRACT
13C metabolic flux analysis (13C-MFA) has been widely used to measure in vivo enzyme reaction rates (i.e., metabolic flux) in microorganisms. Mining the relationship between environmental and genetic factors and metabolic fluxes hidden in existing fluxomic data will lead to predictive models that can significantly accelerate flux quantification. In this paper, we present a web-based platform, MFlux (http://mflux.org), that predicts bacterial central metabolism via machine learning, leveraging data from approximately 100 13C-MFA papers on heterotrophic bacterial metabolism. Three machine learning methods, namely Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Decision Tree, were employed to study the complex relationship between influential factors and metabolic fluxes. We performed a grid search for the best parameter set for each algorithm and verified their performance through 10-fold cross-validation. SVM yields the highest accuracy among the three algorithms. Further, we employed quadratic programming to adjust flux profiles to satisfy stoichiometric constraints. Multiple case studies have shown that MFlux can reasonably predict fluxomes as a function of bacterial species, substrate types, growth rate, oxygen conditions, and cultivation methods. Because published studies favor model organisms grown on particular carbon sources, the fluxome data are biased, which may limit the applicability of machine learning models. This problem may be resolved as more 13C-MFA papers on non-model species are published.
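The quadratic-programming adjustment mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest form of the problem: find the flux vector closest (in the least-squares sense) to the machine-learning prediction subject only to linear equality constraints A v = b, which has a closed-form solution. The function names, the three-flux branch point, and the numbers are hypothetical and are not taken from MFlux; the platform's actual formulation may also include flux bounds, which would require a general QP solver.

```python
def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constraint rows are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def adjust_fluxes(predicted, A, b):
    """Return the vector closest to `predicted` that satisfies A v = b.

    Minimizes 0.5 * ||v - predicted||^2 subject to A v = b, whose solution is
    v = predicted - A^T lam with (A A^T) lam = A predicted - b.
    """
    residual = [sum(a * p for a, p in zip(row, predicted)) - bi
                for row, bi in zip(A, b)]
    gram = [[sum(x * y for x, y in zip(ri, rj)) for rj in A] for ri in A]
    lam = solve_linear(gram, residual)
    return [p - sum(l * row[k] for l, row in zip(lam, A))
            for k, p in enumerate(predicted)]


# Hypothetical branch point: flux v0 splits into v1 and v2, so v0 - v1 - v2 = 0.
A = [[1.0, -1.0, -1.0]]
b = [0.0]
predicted = [100.0, 70.0, 36.0]  # illustrative raw prediction, imbalance of -6
adjusted = adjust_fluxes(predicted, A, b)
print(adjusted)
```

In this toy case the imbalance is spread evenly across the three fluxes, so the adjusted profile balances the branch point while staying as close as possible to the raw prediction.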

pcbi.1004838.g001: A universal central metabolic pathway for bacteria. The central carbon metabolic pathway is simplified into 29 fluxes, used as the outputs of our model.

Mentions: As mentioned earlier, supervised ML builds models from samples, each of which is a pair of a feature vector and a target. Based on published 13C-MFA methodologies and microbial physiologies, we proposed five categorical features: species, nutrient types, oxygen conditions, engineering method, genetic background, and cultivation methods. There were two considerations when choosing those features. First, genetic modifications can significantly re-organize fluxomes. To improve predictions for mutant strains, our platform allows certain central pathways to be toggled on or off (by manually setting the flux boundaries) in engineered strains. Second, the cultivation method factor aims to reveal fluxome differences between shake flask cultures (a pseudo-steady state approach) and bioreactor cultures (a well-controlled fermentation or chemostat cultivation). Meanwhile, we introduced sixteen continuous features: growth rate, substrate uptake rate, and the ratio of multiple substrate uptakes (glucose, fructose, galactose, gluconate, glutamate, citrate, xylose, succinate, malate, lactate, pyruvate, glycerol, acetate and NaHCO3, as shown in Fig 1). Since the features include both categorical and continuous ones, one-hot encoders were used to convert categorical feature values into real numbers. Each feature was then standardized to zero mean and unit variance, as many ML approaches assume. Each predicted flux (the target, or label, in ML terminology) was scaled into the interval [0, 1] by the min-max method. In addition to the min-max method, we also tested unit-variance-zero-mean standardization for scaling flux values, and the results were quite similar.
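The preprocessing steps described above (one-hot encoding of categorical features, standardization of every feature to zero mean and unit variance, and min-max scaling of each target flux) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names and the toy values are hypothetical and do not come from the MFlux dataset or code.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector, one column per category."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded


def standardize(column):
    """Rescale a column to zero mean and unit variance (constant columns map to zeros)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in column]


def min_max(column):
    """Rescale a column into the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in column]


# Hypothetical toy samples (illustrative values only).
oxygen = ["aerobic", "anaerobic", "aerobic", "microaerobic"]  # categorical feature
growth_rate = [0.2, 0.4, 0.6, 0.8]                             # continuous feature
flux = [10.0, 55.0, 100.0, 70.0]                               # one target flux

categories, oxygen_encoded = one_hot(oxygen)
# Feature matrix: one-hot columns followed by the continuous column.
raw_rows = [enc + [g] for enc, g in zip(oxygen_encoded, growth_rate)]
# Standardize every feature column, then transpose back to one row per sample.
columns = [standardize(list(col)) for col in zip(*raw_rows)]
features = [list(row) for row in zip(*columns)]
targets = min_max(flux)
```

Standardizing the one-hot columns along with the continuous ones follows the statement that each feature was standardized; whether a given model benefits from scaling indicator columns is a separate design choice.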

