Evaluation of regression models in metabolic physiology: predicting fluxes from isotopic data without knowledge of the pathway.

Antoniewicz MR, Stephanopoulos G, Kelleher JK - Metabolomics (2006)

Bottom Line: For large training sets (>50) the artificial neural network model was superior, capturing 95% of the variability in the gluconeogenic flux, whereas the three linear models captured only 75%. The effect of error in the variables and the addition of random variables to the data set was considered. These studies provide insight for metabolomics and the future of isotopic tracers in metabolic research where the underlying physiology is complex or unknown.


Affiliation: Department of Chemical Engineering, Bioinformatics and Metabolic Engineering Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139 USA.

ABSTRACT
This study explores the ability of regression models, with no knowledge of the underlying physiology, to estimate physiological parameters relevant for metabolism and endocrinology. Four regression models were compared: multiple linear regression (MLR), principal component regression (PCR), partial least-squares regression (PLS) and regression using artificial neural networks (ANN). The pathway of mammalian gluconeogenesis was analyzed using [U-(13)C]glucose as tracer. A set of data was simulated by randomly selecting physiologically appropriate metabolic fluxes for the 9 steps of this pathway as independent variables. The isotope labeling patterns of key intermediates in the pathway were then calculated for each set of fluxes, yielding 29 dependent variables. Two thousand sets were created, allowing independent training and test data. Regression models were asked to predict the nine fluxes, given only the 29 isotopomers. For large training sets (>50) the artificial neural network model was superior, capturing 95% of the variability in the gluconeogenic flux, whereas the three linear models captured only 75%. This reflects the ability of neural networks to capture the inherent non-linearities of the metabolic system. The effect of error in the variables and the addition of random variables to the data set was considered. Model sensitivities were used to find the isotopomers that most influenced the predicted flux values. These studies provide the first test of multivariate regression models for the analysis of isotopomer flux data. They provide insight for metabolomics and the future of isotopic tracers in metabolic research where the underlying physiology is complex or unknown.

Figure 2: Determination of the optimal number of principal components by the leave-one-out cross-validation method. The optimal number of principal components is defined as the smallest number of components yielding a PRESS value within 5% of the minimal observed PRESS value; in this case 10 principal components.

Mentions: To construct reduced-space models, the optimal number of principal components (that is, the optimal dimensionality of the new space) must be determined from the available calibration data. Using too few components results in significant information loss. On the other hand, since measured data are never noise-free, some components will describe only noise. Using too many dimensions therefore overfits the data and also yields inaccurate predictions. A number of criteria have been proposed for the rational selection of the optimal number of principal components; cross-validation is the preferred choice for the construction of predictive models. In this approach, each sample is omitted in turn from the training set and the model is trained on the remaining n−1 samples. The trained model is then used to predict the values of the response variables in the sample that was left out, and residuals are calculated as the difference between the observed values and the predicted values. The prediction residual sum of squares (PRESS) is then calculated as the sum of all squared residuals. The PRESS value is determined for varying numbers of components (i.e. dimensions), and one searches for the number of components that gives the minimum PRESS value. However, the location of the minimum is not always well defined, and models with different numbers of components may yield PRESS values of similar magnitude. In this study, the optimal number of principal components was defined as the smallest number of components yielding a PRESS value within 5% of the minimal observed PRESS value. Figure 2 gives an example plot of the PRESS value against the number of components for the training of a PLS model used in this study. The optimal number of dimensions in this case was 10.
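The leave-one-out procedure and the 5% selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `loo_press` and `select_n_components`, the `fit`/`predict` callables, the single response variable, and the PRESS values in the example are all assumptions made for the sketch. With several response variables, the squared residuals would be summed over all of them.

```python
from typing import Callable, Dict, List


def loo_press(xs: List, ys: List[float],
              fit: Callable, predict: Callable) -> float:
    """Leave-one-out PRESS for a single response variable.

    Each sample is omitted in turn, the model is trained on the
    remaining n-1 samples, and the squared residual of the omitted
    sample is added to the running sum.
    """
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)               # train on n-1 samples
        residual = ys[i] - predict(model, xs[i])    # observed minus predicted
        press += residual ** 2
    return press


def select_n_components(press_by_k: Dict[int, float],
                        tolerance: float = 0.05) -> int:
    """Smallest number of components whose PRESS lies within
    `tolerance` (5% by default) of the minimal observed PRESS."""
    threshold = min(press_by_k.values()) * (1.0 + tolerance)
    return min(k for k, p in press_by_k.items() if p <= threshold)


# Toy model (hypothetical): always predict the mean of the training responses.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


toy_press = loo_press([1, 2, 3], [1.0, 2.0, 3.0], fit_mean, predict_mean)

# Hypothetical PRESS curve (made-up numbers, not data from the study).
example_curve = {1: 100.0, 2: 40.0, 3: 20.8, 4: 20.5, 5: 20.0, 6: 20.2}
chosen = select_n_components(example_curve)
```

In the made-up curve the minimum PRESS occurs at five components, but three components already fall within 5% of it, so the rule selects the smaller model. In practice `loo_press` would be evaluated once per candidate number of components, with `fit` training a PCR or PLS model of that dimensionality.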

