Integration of metabolomic and proteomic phenotypes: analysis of data covariance dissects starch and RFO metabolism from low and high temperature compensation response in Arabidopsis thaliana.

Wienkoop S, Morgenthal K, Wolschin F, Scholz M, Selbig J, Weckwerth W - Mol. Cell Proteomics (2008)

Bottom Line: Independent component analysis revealed phenotype classification resolving genotype-dependent response effects to temperature treatment and genotype-independent general temperature compensation mechanisms. The whole concept of high-dimensional profiling data integration from many replicates, subsequent multivariate statistics for dimensionality reduction, and covariance structure analysis is proposed to be a major strategy for revealing central responses of the biological system under study.


Affiliation: Max Planck Institute of Molecular Plant Physiology, 14424 Potsdam, Germany.

ABSTRACT
Statistical mining and integration of complex molecular data including metabolites, proteins, and transcripts is one of the critical goals of systems biology (Ideker, T., Galitski, T., and Hood, L. (2001) A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343-372). A number of studies have demonstrated the parallel analysis of metabolites and large scale transcript expression. Protein analysis has been ignored in these studies, although a clear correlation between transcript and protein levels is shown only in rare cases, so that actual protein levels have to be determined for protein function analysis. Here, we present an approach to investigate the combined covariance structure of metabolite and protein dynamics in a systemic response to abiotic temperature stress in Arabidopsis thaliana wild type and a corresponding starch-deficient (phosphoglucomutase-deficient) mutant. Independent component analysis revealed phenotype classification resolving genotype-dependent response effects to temperature treatment and genotype-independent general temperature compensation mechanisms. One observation is the stress-induced increase of raffinose-family-oligosaccharide levels in the absence of transitory starch storage/mobilization in temperature-treated phosphoglucomutase plants, indicating that sucrose synthesis and storage in these mutant plants are sufficient to bypass the typical starch storage/mobilization pathways under abiotic stress. Finally, sample pattern recognition and correlation network topology analysis allowed for the detection of specific metabolite-protein co-regulation and the assignment of a circadian output regulated RNA-binding protein to these processes.
The whole concept of high-dimensional profiling data integration from many replicates, subsequent multivariate statistics for dimensionality reduction, and covariance structure analysis is proposed to be a major strategy for revealing central responses of the biological system under study.


Fig. 1. Strategy to analyze the combined covariance/correlation matrix of metabolites and proteins using integrative extraction from a multitude of biological replicates versus different experiments. Data integration makes it possible to enhance the interpretation of the extracted independent components and to assign specific biomarkers. In parallel, pathway mapping, correlation network analysis, and stochastic metabolic modeling can be linked to the whole process in an iterative manner to improve metabolic models and their predictive power (15, 42).

Metabolomic technologies enable rapid non-targeted analysis of metabolites and provide a diagnostic tool for pattern recognition of biological samples (2–5). Typical pattern recognition methods are variance discrimination algorithms such as principal components analysis (PCA) or independent component analysis (ICA) (2, 6–9). Independent component analysis extends covariance analysis by applying criteria such as kurtosis thresholds or high entropy (8, 10) and thus adds further value for biological interpretation. Variance discrimination of samples relies strongly on high biological variability among independent biological replicates (4, 11, 12). Recently, we demonstrated that the covariance matrices of experimentally determined metabolite levels are connected with the elasticities of pathway reaction networks (13). Consequently, changes in the structure of these covariance networks reveal biochemical regulation (4). This was confirmed by topology studies of differential metabolite correlation/covariance networks, which were used to investigate a sucrose synthase antisense plant with a silent phenotype and alterations in a starch-deficient Arabidopsis thaliana mutant (9, 14). Further, we used a computational kinetic model of the Calvin cycle coupled to sucrose biosynthesis in plant leaf metabolism to demonstrate changes in metabolite correlation/covariance networks as a response to protein phosphorylation and enzymatic regulation (15, 16). The statistical model implies that variance discrimination analysis such as PCA will optimize sample grouping according to differences in biochemical regulation, thus providing for the first time a fundamental relationship between large scale profiling methods such as metabolomics combined with multivariate data analyses, biochemical regulation, and pattern recognition (4, 12) (see Fig. 1).
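The kurtosis criterion mentioned above can be illustrated with a small sketch. It is not the ICA procedure used in the study: it assumes that candidate components are already available as score vectors (one value per sample) and simply ranks them by absolute excess kurtosis. The function names and the toy scores are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: 0 for Gaussian data, negative for flat or bimodal data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant scores")
    return m4 / (m2 ** 2) - 3.0


def rank_components(scores_by_component):
    """Return component names sorted by |excess kurtosis|, most non-Gaussian first."""
    return sorted(
        scores_by_component,
        key=lambda name: abs(excess_kurtosis(scores_by_component[name])),
        reverse=True,
    )


# Hypothetical component scores for eight samples (not data from the paper).
components = {
    "two_groups": [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0],
    "spread": [-3.0, -2.0, -1.0, 0.0, 0.0, 1.0, 2.0, 3.0],
}

ranking = rank_components(components)
```

A component that splits the samples into two tight groups is strongly bimodal and therefore has a large negative excess kurtosis, which is why a kurtosis-based ranking tends to favor components that separate sample classes.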
However, although regulatory hubs can be identified in differential metabolite correlation networks, causal relationships in experimental systems cannot be derived without integrating additional parameters such as external environmental perturbation and further molecular levels like protein concentrations or RNA expression data (1, 4). Computer simulation of enzymatic activities of a biochemical network enables calculation of the corresponding metabolite correlation networks (4, 13, 15–17). This idea has been further substantiated by recent calculations of metabolic networks (18, 19). In these studies the authors identified high variances in gene expression and protein activity as causes for metabolite correlations. The model for metabolite correlations can therefore be extended to systemic fluctuations in complex biochemical networks (20). Consequently, the integration of rapid sample classification and metabolic network analysis using metabolomic techniques with quantitative non-targeted protein profiling will add a further dimension for protein function analysis and systems biology. Furthermore, integrated metabolite and protein measurements offer an improved method for distinguishing among phenotypes (i.e. causes for phenotypes) (4, 9, 12, 21). The systematic comparison of mRNA expression levels, enzymatic activities, and protein levels has revealed a low correlation in most studies so far, indicating that high throughput microarrays are not sufficient to understand genome-wide protein dynamics or biochemical regulation (22, 23). The systematic integration of transcript and metabolite profiling thus necessitates time course resolution. A more direct interaction can be expected for proteins and metabolites. However, only a few studies have so far investigated the integration of metabolomics and proteomics data; recent examples demonstrate such an approach (9, 11, 24–26).
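The idea of a differential correlation network with hubs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Pearson correlation across replicates, a hypothetical threshold of |r| >= 0.8 for drawing an edge, and invented toy levels for two conditions. All function and variable names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def correlation_edges(levels, threshold=0.8):
    """Set of variable pairs whose |r| across replicates reaches the threshold."""
    names = sorted(levels)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(levels[a], levels[b])) >= threshold:
                edges.add((a, b))
    return edges


def differential_edges(levels_a, levels_b, threshold=0.8):
    """Edges present in exactly one of the two conditions."""
    return correlation_edges(levels_a, threshold) ^ correlation_edges(levels_b, threshold)


def degree(edges):
    """Number of edges touching each variable."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Hypothetical levels across five replicates per condition (not data from the paper).
wild_type = {
    "sucrose": [1, 2, 3, 4, 5],
    "raffinose": [3, 1, 5, 1, 3],
    "protein_x": [2, 4, 6, 8, 10],
}
mutant = {
    "sucrose": [1, 2, 3, 4, 5],
    "raffinose": [2, 4, 6, 8, 11],
    "protein_x": [2, 4, 6, 8, 10],
}

changed = differential_edges(wild_type, mutant)
hub = max(degree(changed), key=degree(changed).get)
```

In this toy case the variable with the most changed connections is reported as the hub. The sketch only records whether an edge appears or disappears; a correlation that flips sign while staying above the threshold would not be counted, and a real analysis would also need a significance test for each correlation.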
These studies clearly demonstrate the need for data integration but also show that several further obstacles have to be addressed: (i) data quality and comprehensiveness; (ii) sample throughput; and (iii) algorithms and statistics to extract significant information and to cope with the high dimensionality of the data. All these issues are directly related and dominate the outcome of an integrative study. The present study shows a strategy for metabolomic and proteomic phenotype integration that copes with these problems. The overall strategy is based on our recent work on the systematic analysis of the combined covariance structure of metabolites and proteins in a complex systemic response (see Fig. 1) (9, 11). Those approaches were restricted to low numbers of individual proteins. In the present work we strongly improved protein identification and quantification rates without limiting the sample throughput, which is a requirement to exploit biological variability for sample classification and biological interpretation as described above (4, 9, 11).
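The notion of a combined covariance structure can be made concrete with a short sketch. It assumes that metabolite and protein levels were measured on the same replicates and computes the sample covariance matrix over the union of both variable sets. The names, the `met:`/`prot:` prefixes, and the toy values are hypothetical and not taken from the study.

```python
def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def combined_covariance(metabolites, proteins):
    """Sample covariance matrix over metabolite and protein variables together.

    Each argument maps a variable name to its levels across the same replicates.
    Returns the ordered variable names and the matrix as a list of rows.
    """
    variables = {f"met:{name}": levels for name, levels in metabolites.items()}
    variables.update({f"prot:{name}": levels for name, levels in proteins.items()})
    names = sorted(variables)
    n = len(variables[names[0]])
    if n < 2 or any(len(variables[name]) != n for name in names):
        raise ValueError("all variables need the same number (>= 2) of replicates")
    centered = {name: center(variables[name]) for name in names}
    matrix = [
        [sum(a * b for a, b in zip(centered[r], centered[c])) / (n - 1) for c in names]
        for r in names
    ]
    return names, matrix


# Hypothetical levels across four replicates (not data from the paper).
names, cov = combined_covariance(
    metabolites={"sucrose": [1.0, 2.0, 3.0, 4.0]},
    proteins={"protein_a": [2.0, 4.0, 6.0, 8.0]},
)
```

The off-diagonal entries are the metabolite-protein covariances that a purely metabolomic or purely proteomic analysis would not contain. Because metabolite and protein measurements usually come in different units, scaling each variable (for example to z-scores) before this step is a common choice, although the scaling used in the study is not stated in this section.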

