Automics: an integrated platform for NMR-based metabonomics spectral processing and data analysis.

Wang T, Shao K, Chu Q, Ren Y, Mu Y, Qu L, He J, Jin C, Xia B - BMC Bioinformatics (2009)

Bottom Line: Automics has a user-friendly graphical interface for visualizing NMR spectra and data analysis results. Using Automics, users can complete spectral processing and data analysis within one software package in most cases. Moreover, with its open source architecture, interested researchers can further develop and extend this software based on the existing infrastructure.


Affiliation: Beijing NMR Center, Peking University, PR China. super_wt@sina.com

ABSTRACT

Background: Spectral processing and post-experimental data analysis are the major tasks in NMR-based metabonomics studies. Although commercial and freely licensed software tools are available to assist with these tasks, researchers usually have to use multiple software packages for their studies because each package generally focuses on specific tasks. It would be beneficial to have a highly integrated platform in which these tasks can be completed within one package. Moreover, with an open source architecture, newly proposed algorithms or methods for spectral processing and data analysis can be implemented much more easily and accessed freely by the public.

Results: In this paper, we report an open source software tool, Automics, which is specifically designed for NMR-based metabonomics studies. Automics is a highly integrated platform that provides functions covering almost all the stages of NMR-based metabonomics studies. For 1D NMR spectral processing, Automics provides high throughput automatic modules that use the most recently proposed algorithms, together with powerful manual modules. In addition to spectral processing functions, powerful features for data organization, data pre-processing, and data analysis have been implemented. Nine statistical methods can be applied: feature selection (Fisher's criterion), data reduction (PCA, LDA, ULDA), unsupervised clustering (K-means), and supervised regression and classification (PLS/PLS-DA, KNN, SIMCA, SVM). Moreover, Automics has a user-friendly graphical interface for visualizing NMR spectra and data analysis results. The functionality of Automics is demonstrated with an analysis of a type 2 diabetes metabolic profile.

Conclusion: Automics facilitates high throughput 1D NMR spectral processing and high dimensional data analysis for NMR-based metabonomics applications. Using Automics, users can complete spectral processing and data analysis within one software package in most cases. Moreover, with its open source architecture, interested researchers can further develop and extend this software based on the existing infrastructure.



Figure 5: PLS analysis of type 2 diabetic samples (57) and healthy samples (41) using Automics. (A) PLS scores show evident clustering between diabetic (○) and healthy (△) samples. The optimal separation occurs in the second and third components (t2, t3). (B) Regression coefficients of the corresponding PLS model. (C) PLS scores after application of DOSC for removal of one orthogonal component. (D) Regression coefficients of the PLS model after application of DOSC. (E) PLS scores after application of O-PLS. Note that both DOSC and O-PLS significantly improve the separation, and the optimal separation now occurs in the first component. (F) Regression coefficients of the PLS model after application of O-PLS.

Mentions: To determine whether healthy and diabetic samples can be distinguished based on their NMR spectra, we carried out PCA and PLS analysis on the mean-centered data using the data analysis module. PCA showed that the two groups overlap severely. The PLS score plot of the second and third components (t2 and t3) shows that some clustering is evident, although the two groups still overlap (Fig. 5A). The regions of the NMR spectrum that most strongly influence the separation between the two groups are indicated by the regression coefficients. However, because of this overlap, the regression coefficients of this model (Fig. 5B) cannot give an exact explanation. To improve the performance of the data analysis and filter out unwanted orthogonal variation, DOSC or O-PLS was applied to the dataset before PLS analysis. After application of DOSC (the first orthogonal component was removed) or O-PLS (orthogonal variation for the first PLS component was removed), the healthy group (in blue) and the diabetic group (in red) were well separated by the first PLS component (Fig. 5C, E). Regression coefficients indicate the regions that most strongly influence the separation between the two groups (Fig. 5D, F). Each column of the regression coefficient plots represents a spectral region covering 0.02 ppm. Because we defined a Y class indicator vector as the response variable before analysis (1 for diabetic samples and 0 for normal samples), positive regression coefficients indicate relatively higher concentrations of metabolites in type 2 diabetes, while negative values indicate relatively lower concentrations. The regression coefficients (Fig. 5D, F) show that the signals significant to the separation mainly lie around 0.86 ppm, 0.90 ppm, 1.26 ppm, 1.30 ppm, 1.34 ppm, a small region near 3.5 ppm, 5.24 ppm and 5.30 ppm.
These signals can be tentatively assigned according to previous reports [46,47]. Signals near 0.86 and 0.90 ppm are mainly assigned to CH3 groups from fatty acid side chains of lipids, in particular LDL and VLDL; signals at 1.26 and 1.30 ppm are assigned to (CH2)n groups from fatty acid side chains of lipids (mainly in VLDL and LDL); the signal at 1.34 ppm is assigned to lactate; the small region near 3.5 ppm should contain a set of signals from CH groups of glucose, sugars, glycerol and amino acids; and the signals around 5.24 and 5.30 ppm are from CH groups of α-glucose and lipid, respectively. As this was a demonstration study, we did not investigate other signals. Most of the above-mentioned regions have been assigned to glucose, lactate and lipids. They have positive regression coefficients, indicating that concentrations of these metabolites are higher in the diabetic group than in the healthy group (also by ANOVA, p < 0.01). Diabetes mellitus is a prevalent metabolic disorder characterized by elevated blood glucose. It has been demonstrated that diabetes mellitus is associated with disordered metabolism of lipids and fatty acids [48-50]. Our results are consistent with available knowledge about diabetes mellitus from previous studies and with the clinical information of the samples.
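
The sign convention described above (Y = 1 for diabetic, 0 for healthy, so positive coefficients mark bins that are higher in the diabetic group) can be illustrated with a minimal sketch. This is not the Automics implementation: it is a one-component PLS1 fit written in plain Python, and the function names, the three bin labels, and the intensity values are all hypothetical toy data invented for illustration.

```python
def mean_center(rows):
    """Subtract the column mean from every column of a list-of-rows matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def pls1_one_component(X, y):
    """Regression coefficients of a one-component PLS1 model.

    X: list of samples, each a list of binned intensities.
    y: class indicator per sample (1 = diabetic, 0 = healthy).
    """
    Xc = mean_center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]

    # Weight vector: w is proportional to X^T y, scaled to unit length.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(Xc[0]))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]

    # Scores t = X w, and the y-loading q = (y^T t) / (t^T t).
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(yc, t)) / sum(v * v for v in t)

    # With a single component p^T w = 1, so the coefficients reduce to w * q.
    return [wj * q for wj in w]


# Hypothetical toy data: three 0.02 ppm bins, labelled by their centre.
bins = ["1.26 ppm", "3.50 ppm", "7.00 ppm"]
X = [
    [5.0, 3.0, 1.0], [6.0, 3.5, 0.8], [5.5, 3.2, 1.1],  # diabetic (y = 1)
    [3.0, 2.0, 1.5], [3.5, 2.2, 1.4], [2.8, 1.9, 1.6],  # healthy  (y = 0)
]
y = [1, 1, 1, 0, 0, 0]

coef = pls1_one_component(X, y)
for label, b in zip(bins, coef):
    direction = "higher" if b > 0 else "lower"
    print(f"{label}: coefficient {b:+.3f} ({direction} in the diabetic group)")
```

In this toy set the first two bins are more intense in the samples labelled 1, so their coefficients come out positive, while the third bin is less intense and gets a negative coefficient. A real analysis would use more components, cross-validation, and the DOSC or O-PLS filtering step described in the text.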

