Decision tree supported substructure prediction of metabolites from GC-MS profiles.

Hummel J, Strehmel N, Selbig J, Walther D, Kopka J - Metabolomics (2010)

Bottom Line: Structural feature extraction was applied to sub-divide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detections of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services.


ABSTRACT
Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large-scale screening and discovery of novel metabolic biomarkers. However, the majority of mass spectral tags (MSTs) currently remain unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information together with data of already identified metabolites and reference substances have been archived in the GMD. Structural feature extraction was applied to sub-divide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detections of substructures contained in the compounds. The underlying set of DTs can be inspected by the user and is made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. An XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to database entities.
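The idea of predicting a substructure from mass spectral features and RI with a decision tree, and then scoring the prediction by sensitivity and specificity, can be sketched roughly as follows. This is a minimal illustration and not the trees published with the GMD: the class names (`Spectrum`, `Split`, `Leaf`), the tree `TOY_TREE`, the chosen m/z value, and all thresholds are hypothetical and were invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Spectrum:
    """One mass spectral tag: relative intensity per m/z plus a retention index."""
    intensities: Dict[int, float]
    ri: float


@dataclass(frozen=True)
class Leaf:
    present: bool  # predicted presence of the target substructure


@dataclass(frozen=True)
class Split:
    feature: Union[int, str]  # an m/z value, or the string "ri"
    threshold: float
    below: "Tree"  # branch taken when value <= threshold
    above: "Tree"  # branch taken when value > threshold


Tree = Union[Leaf, Split]


def predict(tree: Tree, s: Spectrum) -> bool:
    """Walk from the root to a leaf, testing one feature per split."""
    node = tree
    while isinstance(node, Split):
        if node.feature == "ri":
            value = s.ri
        else:
            # A missing m/z is treated as zero intensity.
            value = s.intensities.get(node.feature, 0.0)
        node = node.above if value > node.threshold else node.below
    return node.present


def sensitivity_specificity(
    tree: Tree, labelled: List[Tuple[Spectrum, bool]]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of the tree on labelled spectra."""
    tp = fp = tn = fn = 0
    for spectrum, truth in labelled:
        guess = predict(tree, spectrum)
        if guess and truth:
            tp += 1
        elif guess and not truth:
            fp += 1
        elif not guess and truth:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical tree: the m/z value and both thresholds are invented.
TOY_TREE: Tree = Split(
    feature=73,
    threshold=0.5,
    below=Leaf(False),
    above=Split(feature="ri", threshold=1500.0,
                below=Leaf(False), above=Leaf(True)),
)
```

In this sketch one tree answers one yes/no question (is a given substructure present?), so a set of target substructures would need one tree each. Sensitivity is the fraction of true substructure occurrences that the tree detects, and specificity is the fraction of absences that it correctly rejects.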


Fig. 1: Excerpt of the GMD scheme. MSTs (mass spectral tags, i.e. repeatedly observed mass spectra with retention behaviour) are linked to analytes via experiments and a supervised annotation process. Likewise, analytes are mapped to metabolites. Structural information has been added to both types of compounds, the metabolites and their respective analytes.

The GMD uses Microsoft SQL Server 2008™ as the relational database backend for relating the mass spectrum and retention behaviour to an analyte, i.e. the chemically modified compound, which is mapped to represent a metabolite (Fig. 1) (Hummel et al. 2008). Both analyte and metabolite have the properties of a chemical compound and are linked to structures archived as .mol files and InChI™ codes (http://www.iupac.org/inchi/). A typical metabolite has one to two analytes, which are generated by the chemical derivatization process inherent to the GC-MS profiling technique. Each analyte has multiple technological versions of MSTs. These replicate mass spectra and RIs are empirically determined using different mass spectral technologies, e.g. time-of-flight, quadrupole or ion-trap-based mass detectors, and variations of gas chromatographic systems (Strehmel et al. 2008).
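The relationships described above (a metabolite has one or more analytes, and each analyte has several MSTs recorded on different instruments) can be pictured with a small in-memory model. This is only a sketch of the relationships, not the actual GMD schema: every class name, field name, and value below is an assumption made for illustration, and the RI numbers are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MassSpectralTag:
    """One replicate spectrum with its retention index and instrument type."""
    spectrum: Dict[int, float]  # m/z -> relative intensity
    retention_index: float
    detector: str  # e.g. "time of flight", "quadrupole", "ion trap"


@dataclass
class Compound:
    """Properties shared by analytes and metabolites."""
    name: str
    inchi: Optional[str] = None    # InChI code, if archived
    molfile: Optional[str] = None  # contents of the .mol file, if archived


@dataclass
class Analyte(Compound):
    """The chemically modified (derivatized) compound seen by GC-MS."""
    msts: List[MassSpectralTag] = field(default_factory=list)


@dataclass
class Metabolite(Compound):
    """The underlying metabolite that one or more analytes represent."""
    analytes: List[Analyte] = field(default_factory=list)


def msts_for(metabolite: Metabolite) -> List[MassSpectralTag]:
    """Collect every MST reachable from a metabolite via its analytes."""
    return [mst for analyte in metabolite.analytes for mst in analyte.msts]


# Placeholder data: names, spectra, and RI values are invented.
example = Metabolite(
    name="example metabolite",
    analytes=[
        Analyte(
            name="derivative A",
            msts=[
                MassSpectralTag({73: 1.0}, 1300.0, "time of flight"),
                MassSpectralTag({73: 1.0}, 1302.0, "quadrupole"),
            ],
        ),
        Analyte(
            name="derivative B",
            msts=[MassSpectralTag({73: 0.8}, 1450.0, "ion trap")],
        ),
    ],
)
```

In a relational backend these would be separate tables joined by foreign keys; the nested lists here simply make the one-to-many direction of each link visible.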

