GlycomeDB - integration of open-access carbohydrate structure databases.

Ranzinger R, Herget S, Wetter T, von der Lieth CW - BMC Bioinformatics (2008)

Bottom Line: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. More than 100,000 datasets were imported, resulting in more than 33,000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators.


Affiliation: German Cancer Research Center DKFZ, Core Facility Molecular Structural Analysis, Im Neuenheimer Feld 280, Heidelberg, Germany. r.ranzinger@dkfz.de

ABSTRACT

Background: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases.

Results: We have implemented procedures that download the structures contained in the seven major databases, including GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) to the original databases. More than 100,000 datasets were imported, resulting in more than 33,000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators.
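The step in which many imported datasets collapse into fewer unique sequences, while each unique sequence keeps its taxonomic annotations and the IDs of every source entry, can be sketched as below. This is a minimal illustration, not GlycoUpdateDB's actual code: it assumes every record has already been translated to GlycoCT and that identical GlycoCT text denotes an identical structure. The `SourceRecord` and `GlycomeEntry` types, their fields, the placeholder sequences and the sample IDs are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceRecord:
    """One imported dataset (hypothetical shape, for illustration only)."""
    database: str            # name of the source database, e.g. "KEGG"
    source_id: str           # entry ID in that source database
    glycoct: str             # structure already translated to GlycoCT
    taxon_ids: tuple = ()    # NCBI taxonomy IDs annotated in the source

@dataclass
class GlycomeEntry:
    """One unique structure plus cross-references to every source entry."""
    glycoct: str
    references: set = field(default_factory=set)  # {(database, source_id)}
    taxon_ids: set = field(default_factory=set)

def merge_records(records):
    """Collapse records whose GlycoCT text is identical into one entry.

    Assumes the GlycoCT string is already canonical, so equal text means
    an equal structure.
    """
    entries = {}
    for rec in records:
        key = rec.glycoct.strip()
        entry = entries.get(key)
        if entry is None:
            entry = entries[key] = GlycomeEntry(glycoct=key)
        entry.references.add((rec.database, rec.source_id))
        entry.taxon_ids.update(rec.taxon_ids)
    return list(entries.values())

# Placeholder strings stand in for real GlycoCT sequences.
records = [
    SourceRecord("KEGG", "id-1", "GLYCOCT-A", (9606,)),
    SourceRecord("CFG", "id-2", "GLYCOCT-A", (10090,)),
    SourceRecord("BCSDB", "id-3", "GLYCOCT-B"),
]
merged = merge_records(records)
print(len(merged))  # 2
```

Keying a dictionary on the encoded sequence makes the merge a single linear pass; it only works if the encoding is canonical, which is one motivation for translating everything into one format before comparing.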

Conclusion: GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The Java application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource.

Figure 4: Flow chart for structure translation. The flow chart shows how the carbohydrate structure translation process is applied to each sequence in its original encoding, as retrieved from the source database. When no errors are detected, the result is a validated GlycoCT representation of the carbohydrate structure. Detected errors (grammatical, typographical) are stored separately and reported back to the curator of the source database.
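The per-structure workflow in Figure 4 (parse the original encoding, validate, emit GlycoCT, or else log the error for the curator) can be sketched as follows. This is a hedged illustration only: the parser, validator and encoder are passed in as plain functions, and the toy "residue-residue" format, the residue dictionary and the placeholder `GlycoCT<...>` output are hypothetical stand-ins, not any real source format or the real GlycoCT encoder. Mapping empty tokens to "grammatical" and unknown residue names to "typographical" is likewise an assumption made for the example.

```python
from dataclasses import dataclass

class TranslationError(Exception):
    """Raised by a parser or validator; `kind` classifies the problem."""
    def __init__(self, kind, message):
        super().__init__(message)
        self.kind = kind  # "grammatical" or "typographical"

@dataclass
class ErrorReport:
    """An error kept apart from GlycomeDB and reported to the curator."""
    database: str
    source_id: str
    kind: str
    message: str

def translate_structure(database, source_id, sequence,
                        parse, validate, to_glycoct, error_log):
    """Parse, validate and encode one sequence; return GlycoCT or None."""
    try:
        structure = parse(sequence)      # original encoding -> object model
        validate(structure)              # checks on the parsed structure
        return to_glycoct(structure)     # validated GlycoCT text
    except TranslationError as err:
        # Store the error separately instead of importing the structure.
        error_log.append(ErrorReport(database, source_id, err.kind, str(err)))
        return None

# --- Toy stand-ins for a real source-format parser (hypothetical) ---
KNOWN_RESIDUES = {"Glc", "Gal", "Man", "GlcNAc"}

def toy_parse(seq):
    tokens = seq.split("-")
    if "" in tokens:
        raise TranslationError("grammatical", f"empty residue in {seq!r}")
    return tokens

def toy_validate(tokens):
    unknown = [t for t in tokens if t not in KNOWN_RESIDUES]
    if unknown:
        raise TranslationError("typographical", f"unknown residues {unknown}")

def toy_to_glycoct(tokens):
    # Placeholder only; real GlycoCT output is a multi-section text format.
    return "GlycoCT<" + "|".join(tokens) + ">"

errors = []
ok = translate_structure("CFG", "id-1", "Glc-Gal",
                         toy_parse, toy_validate, toy_to_glycoct, errors)
bad = translate_structure("CFG", "id-2", "Glc-Gla",
                          toy_parse, toy_validate, toy_to_glycoct, errors)
print(ok)                    # GlycoCT<Glc|Gal>
print(bad, errors[0].kind)   # None typographical
```

Passing the format-specific steps in as functions keeps the workflow itself identical for every source database; only the parser and residue dictionary would change per source.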

Mentions: GlycoUpdateDB is the application we designed to integrate the interpretable data obtained from the resources described above. It is a Java application [25] that relies on a PostgreSQL database [26] and is configured through an XML file. The configuration file contains the settings for the local database and instructions for the download and data-integration process. Before integration starts, the database must contain tables with dictionaries and mappings for the taxonomic data. The first stage of integration downloads the data and then extracts the data files into the local PostgreSQL database. GlycoUpdateDB supports the three download strategies shown in Table 1 and can also use locally resident files (e.g. static databases such as CarbBank). The second stage translates all downloaded, interpretable structures into their GlycoCT representations and stores the translated structures in GlycomeDB. Figure 4 shows the workflow applied to each structure.
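The two-stage process driven by an XML configuration file can be sketched as below. This is not GlycoUpdateDB's real configuration schema or code: the element and attribute names (`glycoupdatedb`, `database`, `source`, `strategy`, `location`), the strategy values `"download"` and `"local"`, the file path and the `example.org` URL are hypothetical placeholders, and the download, extraction and translation steps are supplied as callback functions rather than implemented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

@dataclass
class SourceConfig:
    name: str
    strategy: str   # hypothetical values: "download" or "local"
    location: str   # URL to fetch, or path of a locally resident file

@dataclass
class UpdateConfig:
    db_host: str
    db_port: int
    db_name: str
    sources: List[SourceConfig]

def load_config(xml_text: str) -> UpdateConfig:
    """Read local-database settings and source instructions from XML."""
    root = ET.fromstring(xml_text)
    db = root.find("database")
    if db is None:
        raise ValueError("missing <database> element")
    sources = [SourceConfig(s.get("name"), s.get("strategy"), s.get("location"))
               for s in root.findall("source")]
    return UpdateConfig(db.get("host"), int(db.get("port", "5432")),
                        db.get("name"), sources)

def run_update(config, download, extract, translate_and_store):
    """Stage 1: fetch and load raw data; stage 2: translate to GlycoCT."""
    for src in config.sources:
        # Local files (e.g. a static CarbBank dump) skip the download step.
        path = src.location if src.strategy == "local" else download(src.location)
        extract(src.name, path)   # load raw files into the local database
    for src in config.sources:
        translate_and_store(src.name)   # per-structure workflow of Figure 4

CONFIG = """<glycoupdatedb>
  <database host="localhost" port="5432" name="glycomedb"/>
  <source name="CarbBank" strategy="local" location="data/carbbank.txt"/>
  <source name="ExampleDB" strategy="download" location="https://example.org/export.xml"/>
</glycoupdatedb>"""

calls = []  # records the order in which the stub callbacks run

def fake_download(url):
    calls.append(("download", url))
    return "downloads/export.xml"

def fake_extract(name, path):
    calls.append(("extract", name, path))

def fake_translate(name):
    calls.append(("translate", name))

cfg = load_config(CONFIG)
run_update(cfg, fake_download, fake_extract, fake_translate)
```

Running every source through stage 1 before starting stage 2 mirrors the ordering described in the text: translation only begins once all raw data are in the local database.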

