GlycomeDB - integration of open-access carbohydrate structure databases.

Ranzinger R, Herget S, Wetter T, von der Lieth CW - BMC Bioinformatics (2008)

Bottom Line: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators.


Affiliation: German Cancer Research Center DKFZ, Core Facility Molecular Structural Analysis, Im Neuenheimer Feld 280, Heidelberg, Germany. r.ranzinger@dkfz.de

ABSTRACT

Background: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases.

Results: We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators.

Conclusion: GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The Java application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource.


Figure 5: Entity relationship diagram for GlycomeDB. This diagram represents some of the schemata, tables and connectivities incorporated in GlycomeDB (see text for details). The name at the top of each table has the format schema_name.table_name. All m-to-n tables are simply represented by this name within a yellow box; other tables are shown with a list of the important attributes. Primary keys of the tables are indicated with green markers while foreign keys have red markers. Labels describing the cardinalities of the relationships between tables are given in the modified Chen notation ("1" = one, "mc" = zero, one or many). The top section of the diagram illustrates the relationships between taxonomic annotations (blue background) and structures (red background). The original structures and the GlycoCT translation are linked to each other via the remote.remote_structure_has_structure table. The tables in the orange section represent the dictionaries for the residues used and their associations with the original structures. The green section includes the GlycoCT basetypes and substituents which have relationships with the GlycoCT-encoded structures.
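The m-to-n link described in the caption can be illustrated with a minimal sketch. Only the table name remote.remote_structure_has_structure comes from the caption; the other table names (core.structure, remote.remote_structure), all column names, and the sample rows are assumptions made for illustration. SQLite is used purely for convenience, with attached in-memory databases standing in for schemata; this is not a statement about the database system GlycomeDB itself uses.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a toy version of the link between original and GlycoCT structures."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no CREATE SCHEMA; attached databases stand in for schemata here.
    conn.execute("ATTACH DATABASE ':memory:' AS core")
    conn.execute("ATTACH DATABASE ':memory:' AS remote")
    conn.executescript(
        """
        -- Hypothetical table holding one row per unique GlycoCT sequence.
        CREATE TABLE core.structure (
            structure_id INTEGER PRIMARY KEY,
            glycoct      TEXT NOT NULL UNIQUE
        );
        -- Hypothetical table holding one row per entry in an external database.
        CREATE TABLE remote.remote_structure (
            remote_structure_id INTEGER PRIMARY KEY,
            resource            TEXT NOT NULL,
            resource_id         TEXT NOT NULL
        );
        -- Link table named in the caption; the columns are assumed.
        CREATE TABLE remote.remote_structure_has_structure (
            remote_structure_id INTEGER NOT NULL,
            structure_id        INTEGER NOT NULL,
            PRIMARY KEY (remote_structure_id, structure_id)
        );
        """
    )
    # Placeholder data: two external entries that translate to the same sequence.
    conn.execute("INSERT INTO core.structure VALUES (1, 'GLYCOCT-PLACEHOLDER-A')")
    conn.executemany(
        "INSERT INTO remote.remote_structure VALUES (?, ?, ?)",
        [(10, "cfg", "demo-1"), (11, "kegg", "demo-2")],
    )
    conn.executemany(
        "INSERT INTO remote.remote_structure_has_structure VALUES (?, ?)",
        [(10, 1), (11, 1)],
    )
    conn.commit()
    return conn


def sources_for(conn: sqlite3.Connection, structure_id: int) -> list:
    """Return (resource, resource_id) pairs linked to one GlycoCT structure."""
    return conn.execute(
        """
        SELECT r.resource, r.resource_id
        FROM remote.remote_structure_has_structure AS link
        JOIN remote.remote_structure AS r
          ON r.remote_structure_id = link.remote_structure_id
        WHERE link.structure_id = ?
        ORDER BY r.resource, r.resource_id
        """,
        (structure_id,),
    ).fetchall()
```

Calling `sources_for(build_demo(), 1)` lists both placeholder entries, showing how a single unique sequence can point back to records in several source databases. A structure with no links simply yields an empty list, which matches the "mc" (zero, one or many) cardinality.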

Mentions: Initially, the database contains the schemata core and dictionaries, with tables that include the dictionaries for residue translation and taxonomy mapping, and the schema remote, which has initially empty tables to be filled during data integration. During a GlycoUpdateDB run, a new schema is added for each downloaded database, following the naming convention raw_databasename (e.g. raw_cfg). These schemata contain the downloaded primary data from each of the external databases. Moreover, the schema ncbi is created and filled with a dump of the NCBI taxonomy database. The downloaded information in these schemata is used to fill the remote schema during the data integration process. Figure 5 shows various parts of the GlycomeDB database in an entity relationship diagram, with the taxonomic and structural parts at the top.
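The schema layout described above can be sketched in a few lines. The schema names core, dictionaries, remote, ncbi and the example raw_cfg are taken from the text; the helper names, the lower-casing and character-replacement rule, and the ordering of the returned list are assumptions for illustration, not the behaviour of GlycoUpdateDB.

```python
import re

# Schemata present before any GlycoUpdateDB run, as listed in the text.
INITIAL_SCHEMATA = ["core", "dictionaries", "remote"]


def raw_schema_name(database_name: str) -> str:
    """Build a raw_<databasename> schema name (hypothetical helper).

    Assumption: the name is lower-cased and every run of characters other
    than a-z and 0-9 is collapsed to a single underscore.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", database_name.lower()).strip("_")
    if not slug:
        raise ValueError("database name has no usable characters")
    return f"raw_{slug}"


def schemata_after_update(downloaded: list) -> list:
    """List the schemata expected after one update run (hypothetical helper)."""
    raw = [raw_schema_name(name) for name in downloaded]
    # The ncbi schema holds the dump of the NCBI taxonomy database.
    return INITIAL_SCHEMATA + raw + ["ncbi"]
```

For example, `raw_schema_name("CFG")` gives `raw_cfg`, the one name confirmed by the text. Names produced for other databases follow the assumed rule and may differ from those in an actual installation.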

