The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

Reddy TB, Thomas AD, Stamatis D, Bertsch J, Isbandi M, Jansson J, Mallajosyula J, Pagani I, Lobos EA, Kyrpides NC - Nucleic Acids Res. (2014)

Bottom Line: The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.


Affiliation: Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA; tbreddy@lbl.gov.

Figure 1: The four level project classification system implemented in v.5 to describe Studies, Biosamples, Sequencing Projects and Analysis Projects. Studies group one or more related Biosamples. Biosamples describe an individual sample of genetic material. Sequencing projects are the sequencing deliverables from the Biosamples. Analysis projects are the data processing methods applied to sequencing projects. (A) Biosamples may be merged prior to sequencing projects (e.g., 16S amplicon data combined prior to sequencing). (B) Sequencing Projects may be merged prior to analysis (e.g., multiple single-cell genomes combined for assembly).

Mentions: Version 5 of the database is founded on a fundamentally redesigned schema to accommodate a four-level project classification system (Figure 1). The new classification system comprises Studies, Biosamples, Sequencing Projects (SPs) and Analysis Projects (APs). Studies constitute the highest level of classification in the system, containing the Biosamples, SPs and APs that are part of a single initiative. GOLD's Biosamples represent the physical isolate or environmental material from which genetic material is extracted for sequencing; they have no relation to NCBI BioSamples. GOLD's SPs represent sequencing protocols such as whole genome sequencing (WGS), transcriptomes, metagenomes, metatranscriptomes, methylation sequencing, etc. applied to Biosamples. APs are the analytical processes applied to the SPs. Different assemblies or annotations of the same SP result in separate APs, each with its own metadata that needs to be captured. These four components are described in more detail below.
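The relationships among the four levels can be illustrated with a small data-structure sketch. This is not GOLD's actual schema: the class names, field names, and the helper `analysis_projects_for` are hypothetical, chosen only to mirror the description above, including the two merge cases from Figure 1 (several Biosamples feeding one SP, and several SPs feeding one AP).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Biosample:
    """Physical isolate or environmental material (hypothetical fields)."""
    name: str


@dataclass
class SequencingProject:
    """A sequencing protocol applied to one or more Biosamples."""
    name: str
    protocol: str  # e.g. "WGS", "metagenome"
    # More than one entry models Biosamples merged before sequencing (Figure 1A).
    biosamples: List[Biosample]


@dataclass
class AnalysisProject:
    """An analytical process applied to one or more Sequencing Projects."""
    name: str
    method: str
    # More than one entry models SPs merged before analysis (Figure 1B).
    sequencing_projects: List[SequencingProject]


@dataclass
class Study:
    """Top level: groups the Biosamples, SPs and APs of a single initiative."""
    name: str
    biosamples: List[Biosample] = field(default_factory=list)
    sequencing_projects: List[SequencingProject] = field(default_factory=list)
    analysis_projects: List[AnalysisProject] = field(default_factory=list)


def analysis_projects_for(study: Study, sp: SequencingProject) -> List[AnalysisProject]:
    """Return every AP in the study that uses the given SP (identity match)."""
    return [
        ap for ap in study.analysis_projects
        if any(used is sp for used in ap.sequencing_projects)
    ]


def build_example_study() -> Study:
    """Build a toy study; all names and values here are invented for illustration."""
    b1, b2 = Biosample("sample-1"), Biosample("sample-2")
    sp1 = SequencingProject("SP-1", "WGS", [b1])
    sp2 = SequencingProject("SP-2", "WGS", [b2])
    sp3 = SequencingProject("SP-3", "16S amplicon", [b1, b2])  # merged Biosamples
    ap1 = AnalysisProject("AP-1", "assembly", [sp1])
    ap2 = AnalysisProject("AP-2", "alternative annotation", [sp1])
    ap3 = AnalysisProject("AP-3", "combined assembly", [sp1, sp2])  # merged SPs
    return Study(
        name="example study",
        biosamples=[b1, b2],
        sequencing_projects=[sp1, sp2, sp3],
        analysis_projects=[ap1, ap2, ap3],
    )


if __name__ == "__main__":
    study = build_example_study()
    first_sp = study.sequencing_projects[0]
    for ap in analysis_projects_for(study, first_sp):
        print(ap.name, "-", ap.method)
```

In this sketch each AP and SP holds a list of its inputs rather than a single parent, because the merge cases make the hierarchy many-to-many rather than a strict tree. A real implementation would more likely use relational tables with join tables for the same reason.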

