The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.
Bottom Line: The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
Affiliation: Prokaryotic Super Program, DOE Joint Genome Institute, Walnut Creek, CA 94598, USA. firstname.lastname@example.org.
Mentions: Version 5 of the database is built on a fundamentally redesigned schema that accommodates a four-level project classification system (Figure 1). The new classification system consists of Studies, Biosamples, Sequencing Projects (SPs) and Analysis Projects (APs). Studies are the highest level of classification, grouping the Biosamples, SPs and APs that belong to a single initiative. GOLD's Biosamples represent the physical isolate or environmental material from which genetic material is extracted for sequencing; they are unrelated to NCBI BioSamples. GOLD's SPs represent sequencing protocols, such as whole genome sequencing (WGS), transcriptomes, metagenomes, metatranscriptomes and methylation sequencing, applied to Biosamples. APs are the analytical processes applied to SPs. Multiple assemblies or annotations of the same SP therefore result in multiple APs, each with its own metadata that needs to be captured. These four components are described in more detail below.
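To make the containment rules of the four-level classification concrete, the following is a minimal Python sketch of how the hierarchy might be modeled. It is an illustration only, not GOLD's actual schema: the class names mirror the four levels, but every field and method name (`biosample_id`, `sp_id`, `protocol`, `analysis_type`, `add_sequencing_project`, `analyses_for`, and so on) is a hypothetical assumption, as is the choice to link each SP to exactly one Biosample and each AP to exactly one SP.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Biosample:
    # Physical isolate or environmental material (unrelated to NCBI BioSamples).
    biosample_id: str
    name: str


@dataclass
class SequencingProject:
    # A sequencing protocol (e.g. WGS, metagenome) applied to one Biosample.
    # Assumption: each SP references exactly one Biosample.
    sp_id: str
    biosample_id: str
    protocol: str


@dataclass
class AnalysisProject:
    # An analytical process (e.g. an assembly or annotation) applied to an SP.
    # Assumption: each AP references exactly one SP and carries its own metadata.
    ap_id: str
    sp_id: str
    analysis_type: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Study:
    # Top level: groups the Biosamples, SPs and APs of a single initiative.
    study_id: str
    biosamples: dict[str, Biosample] = field(default_factory=dict)
    sequencing_projects: dict[str, SequencingProject] = field(default_factory=dict)
    analysis_projects: dict[str, AnalysisProject] = field(default_factory=dict)

    def add_biosample(self, bs: Biosample) -> None:
        self.biosamples[bs.biosample_id] = bs

    def add_sequencing_project(self, sp: SequencingProject) -> None:
        # An SP can only be applied to a Biosample already in this Study.
        if sp.biosample_id not in self.biosamples:
            raise KeyError(f"unknown Biosample {sp.biosample_id!r}")
        self.sequencing_projects[sp.sp_id] = sp

    def add_analysis_project(self, ap: AnalysisProject) -> None:
        # An AP can only be applied to an SP already in this Study.
        if ap.sp_id not in self.sequencing_projects:
            raise KeyError(f"unknown sequencing project {ap.sp_id!r}")
        self.analysis_projects[ap.ap_id] = ap

    def analyses_for(self, sp_id: str) -> list[AnalysisProject]:
        # All APs (e.g. alternative assemblies) derived from one SP.
        return [ap for ap in self.analysis_projects.values() if ap.sp_id == sp_id]


if __name__ == "__main__":
    # Hypothetical identifiers and sample name, for illustration only.
    study = Study("study-1")
    study.add_biosample(Biosample("bs-1", "hot spring mat"))
    study.add_sequencing_project(SequencingProject("sp-1", "bs-1", "metagenome"))
    # Two different assemblies of the same SP become two separate APs.
    study.add_analysis_project(
        AnalysisProject("ap-1", "sp-1", "assembly", {"assembler": "assembler A"}))
    study.add_analysis_project(
        AnalysisProject("ap-2", "sp-1", "assembly", {"assembler": "assembler B"}))
    print(len(study.analyses_for("sp-1")))  # prints 2
```

The usage at the bottom reflects the point made above: re-assembling or re-annotating the same SP does not overwrite anything but adds a new AP with its own metadata, while the checks in the `add_*` methods enforce that lower levels only attach to parents already registered in the Study.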