The SUPERFAMILY 1.75 database in 2014: a doubling of data.
Bottom Line: This tree is built with genomic-scale domain annotation data as before, but constantly updated when new species are introduced to the sequence library. Our Gene Ontology and other functional and phenotypic annotations previously reported have stood up to critical assessment by the function prediction community. We have now introduced these data in an integrated manner online at the level of an individual sequence and, in the case of whole genomes, with enrichment analysis against a taxonomically defined background.
Affiliation: Computer Science, University of Bristol, Bristol, BS8 1UB, UK Matt.Oates@bristol.ac.uk.
Mentions: In total 1376 proteomes have been updated to a new version with 1818 completely new proteomes added since the original 1.75 release: 679 representing completely new species (181 Eukarya, 450 Bacteria, 48 Archaea), and 1139 strains of existing species (30 Eukarya, 1076 Bacteria, 33 Archaea). Of special note, 40 new Viridiplantae proteomes have been added since the last release, bringing the total number to 59. This extends the represented species of other resources, such as Phytozome (version 10.0.4) (16), that provided 48 of the proteomes included in SUPERFAMILY. At the level of individual sequences SUPERFAMILY now contains 34 222 445 linked sequence objects in the complete proteomes collection (including species with draft assemblies), with a total of 111 392 143 linked protein sequences covering other sources, such as UniProt. For a more detailed view of the sequence space interface between UniProt, SUPERFAMILY and known structures from the Protein Data Bank (PDB; as of 18 September 2014) please see Figure 2.
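The proteome counts quoted above can be cross-checked with a short tally. This is only a sketch: the numbers are copied from the paragraph, while the variable names and dictionary layout are illustrative assumptions and do not reflect how SUPERFAMILY stores its data.

```python
# Counts copied from the text; the structure is illustrative only.
new_species = {"Eukarya": 181, "Bacteria": 450, "Archaea": 48}
new_strains = {"Eukarya": 30, "Bacteria": 1076, "Archaea": 33}

species_total = sum(new_species.values())    # proteomes from completely new species
strains_total = sum(new_strains.values())    # proteomes from strains of existing species
added_total = species_total + strains_total  # all completely new proteomes

# Viridiplantae proteomes present before this update, implied by
# "40 new ... bringing the total number to 59".
viridiplantae_before = 59 - 40

print(species_total, strains_total, added_total, viridiplantae_before)
# prints: 679 1139 1818 19
```

The per-kingdom figures sum to the stated subtotals (679 and 1139), which in turn sum to the 1818 new proteomes reported.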