Gene3D: comprehensive structural and functional annotation of genomes.

Yeats C, Lees J, Reid A, Kellam P, Martin N, Liu X, Orengo C - Nucleic Acids Res. (2007)

Bottom Line: In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes.


Affiliation: UCL, Department of Molecular Biology & Biochemistry, Darwin Building, Gower St, London, UK. yeats@biochem.ucl.ac.uk

ABSTRACT
Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/


Figure 1: Gene coverage of completed genomes in Gene3D. The figure shows the percentage of genes in bacteria, archaea and eukaryotes that have at least one domain assigned by (A) CATH, (B) Pfam or (C) both. Not all genomes have been completely scanned with Pfam, so the coverage is lower than would be expected.
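The coverage measure in Figure 1 can be illustrated with a minimal sketch. It assumes each genome is given as a set of gene identifiers, and each resource as the set of genes with at least one assigned domain. The function name `gene_coverage` and its inputs are hypothetical and not part of Gene3D. Panel (C) is read here as the combined coverage of the two resources (a gene counts if either assigns a domain), which is an assumption about the figure.

```python
from typing import Dict, Set


def gene_coverage(genes: Set[str], cath: Set[str], pfam: Set[str]) -> Dict[str, float]:
    """Return the percentage of genes with at least one assigned domain.

    genes: all gene identifiers in one genome (hypothetical input format).
    cath:  genes with at least one CATH domain assignment.
    pfam:  genes with at least one Pfam domain assignment.
    """
    if not genes:
        raise ValueError("genome has no genes")
    total = len(genes)
    return {
        "cath": 100.0 * len(genes & cath) / total,
        "pfam": 100.0 * len(genes & pfam) / total,
        # Assumption: "both" in the caption means CATH and Pfam taken together.
        "combined": 100.0 * len(genes & (cath | pfam)) / total,
    }
```

Intersecting with `genes` keeps assignments to identifiers outside the genome from inflating the percentages.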

Mentions: The September 2007 version of Gene3D contains ∼4.5 million distinct proteins, grouped into 190 000 protein families with more than 5 members (method described below); around 600 000 proteins remain as ‘singletons’. Included in this are also 527 species (676 strains), comprising 50 eukaryotes, 437 eubacteria and 39 archaea and totalling ∼1.9 million distinct proteins. See Figure 1 for the coverage of these genomes with CATH and Pfam domains. All the HMM-identified domains assigned to the 2046 CATH v3.1.0 superfamilies are sub-clustered at ten discrete sequence identity levels, ranging from 30–95% (files available for download), so as to aid accurate function transfer. For further details on additional annotation, including Pfam, low complexity regions, coiled coils and transmembrane helices, see Supplementary Table 2.
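Sub-clustering a superfamily at several sequence identity levels can be sketched as follows. This is an illustration only: the text does not state which clustering method Gene3D uses, so single-linkage clustering over precomputed pairwise identities is an assumption. The function name, the input format and the thresholds in the usage note are likewise hypothetical and are not the ten levels used in the actual files.

```python
from typing import Dict, List, Tuple


def cluster_at(domains: List[str],
               identities: Dict[Tuple[str, str], float],
               threshold: float) -> List[List[str]]:
    """Single-linkage clusters of domains whose pairwise identity >= threshold.

    identities maps a pair of domain identifiers to percent sequence identity.
    Pairs missing from the mapping are treated as unrelated.
    """
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), pid in identities.items():
        if pid >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for d in domains:
        clusters.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in clusters.values())
```

Calling `cluster_at` once per threshold (for example 30, 60 and 95) gives progressively finer clusters. Function annotation would then be transferred only between domains that share a cluster at a sufficiently strict level.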
