ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Bottom Line: The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.


Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both close and distant relationships, often provides insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology) rather than by topology (or "fold"). This distinction highlights cases of homology between domains of differing topology, aiding the understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. ECOD also recognizes closer, sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families, which are clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
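The difference between grouping domains by homology and grouping them by topology can be illustrated with a toy data structure. The Python sketch below is hypothetical: the `Domain` record, its labels (`H.a`, `T.1`, ...), and the helper functions are illustrative assumptions, not ECOD's schema or code. It groups the same domains both ways and flags homology groups whose members span more than one topology, which is the kind of case the abstract says ECOD is designed to highlight.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    domain_id: str  # hypothetical domain identifier
    h_group: str    # hypothetical homology-group label
    topology: str   # hypothetical topology ("fold") label


def group_by(domains, key):
    """Map each key value to the IDs of the domains that share it."""
    groups = defaultdict(list)
    for d in domains:
        groups[key(d)].append(d.domain_id)
    return dict(groups)


def homologs_with_differing_topology(domains):
    """Return homology groups whose members have more than one topology."""
    topologies = defaultdict(set)
    for d in domains:
        topologies[d.h_group].add(d.topology)
    return sorted(h for h, tops in topologies.items() if len(tops) > 1)


# Toy data (hypothetical): d1 and d2 are homologous but differ in topology.
domains = [
    Domain("d1", "H.a", "T.1"),
    Domain("d2", "H.a", "T.2"),
    Domain("d3", "H.b", "T.1"),
]
print(group_by(domains, lambda d: d.h_group))     # {'H.a': ['d1', 'd2'], 'H.b': ['d3']}
print(group_by(domains, lambda d: d.topology))    # {'T.1': ['d1', 'd3'], 'T.2': ['d2']}
print(homologs_with_differing_topology(domains))  # ['H.a']
```

A topology-first grouping places d1 with the unrelated d3, whereas a homology-first grouping keeps d1 with its homolog d2 despite their differing topologies.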

Fig. 4. Classification of ECOD and ECOD hierarchical levels with respect to the PDB and other classifications. (a) Cumulative sum of PDB release dates from Jan-2000 to Jan-2014 (red) compared to classified PDB depositions in ECOD (green), SCOP (cyan), and CATH (blue). Any deposition with at least one classified domain is counted. ECOD consistently classifies more structures than SCOP and CATH and is more up to date. (b) Cumulative sum of PDB deposition dates for ECOD hierarchical levels. Each group is counted only once, at the date of its oldest deposition. The number of new levels increases steadily over the 2000 to 2014 period.
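Panel (b) counts each hierarchy group once, at the date of its oldest deposition, and then accumulates those counts over time. A minimal Python sketch of that counting rule follows. The input format (a list of `(group_id, deposition_year)` pairs), the function name, and the toy data are hypothetical; they illustrate the bookkeeping only, not ECOD's actual pipeline.

```python
from bisect import bisect_right


def cumulative_new_groups(memberships, years):
    """Cumulative number of groups first seen on or before each year.

    memberships: iterable of (group_id, deposition_year) pairs (hypothetical format).
    years: years at which to report the running total.
    """
    # Each group is counted once, at the year of its oldest deposition.
    first_seen = {}
    for group_id, year in memberships:
        if group_id not in first_seen or year < first_seen[group_id]:
            first_seen[group_id] = year
    firsts = sorted(first_seen.values())
    # bisect_right gives how many groups first appeared in or before year y.
    return [bisect_right(firsts, y) for y in years]


# Toy data: "h1" has depositions in 2001 and 2005 but is counted only in 2001.
toy = [("h1", 2001), ("h1", 2005), ("h2", 2003), ("h3", 2003), ("h4", 2005)]
print(cumulative_new_groups(toy, range(2000, 2006)))  # [0, 1, 1, 3, 3, 4]
```

Counting by first appearance means that later depositions joining an existing group do not raise the curve, so the curve tracks only the arrival of new groups.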

We also compare ECOD to the most recent releases of SCOP and CATH. ECOD, SCOP, and CATH differ in domain partition strategy, in classification hierarchy, and simply in the number of structures considered. At the time of writing, ECOD classifies 93,663 PDB depositions containing 239,303 protein chains, SCOP 1.75 contains 38,221 PDBs and 85,141 chains, and CATH v3.5 contains 51,334 PDBs and 118,792 chains. Of the chains classified in ECOD but not in SCOP (and not in a special architecture), 137,794 were classified automatically and 2,484 manually. Of the chains classified in ECOD but not in CATH (and not in a special architecture), 106,474 were classified automatically and 2,521 manually. The growth of the PDB over time is compared to the number of structures classified in ECOD, CATH, and SCOP (Fig. 4(a)). The difference between the number of structures in the PDB and those in the main architectures of ECOD is primarily accounted for by the structures contained in ECOD special architectures (i.e., coiled-coil, peptide, non-peptide polymers, and low-resolution structures that could not be classified by sequence). The growth of the hierarchical levels from 2000–2013 indicates that although evolutionarily distinct groups (i.e., X- and H-groups) are being discovered at a steady pace, the predominant source of new domains in ECOD is sequence families (F-groups) being associated with existing homologous groups (Fig. 4(b)).
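The counts quoted above can be turned into simple ratios. The short Python sketch below uses only the numbers stated in this paragraph. It computes how many times more PDB depositions and chains ECOD covered than SCOP 1.75 and CATH v3.5 at the time of writing, and what share of the ECOD-only chains (outside special architectures) were classified manually. The dictionary layout and function names are illustrative choices, not part of ECOD.

```python
# Counts quoted in the text (ECOD at time of writing, SCOP 1.75, CATH v3.5).
pdb = {"ECOD": 93_663, "SCOP 1.75": 38_221, "CATH v3.5": 51_334}
chains = {"ECOD": 239_303, "SCOP 1.75": 85_141, "CATH v3.5": 118_792}

# Chains classified in ECOD but absent from the other classification
# (excluding special architectures), as (automatic, manual) counts.
ecod_only = {"SCOP 1.75": (137_794, 2_484), "CATH v3.5": (106_474, 2_521)}


def coverage_ratio(counts, other):
    """How many times more entries ECOD holds than `other`."""
    return counts["ECOD"] / counts[other]


def manual_fraction(auto, manual):
    """Share of chains that were classified manually."""
    return manual / (auto + manual)


for other in ("SCOP 1.75", "CATH v3.5"):
    auto, manual = ecod_only[other]
    print(f"vs {other}: {coverage_ratio(pdb, other):.2f}x PDBs, "
          f"{coverage_ratio(chains, other):.2f}x chains, "
          f"{manual_fraction(auto, manual):.1%} of ECOD-only chains manual")
```

Run as-is, this reports about 2.45x (PDBs) and 2.81x (chains) relative to SCOP 1.75, and about 1.82x and 2.01x relative to CATH v3.5. Roughly 1.8% and 2.3%, respectively, of the ECOD-only chains were classified manually, so the automated pipeline handled the large majority.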
