ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Bottom Line: The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

Figure 5. Classification methods used for non-redundant (NR) chains for weekly ECOD updates. “Automatic” chains could be completely and confidently classified by the domain pipeline and required no manual intervention. “Manual” chains were at best partly classified by software and required manual curation (i.e. some domain boundaries could not be properly detected or some domains could not be reliably classified using sequence methods). “Non-domain” chains contained peptides, coiled-coils, or other cases requiring manual curation.

Since the July 2013 version, whose statistics are presented here, the subsequent 25 weekly releases by the PDB have been automatically classified (Fig. 5). Each week, protein chains are clustered at 95% redundancy and a representative of each non-redundant cluster is classified; the remaining chains are classified once the initial automatic and manual classification passes are complete. For each weekly update, the majority (∼89%) of non-redundant (<95%) chains can be partitioned and assigned automatically (134.1±40.4 chains per week). Chains that cannot be resolved automatically are manually curated. On average, 11.7±4.9 chains per week were classified as manual representatives in ECOD, whereas 5.1±3.2 were chains not containing domains (i.e. peptides, coiled-coils, or fragments) that were resolved by assignment to special categories or by other methods that did not modify the hierarchy. Overall, the majority of protein chains in weekly PDB releases can be classified automatically into ECOD.
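The ∼89% figure can be checked against the mean weekly counts quoted above. The short sketch below is only an illustration: it assumes that the three categories (automatic, manual, non-domain) together account for all non-redundant chains in a week, and it uses the ratio of the reported means as a stand-in for the per-week average, which the text does not give. The dictionary and function names are hypothetical.

```python
# Mean weekly counts of non-redundant chains, as quoted in the text
# (averaged over the 25 weekly PDB releases after July 2013).
MEAN_WEEKLY_CHAINS = {
    "automatic": 134.1,
    "manual": 11.7,
    "non_domain": 5.1,
}


def category_shares(mean_counts):
    """Return each category's share of the mean weekly total, as a fraction.

    Assumption: the categories partition the non-redundant chains,
    so their means sum to the mean weekly total.
    """
    total = sum(mean_counts.values())
    return {name: count / total for name, count in mean_counts.items()}


shares = category_shares(MEAN_WEEKLY_CHAINS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions the automatic share comes out just under 89%, consistent with the figure stated in the text; the remainder is split between manually curated and non-domain chains.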

