ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Bottom Line: The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.


Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both close and distant relationships, often provides insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than by topology (or "fold"). This distinction highlights cases of homology between domains of differing topology, aiding understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, ECOD classifies approximately 100,000 protein structures into 9,000 sequence families, which are clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.

Fig. 9. Venn diagram of the homologous domain pairs shared among ECOD (cyan), SCOP (green), and CATH (red) nonredundant domains with similar domain ranges (80% residue coverage). A plurality of domain pairs are shared among all three classifications. A large fraction of domain pairs are observed only in ECOD, and 11.4% of domain pairs are shared only between ECOD and CATH.

Mentions: Differences in homologous links among equivalent domains were analyzed in ECOD, SCOP, and CATH. We define equivalent domains as those that share 80% residue coverage across all three classifications. This subset contains domains whose partition (domain boundaries) is similar among classifications but whose classification and homologous cluster size differ. We then ask whether domains that share a homologous link in one classification also share that link in the others. For this analysis, only SCOP domains from the canonical SCOP classes a–d are considered. Of all domains in ECOD, 67,559 are defined equivalently (by 80% residue coverage) in SCOP and CATH. Because many of these domains are identical or nearly identical in sequence, only domains with less than 95% sequence identity are used, leaving 9,523 equivalent, nonredundant domains shared among SCOP, CATH, and ECOD. Any pair of these equivalent domains belonging to the same H-group is considered homologous; 1,030,085 such homologous domain pairs were observed in ECOD. The same analysis was performed on SCOP superfamilies and CATH homologous superfamilies, yielding 711,894 and 680,726 homologous domain pairs, respectively. On average, 49.5% of domain pairs were shared among all three classifications, 36.6% were observed only in ECOD, and 11.4% were observed only in ECOD and CATH (Fig. 9). Negligible numbers of domain pairs were observed in SCOP only, CATH only, or SCOP and CATH only. These results show that, among similarly partitioned domains, most homologous relationships recorded in SCOP and CATH are also recorded in ECOD. Additionally, ECOD catalogs many homologous relationships among these similarly partitioned domains that are not observed elsewhere.
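The pairwise comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The coverage definition (shared residues divided by the larger range) and the greedy 95% identity filter are assumptions. All function names (`residue_coverage`, `is_equivalent`, `nonredundant`, `homologous_pairs`, `venn_regions`) are hypothetical, and the caller supplies its own `identity` function. Homologous pairs are taken as all unordered pairs within one group (an ECOD H-group, SCOP superfamily, or CATH homologous superfamily). They are then split into the seven regions of a three-set Venn diagram.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable

Pair = tuple[str, str]  # an unordered domain pair, stored sorted


def residue_coverage(a: set[int], b: set[int]) -> float:
    """Shared residues divided by the larger range (ASSUMED definition of coverage)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def is_equivalent(ranges: list[set[int]], threshold: float = 0.8) -> bool:
    """True if the ranges assigned to one domain by each classification all pairwise meet the threshold."""
    return all(residue_coverage(x, y) >= threshold for x, y in combinations(ranges, 2))


def nonredundant(domains: list[str], identity: Callable[[str, str], float],
                 max_identity: float = 0.95) -> list[str]:
    """Greedy redundancy filter (ASSUMED strategy): keep a domain only if it is
    below max_identity to every domain already kept."""
    kept: list[str] = []
    for d in domains:
        if all(identity(d, k) < max_identity for k in kept):
            kept.append(d)
    return kept


def homologous_pairs(group_of: dict[str, str]) -> set[Pair]:
    """Every unordered pair of domains assigned to the same homology group."""
    members: dict[str, list[str]] = defaultdict(list)
    for domain, group in group_of.items():
        members[group].append(domain)
    pairs: set[Pair] = set()
    for doms in members.values():
        # Sorting first makes each pair canonical, so (a, b) and (b, a) never both appear.
        pairs.update(combinations(sorted(doms), 2))
    return pairs


def venn_regions(ecod: set[Pair], scop: set[Pair], cath: set[Pair]) -> dict[str, int]:
    """Count the pairs falling in each of the seven regions of a three-set Venn diagram."""
    return {
        "all three": len(ecod & scop & cath),
        "ECOD only": len(ecod - scop - cath),
        "SCOP only": len(scop - ecod - cath),
        "CATH only": len(cath - ecod - scop),
        "ECOD & SCOP only": len((ecod & scop) - cath),
        "ECOD & CATH only": len((ecod & cath) - scop),
        "SCOP & CATH only": len((scop & cath) - ecod),
    }


# Toy example with hypothetical domain IDs and group labels.
ecod = homologous_pairs({"d1": "H1", "d2": "H1", "d3": "H1", "d4": "H1"})
scop = homologous_pairs({"d1": "S1", "d2": "S1", "d3": "S2", "d4": "S3"})
cath = homologous_pairs({"d1": "C1", "d2": "C1", "d3": "C1", "d4": "C2"})
print(venn_regions(ecod, scop, cath))
```

Storing each pair as a sorted tuple makes it hashable and canonical. The Venn regions then reduce to plain set algebra. In the toy data, the pairs involving `d4` are grouped only by ECOD, and two further pairs are grouped only by ECOD and CATH.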