ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Bottom Line: The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.


Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.


pcbi-1003926-g013: Structures of homologous members of the FZ-CRD (A,B), glypican (C), folate receptor (D), and NPC1 (E). Conserved disulfide bonds are shown in pink sticks with labels by their sides. Four core helices are labeled H1–H4. N- and C-termini are shown. Homology detected by distinct cysteine residue patterns was used as the basis for merging these families into a homologous group (H-group) in ECOD. F. Pairwise Dali Z-scores between pairs of the structures. G. Multiple sequence alignment of the structures shown, with conserved cysteines highlighted on black background. Cysteines forming a disulfide bond are labeled by the same sign for FZ-CRDs from Frizzled8 and MuSK (line above the sequences) and glypican, folate receptor and NPC1 (line below the sequences). Four core helices (H1–H4) are shown below the alignment in cylinder representation.

Mentions: In SCOP, SAM MTases and Rossmann domains are classified in different folds (and therefore different superfamilies, SAM MTases: c.66.1; Rossmann domains: c.2.1), while in CATH, they are in the same topology group but different homology groups (SAM MTases: 3.40.50.150; Rossmann domains: 3.40.50.720). Although both SCOP and CATH indicate by their classification that SAM MTases and Rossmann domains are not homologous, the literature suggests that they are actually related [46], [47]. As noted in reference [41], the overall structural similarity between SAM MTases and Rossmann domains is reflected in the observation that they are reciprocally the closest DALI hits to each other. In addition, SAM MTases and Rossmann domains bind their respective cofactors in a very similar fashion: the common adenosine part of the cofactors resides on top of a glycine-rich loop between the first strand and the first helix, and the adenosine ribose hydroxyls usually form hydrogen bonds with a conserved aspartate or glutamate residue at the end of the second strand (Fig. 12(a,b)) [41], [45], [46]. Indeed, the sequence-based homology detection algorithm HHsearch [21] and the HHpred server [48] also provide statistical evidence that SAM MTases and Rossmann domains are related. In a Cytoscape [26] display of SCOP domains and high-scoring links between them, numerous links with HHsearch probability above 90% exist between SAM MTases and Rossmann domains. In HHpred runs, for instance, when the Rossmann domain in formaldehyde dehydrogenase (SCOP domain d1kola2, classified in c.2.1, Fig. 12(b)) is submitted as a query against the scop95_v1.75B database with secondary structure scoring turned off, the top hits within the same c.2.1 superfamily are followed by a region of mixed hits from both the Rossmann domain superfamily (c.2.1) and the SAM MTase superfamily (c.66.1). The highest-scoring hit from the SAM MTase superfamily is hypothetical protein TM0748 (SCOP domain d1o54a_), with a probability of 97.89%, an E-value of 9.4e-09, and 17% identity over 110 aligned residues. Another SAM MTase, ribosomal protein L11 methyltransferase (SCOP domain d2nxca1; Fig. 13(a) shows the same domain, d2nxea1, with SAM bound), is detected with a probability of 97.33%, an E-value of 3.4e-07, and 23% identity over 102 aligned residues. Based on overall structural similarity, cofactor-binding resemblance, the number of confident homologous links observed between domains in each group, and statistically significant sequence similarity, ECOD classifies SAM MTases and Rossmann domains in the same homology (H-) group but different topology (T-) groups.
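The link-counting step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the function names, and the field layout are assumptions made for illustration, and the only data used are the two HHpred hits quoted in the text. The sketch keeps hits that connect two given SCOP superfamilies with a probability above a threshold (90% here, following the text) and estimates the number of identical residues from the reported percentage, which is approximate because the percentages are rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query-to-target hit (hypothetical record layout, not an HHsearch format)."""
    query: str           # SCOP domain ID of the query
    query_sf: str        # SCOP superfamily of the query, e.g. "c.2.1"
    target: str          # SCOP domain ID of the hit
    target_sf: str       # SCOP superfamily of the hit
    probability: float   # reported probability, in percent
    e_value: float
    identity_pct: float  # reported sequence identity, in percent
    aligned: int         # number of aligned residues


def cross_superfamily_links(hits, sf_a, sf_b, min_prob=90.0):
    """Return hits that connect sf_a and sf_b (either direction) above min_prob."""
    wanted = {sf_a, sf_b}
    return [
        h for h in hits
        if {h.query_sf, h.target_sf} == wanted and h.probability > min_prob
    ]


def identical_residues(hit):
    """Approximate count of identical residues implied by the rounded percentage."""
    return round(hit.identity_pct / 100 * hit.aligned)


# The two hits quoted in the text (query: d1kola2, Rossmann domain, c.2.1).
hits = [
    Hit("d1kola2", "c.2.1", "d1o54a_", "c.66.1", 97.89, 9.4e-09, 17.0, 110),
    Hit("d1kola2", "c.2.1", "d2nxca1", "c.66.1", 97.33, 3.4e-07, 23.0, 102),
]

links = cross_superfamily_links(hits, "c.2.1", "c.66.1")
for h in links:
    print(h.query, "->", h.target, h.probability, identical_residues(h))
```

Comparing the two superfamilies as an unordered set means a hit counts as a link regardless of which domain was the query, which matches the reciprocal nature of the relationship described in the text.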

