ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both its close and distant relationships, often provides insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures and present it as an interactive, updatable online database. ECOD (Evolutionary Classification of protein Domains) differs from other structural classifications in that it groups domains primarily by evolutionary relationship (homology) rather than by topology (or "fold"). This distinction highlights cases of homology between domains of differing topology, which aids understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as to conserved functional sites. ECOD also recognizes closer, sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families, which are clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with the PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating how ECOD can be used to further biological and evolutionary studies.
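The abstract describes a nested organization: domains are grouped into sequence families, and families are clustered into evolutionary groups. The sketch below shows one possible representation of that nesting. It is a minimal sketch that keeps only the two levels named in the abstract. `DomainRecord`, `build_hierarchy`, `summarize`, the domain identifiers, and the label `group_X` are all hypothetical. ECOD's actual hierarchy and data files are not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRecord:
    """Hypothetical flat record: one classified domain and its two labels."""
    domain_id: str
    family: str  # sequence family
    group: str   # evolutionary (homology) group


def build_hierarchy(records):
    """Nest records as group -> family -> [domain ids].

    Raises ValueError if a family is assigned to more than one group, since
    each family should sit under exactly one evolutionary group.
    """
    family_to_group = {}
    tree = defaultdict(lambda: defaultdict(list))
    for rec in records:
        owner = family_to_group.setdefault(rec.family, rec.group)
        if owner != rec.group:
            raise ValueError(
                f"family {rec.family!r} placed in both {owner!r} and {rec.group!r}"
            )
        tree[rec.group][rec.family].append(rec.domain_id)
    return tree


def summarize(tree):
    """Count groups, families and domains in the nested hierarchy."""
    n_families = sum(len(families) for families in tree.values())
    n_domains = sum(
        len(domains) for families in tree.values() for domains in families.values()
    )
    return {"groups": len(tree), "families": n_families, "domains": n_domains}


# Hypothetical labels: two families (Duf371, PTSIIA) linked within one group.
records = [
    DomainRecord("3cbn_A", "Duf371", "group_X"),
    DomainRecord("2f9h_A", "PTSIIA", "group_X"),
]
print(summarize(build_hierarchy(records)))  # {'groups': 1, 'families': 2, 'domains': 2}
```

The example places two families under one group. This mirrors the kind of homologous link between differently classified folds that the case study below describes.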

Figure 14. ECOD recognizes novel evolutionary relationships. A) Duf371 (3cbn) forms an 8-stranded β-barrel from the intertwined β-strands of a tandem structural duplication. The N-terminal half (blue shades) includes an overside connection between adjacent β-strands (blue) that follows a conserved His (black spheres). The symmetry-related C-terminal half (red shades) includes a similar overside connection (red) following a less conserved His (gray spheres). B) The Duf371 C-terminal repeat (salmon) is rotated about the Z-axis to superimpose (RMSD 1.3 Å) with the N-terminal repeat (slate). C) The GutA-like PTS system IIA component (2f9h) forms a similar duplicated β-barrel. An invariant His in the C-terminal half likely represents the PTS IIA phosphorylation site. D) The PK β-barrel domain-like fold (1pkla1) displays a similar intertwined topology but retains only a single overside connection (blue), in the N-terminal half. E) PSI-BLAST alignments of both Duf371 repeats with the detected Mefer0473 sequence support the duplication event, with sequence similarities indicated between the N-terminal and C-terminal halves. A structure-based alignment of the 2f9h C-terminus is included below. Secondary structure elements (arrows for strands, cylinders for helices) and conservation (calculated by Al2Co [59]) are indicated above and below the corresponding sequences. Conserved positions are highlighted in yellow (mainly hydrophobic) and black (polar). Surface representations of F) PTSIIA, in the same orientation as in panel C, and G) Duf371, in the rotated orientation of panel B, are colored in rainbow according to conservation, from blue (less conserved) to red (more conserved).
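The conservation track in the caption was calculated with Al2Co [59]. Al2Co's own weighting and scoring schemes are not reproduced here. As a simpler stand-in, the sketch below scores each alignment column by one minus its normalized Shannon entropy, then scales that value by the fraction of sequences that have a residue (rather than a gap) in the column. The function name `column_conservation`, its parameters, and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment, gap="-", alphabet_size=20):
    """Per-column conservation in [0, 1]: (1 - H / log2(alphabet_size)) * non-gap fraction.

    A simplified entropy score standing in for Al2Co, not a reimplementation of it.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    max_entropy = math.log2(alphabet_size)
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no conservation signal
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
        # Down-weight gappy columns by the fraction of sequences with a residue.
        score = max(0.0, 1.0 - entropy / max_entropy) * (n / len(alignment))
        scores.append(score)
    return scores


# Toy alignment (hypothetical): an invariant Gly and His, then more variable columns.
for i, score in enumerate(column_conservation(["GHVA", "GHIA", "GHLS"])):
    print(i, round(score, 2))
```

An invariant column, such as the conserved His in the toy alignment, scores 1.0. A color ramp from blue to red could then be mapped onto these scores, as in panels F and G.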

ECOD establishes a previously unrecognized homologous link between a domain of unknown function (Duf371, PDB:3cbn) and the bacterial GutA-like PTS system glucitol/sorbitol-specific IIA component (PTSIIA, PDB:2f9h). Duf371 is absent from SCOP, and CATH classifies its fold (2.60.120.630) separately from that of PTSIIA (2.40.33.40). Duf371 forms an 8-stranded β-barrel from the intertwined β-strands of a tandem duplication (Fig. 14(a)). The duplicated structural elements can be superimposed (RMSD 1.3 Å), with a conserved His-containing motif from the N-terminal repeat overlapping a somewhat less conserved His-containing motif from the C-terminal repeat (Fig. 14(b)). Consistent with this, PSI-BLAST [57] provides sequence evidence for the duplication. Both halves of the Duf371 query (PDB:3cbn, gi/169404770) confidently detect the Methanocaldococcus fervens sequence Mefer0473: 3cbn[A:6-141] hits Mefer0473 with an E-value of 1e-30 in the first iteration, and the 3cbn C-terminal range [A:77-142] hits it with an E-value of 0.003 in the second iteration. PTSIIA adopts a β-barrel topology similar to that of Duf371 and is noted in SCOP as consisting of two intertwined structural repeats (Fig. 14(c)). The overside connections between adjacent β-strands of the duplicated structural motifs in Duf371 and PTSIIA are uncommon in barrel architectures and are a distinguishing feature of the two folds.
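The 1.3 Å figure for the superimposed repeats is a root-mean-square deviation between corresponding atoms after the best rigid-body fit. The pure-Python sketch below computes that quantity. It assumes the two repeats have already been paired position by position, for example as matched Cα coordinates from a structure alignment. It uses Horn's quaternion formulation, in which the minimal RMSD follows from the largest eigenvalue of a 4×4 matrix built from the centered coordinates. This is not necessarily the superposition tool the authors used. The function names are hypothetical. The eigenvalue step uses a plain cyclic Jacobi iteration so that no third-party packages are needed.

```python
import math


def _jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def _centered(coords):
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in coords]


def superposition_rmsd(coords_a, coords_b):
    """Minimal RMSD between paired 3-D point sets over proper rotations and translations."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    n = len(coords_a)
    x, y = _centered(coords_a), _centered(coords_b)
    g = sum(v * v for p in x + y for v in p)  # summed squared norms of both sets
    # Cross-covariance S[i][j] = sum over pairs of x[i] * y[j].
    s = [[sum(px[i] * py[j] for px, py in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix: its largest eigenvalue is the best achievable
    # sum of dot products between rotated and reference points.
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam_max = max(_jacobi_eigenvalues(horn))
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(0.0, (g - 2.0 * lam_max) / n))


# Hypothetical coordinates: a copy rotated 90 degrees about Z and translated fits exactly.
repeat_a = [(0.0, 0.0, 0.0), (1.5, 0.2, 0.1), (2.9, 1.1, -0.4), (3.2, 2.6, 0.3), (2.1, 3.5, 1.2)]
repeat_b = [(-yy + 10.0, xx - 3.0, zz + 2.0) for xx, yy, zz in repeat_a]
print(superposition_rmsd(repeat_a, repeat_b))
```

Only proper rotations are allowed in this fit, so a mirror image of a chiral arrangement does not superimpose to zero. A Kabsch superposition by singular value decomposition, with the usual reflection correction, should give the same RMSD.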

