ECOD: an evolutionary classification of protein domains.

Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV - PLoS Comput. Biol. (2014)

Bottom Line: The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.


Affiliation: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America.

ABSTRACT
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.

Figure 12. SAM MTases and Rossmann domains. (A) SAM MTases as represented by ribosomal protein L11 methyltransferase complexed with SAM (PDB 2nxe). (B) Rossmann domains as represented by formaldehyde dehydrogenase complexed with NAD (PDB 1kol). In (A) and (B), helices are colored in cyan, strands in yellow, and loops in white. The additional strand 7 in the SAM MTase is colored in orange. The respective cofactor, SAM or NAD, is shown in sticks. The Gly-rich loop beneath the cofactor is colored in magenta. The conserved Asp or Glu that forms hydrogen bonds with the adenosine ribose hydroxyls is shown in sticks. Diagrams were made with PyMOL (The PyMOL Molecular Graphics System, Schrödinger, LLC. http://www.pymol.org/). (C) Manually modified DALI [22] alignment between the two domains shown in (A) and (B). Starting and ending residue numbers are labeled before and after the alignment. β-strands and α-helices are labeled numerically and shown as arrows and cylinders, respectively, above the sequence alignment. The Gly-rich loop is highlighted in magenta, and the conserved Asp or Glu is highlighted in red.

Mentions: ECOD contains many homologous links that are not recorded in other classification databases. One example is the relationship between S-adenosyl-L-methionine-dependent methyltransferases (SAM MTases) and NAD(P)-binding Rossmann-fold domains (Rossmann domains). SAM MTases methylate a wide range of substrates using the methyl group donated by the cofactor SAM, which is composed of an adenosine nucleoside joined to the amino acid methionine. Rossmann domains are found in many oxidoreductases that transfer electrons between substrates and the cofactor NAD(P), which is composed of a nicotinamide nucleotide joined to an adenine nucleotide. Thus, SAM and NAD(P) share the adenosine part but differ in the other half, and the two enzyme superfamilies exploit the dissimilar parts of the cofactors to catalyze different reactions [41], [42], [43]. SAM MTases have a consensus structure of a 7-stranded β-sheet sandwiched between connecting α-helices (strand order 3214576, with strand 7 antiparallel to the other six strands; Fig. 12(A)) [44]. Rossmann domains have a consensus structure of a parallel 6-stranded β-sheet sandwiched between connecting α-helices (strand order 321456; Fig. 12(B)) [45]. Thus, the SAM MTase structure can be viewed as a Rossmann domain structure with a strand invasion: the additional strand 7 is inserted into the β-sheet between strands 5 and 6.
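
The strand-invasion description can be checked with a small sketch. It is only an illustration of the two strand orders quoted above, not part of the ECOD pipeline; the names `SAM_MTASE_ORDER`, `ROSSMANN_ORDER`, `remove_strand` and `neighbors` are hypothetical and chosen for this example.

```python
# Spatial strand orders across the beta-sheet, as quoted in the text.
# All names here are illustrative assumptions, not ECOD identifiers.
SAM_MTASE_ORDER = [3, 2, 1, 4, 5, 7, 6]
ROSSMANN_ORDER = [3, 2, 1, 4, 5, 6]


def remove_strand(order, strand):
    """Return the strand order with one strand taken out of the sheet."""
    return [s for s in order if s != strand]


def neighbors(order, strand):
    """Return the strands on either side of `strand` (None at a sheet edge)."""
    i = order.index(strand)
    left = order[i - 1] if i > 0 else None
    right = order[i + 1] if i + 1 < len(order) else None
    return left, right


# Taking strand 7 out of the SAM MTase sheet leaves the Rossmann order.
print(remove_strand(SAM_MTASE_ORDER, 7) == ROSSMANN_ORDER)

# Strand 7 sits between strands 5 and 6.
print(neighbors(SAM_MTASE_ORDER, 7))
```

Representing each sheet as a list of strand numbers in spatial order keeps the comparison purely topological; it says nothing about strand direction, so the antiparallel orientation of strand 7 would need a separate annotation.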

