3D complex: a structural classification of protein complexes.

Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA - PLoS Comput. Biol. (2006)

Bottom Line: We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes.


Affiliation: Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom. elevy@mrc-lmb.cam.ac.uk

ABSTRACT
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

pcbi-0020155-g001: A Hierarchy of Protein Complexes of Known Three-Dimensional Structure. The hierarchy has 12 levels, namely, from top to bottom: QS topology, QS family, QS, QS20, QS30…QS100. At the top of the hierarchy, there are 192 QS topologies. One particular QS topology (orange circle) with four subunits is expanded below. It comprises 161 QS families in total, of which two are detailed: the E. coli lyase and the H. sapiens hemoglobin γ4. All complexes in the E. coli lyase QS family are encoded by a single gene and therefore correspond to a single QS. However, the hemoglobin QS family contains two QSs: one with a single gene, the hemoglobin γ4, and one with two genes, the hemoglobin α2β2 from H. sapiens. The last level in the hierarchy indicates the number of structures found in the complete set (PDB). There are 30 redundant complexes corresponding to the lyase QS, four corresponding to the hemoglobin γ4 QS, and 80 to the hemoglobin α2β2 QS. We also see that there are 9,978 monomers, 6,803 dimers, 814 triangular trimers, etc. Note that there are intermediate levels using sequence identity thresholds (fourth to twelfth level) between the QS level and the complete set, which are not shown in detail here.

Mentions: Our structural classification of whole protein complexes (Figure 1) includes a novel strategy of visualization and comparison of complexes (Figure 2). We use a simplified graph representation of each complex, in which each polypeptide chain is a node in the graph, and chains with an interface are connected by edges. We compare complexes with a customized graph-matching procedure that takes into account the topology of the graph, which represents the pattern of chain–chain interfaces, as well as the structure and sequence similarity between the constituent chains. We use these properties to generate a hierarchical classification of protein complexes. It provides a nonredundant set of protein complexes that can be used to derive statistics in an unbiased manner. We illustrate this by drawing on different levels of the classification to address questions related to the topology, the symmetry, and the evolution of protein complexes.
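The graph representation described above can be illustrated with a minimal Python sketch. It is not the authors' customized graph-matching procedure: it checks topological equivalence by brute force over chain permutations, which is only practical for small complexes, and it stands in for structure and sequence similarity with a plain label per chain. The function names, the chain labels (`famA`, `famB`), and the interface lists are all hypothetical and chosen for illustration; they are not taken from real PDB entries.

```python
from itertools import permutations


def make_complex(chains, interfaces):
    """Build a toy complex graph.

    chains: dict mapping chain ID -> label (a hypothetical stand-in for
            structure/sequence similarity between chains).
    interfaces: iterable of (chain ID, chain ID) pairs that share an interface.
    """
    edges = {frozenset(pair) for pair in interfaces}
    return {"chains": dict(chains), "edges": edges}


def same_topology(a, b, match_labels=False):
    """Return True if some one-to-one chain mapping turns a's edges into b's.

    With match_labels=True, mapped chains must also carry equal labels.
    Brute force: tries every permutation of b's chains.
    """
    ids_a = sorted(a["chains"])
    ids_b = sorted(b["chains"])
    if len(ids_a) != len(ids_b) or len(a["edges"]) != len(b["edges"]):
        return False
    for perm in permutations(ids_b):
        mapping = dict(zip(ids_a, perm))
        if match_labels and any(
            a["chains"][x] != b["chains"][mapping[x]] for x in ids_a
        ):
            continue
        mapped = {frozenset(mapping[x] for x in edge) for edge in a["edges"]}
        if mapped == b["edges"]:
            return True
    return False


# Hypothetical four-subunit examples (interfaces are illustrative only).
ring_homo = make_complex(
    {"A": "famA", "B": "famA", "C": "famA", "D": "famA"},
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")],
)
ring_hetero = make_complex(
    {"W": "famA", "X": "famB", "Y": "famA", "Z": "famB"},
    [("W", "X"), ("X", "Y"), ("Y", "Z"), ("Z", "W")],
)
line_homo = make_complex(
    {"A": "famA", "B": "famA", "C": "famA", "D": "famA"},
    [("A", "B"), ("B", "C"), ("C", "D")],
)

print(same_topology(ring_homo, ring_hetero))                     # topology only
print(same_topology(ring_homo, ring_hetero, match_labels=True))  # topology + labels
print(same_topology(ring_homo, line_homo))                       # ring vs. line
```

Comparing on topology alone groups the two ring-shaped tetramers together, while adding the label constraint separates them. This mirrors, in a much simplified form, how the classification moves from a coarse level based on the pattern of chain–chain interfaces to finer levels that also account for similarity between the constituent chains.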

