A PDB-wide, evolution-based assessment of protein-protein interfaces.

Baskaran K, Duarte JM, Biyani N, Bliven S, Capitani G - BMC Struct. Biol. (2014)

Bottom Line: An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.


ABSTRACT

Background: Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features.

Results: An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads.

Conclusions: Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.



Figure 9: Interface classification as a function of operator type. The green portions of the bars represent interfaces classified as bio, the red ones interfaces classified as xtal. Operators are denoted as follows, from left to right: 2S, two-fold screw axis; AU, non-crystallographic symmetry; XT, crystal cell translation; 2, two-fold axis; 3S, three-fold screw axis; 4S, four-fold screw axis; 3, three-fold axis; FT, fractional translation; 6S, six-fold screw axis; 4, four-fold axis; 6, six-fold axis; -1, inversion center; -4, four-fold rotoinversion axis; GL, glide plane.

Since biological multimers can be mediated by non-crystallographic symmetry or by different crystallographic operators, we analyzed the results of interface classification as a function of operator type. The results, depicted in Figure 9, show a clear difference between the occurrence of biological contacts within the asymmetric unit (i.e. mediated by non-crystallographic symmetry operators) and the occurrence of those formed via crystal operators. Among the former, more than one-third (37.3%) are classified as biological, while contacts through crystal operators are much less likely to be biological. More specifically, 13.4% of the contacts via a pure two-fold crystallographic axis are classified as bio, compared with 19.8% for pure three-folds, 25.2% for pure four-folds and 12.6% for pure six-folds. Only about 1% of the contacts via two-fold screw and three-fold screw axis operators were predicted to be biological, and the biological fraction for other operator types was negligible. These findings provide information that can usefully be applied in interface classification. The higher percentage of biological contacts mediated by non-crystallographic symmetry may be ascribed to several factors, the most obvious of which are the intrinsic conformational heterogeneity of dimeric assemblies and common practice in the choice of asymmetric unit in PDB entries. In addition, authors may have chosen a lower symmetry space group than allowed by the symmetry of the diffraction data, thereby substituting crystallographic operators with non-crystallographic ones. In that case, a dimer mediated by a crystallographic two-fold axis, with one monomer per asymmetric unit, would become a non-crystallographic symmetry dimer.
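The per-operator percentages quoted above amount to a simple tally: group the interfaces by operator type and compute the share called bio in each group. The following is a minimal sketch of that bookkeeping only. The record layout, the function name `bio_fraction_by_operator` and the toy data are assumptions made for illustration; they are not the EPPIC database schema and do not reproduce the published figures.

```python
from collections import Counter

# Hypothetical toy records: (operator type, classifier call).
# Operator codes follow the Figure 9 legend (AU, 2, 2S, ...).
records = [
    ("AU", "bio"), ("AU", "xtal"), ("AU", "bio"),
    ("2", "bio"), ("2", "xtal"), ("2", "xtal"), ("2", "xtal"),
    ("2S", "xtal"), ("2S", "xtal"),
]


def bio_fraction_by_operator(records):
    """Return {operator: fraction of interfaces called 'bio'}."""
    totals = Counter()  # all interfaces seen per operator type
    bio = Counter()     # interfaces called 'bio' per operator type
    for operator, call in records:
        totals[operator] += 1
        if call == "bio":
            bio[operator] += 1
    return {op: bio[op] / totals[op] for op in totals}


fractions = bio_fraction_by_operator(records)

# Print operators ordered from highest to lowest bio fraction.
for op, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{op}: {100 * frac:.1f}% bio")
```

With real data, the records would come from a query over the stored interface classifications, and the resulting fractions would correspond to the green portions of the bars in Figure 9.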

