Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
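
For readers unfamiliar with the residue-based measures quoted in the abstract, the following is a minimal sketch of how they are conventionally computed from two-class confusion counts. The function name `binary_metrics` and the counts in the usage note are hypothetical illustrations; they are not the authors' code and not the confusion matrix behind the reported values.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard two-class metrics from confusion counts.

    tp/fn: interface residues predicted as interface / non-interface.
    tn/fp: non-interface residues predicted as non-interface / interface.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

Called with hypothetical counts such as `binary_metrics(tp=50, fp=10, tn=30, fn=10)`, the function returns all five measures at once. Whether the published figures were pooled over all residues or averaged per protein is not stated in this excerpt.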


Figure 3. Visualization of prediction results for example protein targets with different prediction accuracy. Panels (A) to (D) show four proteins with two-class prediction MCC of 0.650, 0.454, 0.262, and 0.107, respectively. The target proteins were selected from the S142 dataset. The predictions were carried out with the best ANN_BAGGING model from the 5-fold cross validation on the S432 dataset. In each panel, the left structure shows the atom-based positive prediction confidence level from blue (confidence level of 0) to red (confidence level of 1) for each of the surface atoms. The middle structure shows the residue-based predictions. The atoms colored in red were predicted with confidence level greater than 0.6; atoms in orange belong to residues in the residue-based PPI site prediction but have prediction confidence levels below 0.6. The right-hand-side structure shows the actual PPI sites: the PPI surface atoms are colored according to dSASA (see Equation (4)) from blue (dSASA of 0 for atoms not involved in PPI) to red (dSASA of 1 for atoms completely buried in the protein complex). The color codes are shown at the top of the figure. Atoms not used in prediction (colored in yellow) belong to residues with incomplete phi and psi angles, as in the N-termini or C-termini of proteins. The non-surface atoms are colored in gray. The complete prediction results can also be viewed in color-coded 3-D protein structures from the web server http://ismblab.genomics.sinica.edu.tw/ (under benchmark > protein-protein).
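
The coloring described in the caption can be sketched as below. Equation (4) is not reproduced on this page, so the definition of dSASA used here (the fraction of an atom's solvent-accessible surface area lost on complex formation) is an assumption that merely matches the stated endpoints of 0 and 1. The function names and the linear blue-to-red ramp are likewise hypothetical.

```python
def relative_burial(sasa_free, sasa_complex):
    """Assumed dSASA: fraction of free-state SASA buried in the complex.

    Returns 0.0 for an atom whose exposure is unchanged and 1.0 for an
    atom that is completely buried. This is NOT taken from Equation (4).
    """
    if sasa_free <= 0.0:
        return 0.0  # non-surface atom: nothing to bury
    frac = (sasa_free - sasa_complex) / sasa_free
    return max(0.0, min(1.0, frac))  # clamp numerical noise into [0, 1]


def blue_to_red(value):
    """Map a value in [0, 1] to an (r, g, b) tuple: 0 is blue, 1 is red."""
    v = max(0.0, min(1.0, value))
    return (v, 0.0, 1.0 - v)
```

The same ramp could be applied to the per-atom prediction confidence level, which the caption also displays on a blue (0) to red (1) scale.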

Mentions: The PPI surface patches on protein surfaces were predicted by combining the machine learning predictions for each of the surface atoms. The activity (probability) outputs from the machine learning models were first converted into prediction confidence levels so that surface atoms with high-confidence predictions could be clustered into surface patches as PPI sites (see Methods). Figure 3 shows a few examples of protein surface PPI site predictions, compared side by side with the actual PPI sites, at various prediction accuracies (residue-based MCC ranging from 0.7 to 0.1). The complete set of prediction results on the proteins from the training and test sets can be viewed with interactive 3-D structural presentation from the web server http://ismblab.genomics.sinica.edu.tw/ (under benchmark > protein-protein).
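
The step of grouping high-confidence surface atoms into patches can be illustrated with a minimal sketch. The actual clustering procedure is described in the paper's Methods and is not reproduced here; the sketch swaps in plain single-linkage clustering by distance. The function name `cluster_patches` and the 4.0 Å distance cutoff are hypothetical, and the 0.6 confidence threshold is borrowed from the Figure 3 coloring rather than confirmed as the clustering threshold.

```python
import math
from collections import deque


def cluster_patches(atoms, min_conf=0.6, cutoff=4.0):
    """Group high-confidence surface atoms into patches.

    atoms: list of (atom_id, (x, y, z), confidence) tuples.
    Two selected atoms join the same patch when they are within
    `cutoff` of each other, directly or through a chain of neighbors.
    Returns a list of patches, each a sorted list of atom ids.
    """
    selected = [(aid, xyz) for aid, xyz, conf in atoms if conf > min_conf]
    unvisited = set(range(len(selected)))
    patches = []
    while unvisited:
        seed = min(unvisited)  # deterministic seed order
        unvisited.remove(seed)
        queue = deque([seed])
        patch = [selected[seed][0]]
        while queue:  # breadth-first growth of the current patch
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(selected[i][1], selected[j][1]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                patch.append(selected[j][0])
        patches.append(sorted(patch))
    return patches
```

The pairwise distance scan is quadratic in the number of selected atoms, which is acceptable for a single protein surface; a spatial grid or k-d tree would be the usual replacement for larger inputs.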

