Limits...
Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces.In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.

View Article: PubMed Central - PubMed

Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.

Show MeSH
Atom-based prediction accuracies for each of the 30 protein atom types.The x-axis represents indexes for the 30 atom types shown in Table 1. The y-axis shows averaged two-class prediction MCCs from the 5-fold cross validation of the ANN_BAGGING and SVM_BAGGING predictors trained and tested for each of the specific protein atom type with the S432 dataset. The prediction MCCs for the independent test with ANN_BAGGING on the S142 dataset are also shown for comparison.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC3368894&req=5

pone-0037706-g002: Atom-based prediction accuracies for each of the 30 protein atom types.The x-axis represents indexes for the 30 atom types shown in Table 1. The y-axis shows averaged two-class prediction MCCs from the 5-fold cross validation of the ANN_BAGGING and SVM_BAGGING predictors trained and tested for each of the specific protein atom type with the S432 dataset. The prediction MCCs for the independent test with ANN_BAGGING on the S142 dataset are also shown for comparison.

Mentions: The results in Figure 1 indicate that the 31 features calculated with PDMs (a set of example PDMs on a protein are shown in Figure S1) and the 32nd feature based on the surface atom local geometry for each of the 30 protein atom types can be used as effective attributes in training machine learning models for PPI site predictions. Machine learning algorithms ANN_BAGGING and SVM_BAGGING were trained for each of the 30 protein surface atom types with five-fold cross validation on the S432 dataset as described in the Methods section. The atom-based MCCs for the five-fold cross validation for each of the atom types are summarized in Figure 2. The benchmarks for the prediction models are shown in Table 2. The differences of the averaged performance for the two machine learning algorithms are essentially indistinguishable (Figure 2 and Table 2), and thus only the ANN_BAGGING models with the best performance were used to benchmark on the S142 dataset as an independent test. The benchmark results on the independent test are compared with the five-fold cross validation in Figure 2 and in Table 2. The benchmark results for the independent test were comparable with the five-fold cross validation results, indicating that the machine learning predictors can be generalized to predict PPI sites on protein surfaces of unknown interaction partners. Figure 2 shows that the prediction models for the atom types from hydrophobic residues with aliphatic and aromatic side chains (atom type index = 8,9,18∼24,30) were predicted with relatively higher accuracies than the atom types from main chain and hydrophilic side chains. This suggests that the core PPI interfaces composed of hot-spot residues (except Arg) are more distinguishable as PPI sites in comparison with the surrounding rim regions populated with higher percentage of polar groups.


Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Atom-based prediction accuracies for each of the 30 protein atom types.The x-axis represents indexes for the 30 atom types shown in Table 1. The y-axis shows averaged two-class prediction MCCs from the 5-fold cross validation of the ANN_BAGGING and SVM_BAGGING predictors trained and tested for each of the specific protein atom type with the S432 dataset. The prediction MCCs for the independent test with ANN_BAGGING on the S142 dataset are also shown for comparison.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC3368894&req=5

pone-0037706-g002: Atom-based prediction accuracies for each of the 30 protein atom types.The x-axis represents indexes for the 30 atom types shown in Table 1. The y-axis shows averaged two-class prediction MCCs from the 5-fold cross validation of the ANN_BAGGING and SVM_BAGGING predictors trained and tested for each of the specific protein atom type with the S432 dataset. The prediction MCCs for the independent test with ANN_BAGGING on the S142 dataset are also shown for comparison.
Mentions: The results in Figure 1 indicate that the 31 features calculated with PDMs (a set of example PDMs on a protein are shown in Figure S1) and the 32nd feature based on the surface atom local geometry for each of the 30 protein atom types can be used as effective attributes in training machine learning models for PPI site predictions. Machine learning algorithms ANN_BAGGING and SVM_BAGGING were trained for each of the 30 protein surface atom types with five-fold cross validation on the S432 dataset as described in the Methods section. The atom-based MCCs for the five-fold cross validation for each of the atom types are summarized in Figure 2. The benchmarks for the prediction models are shown in Table 2. The differences of the averaged performance for the two machine learning algorithms are essentially indistinguishable (Figure 2 and Table 2), and thus only the ANN_BAGGING models with the best performance were used to benchmark on the S142 dataset as an independent test. The benchmark results on the independent test are compared with the five-fold cross validation in Figure 2 and in Table 2. The benchmark results for the independent test were comparable with the five-fold cross validation results, indicating that the machine learning predictors can be generalized to predict PPI sites on protein surfaces of unknown interaction partners. Figure 2 shows that the prediction models for the atom types from hydrophobic residues with aliphatic and aromatic side chains (atom type index = 8,9,18∼24,30) were predicted with relatively higher accuracies than the atom types from main chain and hydrophilic side chains. This suggests that the core PPI interfaces composed of hot-spot residues (except Arg) are more distinguishable as PPI sites in comparison with the surrounding rim regions populated with higher percentage of polar groups.

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces.In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.

View Article: PubMed Central - PubMed

Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.

Show MeSH