Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
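
The benchmark statistics quoted in the abstract (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) are all standard functions of a two-class confusion matrix. The following is a minimal sketch of those definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard two-class metrics from confusion-matrix counts.

    tp/fn: interface residues predicted as interface / non-interface.
    fp/tn: non-interface residues predicted as interface / non-interface.
    """
    total = tp + fp + tn + fn
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, not data from the paper.
toy = binary_metrics(tp=40, fp=10, tn=40, fn=10)
print(toy)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which makes it a common summary for imbalanced two-class problems such as interface versus non-interface residues.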

Figure 4 (pone-0037706-g004): Residue-based two-class prediction MCCs for each of the 20 natural amino acid types. The MCCs were calculated as the average value from the 5-fold cross validation with the ANN_BAGGING and SVM_BAGGING predictors on the S432 dataset. The independent test MCCs with the best ANN_BAGGING predictors from the 5-fold cross validation on the S142 dataset are also shown for comparison.

Mentions: Residues in the predicted PPI surface patches were predicted based on the atom-based PPI site predictions (see Methods) and were benchmarked against the residues in actual PPI sites. Example residue-based PPI site predictions are also compared side-by-side with the atom-based predictions and the actual PPI sites in Figure 3. The residue-based MCCs for each of the amino acid types are shown in Figure 4. The accuracy benchmarks are summarized in Table 3. Again, the two machine learning algorithms are comparable in prediction performance (Table 3 and Figure 4). The generalized prediction capacity of the ANN_BAGGING models was demonstrated by the independent test, whose results were essentially indistinguishable from those of the five-fold cross validation, as shown in Figure 4 and Table 3. Accuracy benchmarks for each protein from the cross validation (with ANN_BAGGING and SVM_BAGGING) and from the independent test (with ANN_BAGGING) are listed in Tables S2, S3, and S4, respectively. The prediction results can also be viewed as color-coded 3-D protein structures on the web server at http://ismblab.genomics.sinica.edu.tw/ (benchmark > protein-protein).
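
A per-amino-acid-type MCC of the kind described for Figure 4 can be obtained by grouping residue-level predictions by residue type and computing one confusion matrix per type. The sketch below illustrates that grouping under stated assumptions: it pools all residues of a type into a single confusion matrix, whereas the source reports averages over the five cross-validation folds, and the record layout, function names, and toy records are hypothetical.

```python
import math
from collections import defaultdict


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mcc_by_residue_type(records):
    """records: iterable of (amino_acid, is_interface, predicted_interface)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for amino_acid, actual, predicted in records:
        if actual and predicted:
            key = "tp"
        elif predicted:
            key = "fp"
        elif actual:
            key = "fn"
        else:
            key = "tn"
        counts[amino_acid][key] += 1
    # One MCC per residue type, from that type's pooled confusion matrix.
    return {aa: mcc(**c) for aa, c in counts.items()}


# Hypothetical residue-level records, not data from the paper.
toy_records = [
    ("LEU", True, True), ("LEU", False, False),
    ("LEU", True, True), ("LEU", False, False),
    ("LYS", True, False), ("LYS", False, True),
    ("LYS", True, True), ("LYS", False, False),
]
per_type = mcc_by_residue_type(toy_records)
print(per_type)
```

To mirror the fold-averaged values in the figure, the same function could be applied to each cross-validation fold separately and the resulting per-type MCCs averaged.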

