Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
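The residue-based scores quoted in the abstract (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all follow from the four cells of a binary confusion matrix over surface residues. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: interface residues predicted as interface / non-interface.
    tn/fp: non-interface residues predicted as non-interface / interface.
    Ratios with a zero denominator are reported as 0.0 by convention.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the paper).
metrics = binary_metrics(tp=60, fp=40, tn=160, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because interface residues are usually a minority of surface residues, MCC is a more informative single number than accuracy here: it uses all four cells, so a predictor that labels everything as non-interface scores near zero rather than near the majority-class fraction.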


Figure 5 (pone-0037706-g005): The distributions of the prediction accuracies on the 5-fold cross validations and on the independent test. The y-axis on the left-hand side of the panel is associated with the histograms, showing the distributions of the number of proteins in the 5-fold cross validations or in the independent test that were predicted with the MCC within the MCC range shown on the x-axis. The y-axis on the right-hand side of the panel is associated with the curves connecting the dots representing the cumulative percentage of the proteins predicted with the residue-based MCC shown on the x-axis. The 5-fold cross validations were carried out with the ANN_BAGGING and SVM_BAGGING predictors on the S432 dataset; the independent test was carried out with the best ANN_BAGGING predictors from the 5-fold cross validation on the S142 dataset.
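The two quantities plotted in this figure, a histogram of per-protein MCC values and a cumulative percentage curve, can be derived from a flat list of per-protein scores. The following is a minimal sketch under stated assumptions: the function name `mcc_distribution`, the bin range, and the sample scores are hypothetical, and the cumulative curve is assumed to count proteins at or below each bin (the caption does not state the direction).

```python
def mcc_distribution(mccs, lo, hi, n_bins):
    """Bin per-protein MCC values and accumulate a cumulative percentage.

    Returns (counts, cumulative_percent), one entry per bin. Values outside
    [lo, hi) are clamped into the first or last bin.
    """
    counts = [0] * n_bins
    for value in mccs:
        index = int((value - lo) / (hi - lo) * n_bins)
        index = max(0, min(index, n_bins - 1))
        counts[index] += 1

    # Assumption: the curve reports the share of proteins with MCC at or
    # below the upper edge of each bin.
    cumulative_percent = []
    running = 0
    total = len(mccs)
    for count in counts:
        running += count
        cumulative_percent.append(100.0 * running / total if total else 0.0)
    return counts, cumulative_percent


# Hypothetical per-protein MCC values, for illustration only.
sample_mccs = [0.05, 0.15, 0.35, 0.38, 0.42, 0.47, 0.55, 0.62, 0.71, 0.33]
counts, cumulative = mcc_distribution(sample_mccs, lo=0.0, hi=1.0, n_bins=10)
for i, (c, pct) in enumerate(zip(counts, cumulative)):
    print(f"MCC {i / 10:.1f}-{(i + 1) / 10:.1f}: {c} proteins, {pct:.0f}% cumulative")
```

Residue-based MCC can be negative for a poorly predicted protein, so a real analysis would likely set `lo` below zero; the sample range here is kept to 0 through 1 only to keep the printout short.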

Mentions: The distribution of prediction accuracy for proteins in the S432 and S142 datasets is shown in Figure 5, and the overall benchmark results are summarized in Table 3. The independent test result (MCC = 0.423) for the residue-based PPI site predictions, as shown in Table 3, can be compared with previous publications based on the same training and test datasets. Porollo et al. [27] developed the SPPIDER predictor for PPI site residue predictions with essentially the same training and test datasets, based on a combination of structural and sequence features. Their residue-based prediction MCC for the independent dataset is 0.42. In another work, a detailed analysis of the sequence and structural attributes on the same training and test datasets concluded that the best performance for independent residue-based PPI site predictions was an MCC of 0.37 on the same test set [3]. When the evolutionary information was removed from the prediction inputs, the MCC dropped to 0.34. Hence, the PPI site predictions based on the physicochemical complementarities derived from the PDMs on the protein surfaces are currently the best structure-based predictors, judging by the MCC of the residue-based predictions. The performance of the predictors developed in this work would be further improved if the evolutionary information of the query proteins were integrated into the prediction algorithms.

