Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methods for predicting protein-protein interaction (PPI) sites on protein surfaces are important tools for gaining insight into the biological functions of proteins and for developing therapeutics that target PPI sites. One general feature of PPI sites is that the core regions of the two interacting protein surfaces are complementary to each other, resembling the protein interior in packing density and in the physicochemical nature of their amino acid composition. In this work, we simulated this physicochemical complementarity by constructing three-dimensional probability density maps of non-covalently interacting atoms on protein surfaces. The interaction probabilities were derived from the interiors of known structures. Machine learning algorithms were then applied to learn the characteristic patterns of the probability density maps that are specific to PPI sites. The trained PPI-site predictors were cross-validated on the training set (432 proteins) and tested on an independent dataset (142 proteins). On the independent test set, the residue-based Matthews correlation coefficient (MCC) was 0.423, and the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors for identifying PPI sites on protein surfaces. In particular, PPI-site prediction accuracy increases with the size of the PPI site and with the hydrophobicity of the interface's amino acid composition, and the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
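The abstract describes building three-dimensional probability density maps of non-covalently interacting atoms, with probabilities derived from protein interiors. The sketch below illustrates only the generic idea of such a map: partner-atom offsets around a central atom are binned into cubic voxels and normalized so the values sum to 1. It is not the authors' procedure; the 1 Å voxel size, the 6 Å cutoff, the assumption that offsets are already expressed in a local frame of the central atom, and the names `build_density_map` and `probability_at` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def _voxel(point: Vec3, bin_size: float) -> Voxel:
    # Floor division maps each coordinate to the index of its cubic voxel.
    x, y, z = point
    return (int(x // bin_size), int(y // bin_size), int(z // bin_size))


def build_density_map(offsets: Iterable[Vec3], bin_size: float = 1.0,
                      max_dist: float = 6.0) -> Dict[Voxel, float]:
    """Turn partner-atom offsets (in Å, relative to a central atom, assumed to
    be in a local frame already) into a voxelized probability map."""
    counts: Counter = Counter()
    for off in offsets:
        # Ignore partners beyond the (assumed) non-covalent contact cutoff.
        if sum(c * c for c in off) ** 0.5 > max_dist:
            continue
        counts[_voxel(off, bin_size)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize counts so the probabilities over all voxels sum to 1.
    return {v: n / total for v, n in counts.items()}


def probability_at(density: Dict[Voxel, float], point: Vec3,
                   bin_size: float = 1.0) -> float:
    """Probability of the voxel containing `point`; 0.0 for unseen voxels."""
    return density.get(_voxel(point, bin_size), 0.0)


# Toy offsets (hypothetical, not from the paper); the last lies beyond 6 Å.
toy = [(1.2, 0.3, 0.1), (1.8, 0.9, 0.4), (-2.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
density = build_density_map(toy)
print(density)
print(probability_at(density, (1.5, 0.5, 0.5)))
```

A surface atom could then be scored by looking up such maps at the positions of its neighbours; a real implementation would need a defined local coordinate frame and atom-type-specific maps, neither of which is sketched here.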

pone-0037706-g007: Correlations of PPI site prediction accuracy with PPI features. The data were derived from the independent test of the best ANN_BAGGING predictors on the S142 dataset. (A) PPI patch size, averaged over the proteins whose residue-based MCC falls within each range on the x-axis, plotted against that MCC range. Patch size is defined as the number of residues in the actual PPI site. (B) PPI patch hydrophobicity ratio, averaged over the proteins whose residue-based MCC falls within each range on the x-axis, plotted against that MCC range. Hydrophobic residues are Ala, Cys, Ile, Leu, Met, Phe, Pro, Tyr, Trp, and Val; the hydrophobicity ratio is the number of hydrophobic residues in the PPI site divided by the total number of residues in the site. (C) False negative ratio (FNR) and false positive ratio (FPR), averaged over the proteins whose residue-based MCC falls within each range on the x-axis, plotted against that MCC range. FNR was calculated as (FN/(TP+TN+FP+FN))×100% and FPR as (FP/(TP+TN+FP+FN))×100%, so both are percentages of all residues. The TP (true positive), TN (true negative), FP (false positive), and FN (false negative) counts were derived from residue-based predictions. (D) Distributions of homo-oligomers and hetero-oligomers across the residue-based MCC ranges. The detailed PPI-type assignments for the proteins in the S142 dataset are shown in Table S4. MCC was calculated from residue-based predictions.
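The per-protein quantities in this caption can be computed as in the sketch below. The hydrophobic residue set and the FNR/FPR formulas follow the caption; note that these FNR/FPR definitions divide by all residues, unlike the conventional FN/(FN+TP) and FP/(FP+TN). The MCC bin edges, the boolean per-residue input format, and all function names are hypothetical choices for illustration, and the toy numbers are not data from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

# Hydrophobic residues as listed in the caption (three-letter codes).
HYDROPHOBIC = {"ALA", "CYS", "ILE", "LEU", "MET", "PHE", "PRO", "TYR", "TRP", "VAL"}


def hydrophobic_ratio(site_residues: Sequence[str]) -> float:
    """Hydrophobic residues in the actual PPI site / all residues in the site."""
    if not site_residues:
        return 0.0
    n = sum(1 for r in site_residues if r.upper() in HYDROPHOBIC)
    return n / len(site_residues)


def confusion(pred: Sequence[bool], actual: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Residue-based TP, TN, FP, FN counts (True = residue in a PPI site)."""
    pairs = list(zip(pred, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fnr_fpr(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """FNR and FPR as defined in the caption: percentages of ALL residues."""
    total = tp + tn + fp + fn
    return 100.0 * fn / total, 100.0 * fp / total


def average_by_mcc_bin(mccs: Sequence[float], values: Sequence[float],
                       edges: Sequence[float]) -> Dict[Tuple[float, float], float]:
    """Average a per-protein feature over proteins whose MCC lies in each bin."""
    out: Dict[Tuple[float, float], float] = {}
    last = edges[-1]
    for lo, hi in zip(edges, edges[1:]):
        # Half-open bins [lo, hi); the final bin also includes its upper edge.
        sel = [v for m, v in zip(mccs, values)
               if lo <= m < hi or (hi == last and m == hi)]
        if sel:
            out[(lo, hi)] = sum(sel) / len(sel)
    return out


# Toy per-residue predictions for one protein (hypothetical).
counts = confusion([True, True, False, False, True], [True, False, False, True, True])
print(counts, mcc(*counts), fnr_fpr(*counts))
print(average_by_mcc_bin([0.1, 0.15, 0.5, 1.0], [10, 20, 30, 40],
                         [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]))
```

Panels A to C would then correspond to calling `average_by_mcc_bin` with per-protein patch sizes, hydrophobicity ratios, or FNR/FPR values as `values`.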

Mentions: Figure 6 shows that the protein surface atoms predicted with a high confidence level are more deeply buried in the actual PPI sites and come mostly from hydrophobic and aromatic residues. Figure 6A shows a linear correlation between prediction confidence level and burial level: the higher the predicted confidence that a surface atom lies in a PPI site, the more deeply buried that atom is in the actual PPI interface. As expected, Figure 6B shows that the atoms predicted with confidence level ≥ 0.6 belong mostly to hydrophobic residues such as Ile, Leu, Met, Phe, Tyr, and Val. Atoms predicted with a modest confidence level (between 0.2 and 0.6) come from residues that are less hydrophobic than those predicted with high confidence, but less hydrophilic than those predicted with confidence below 0.2 (Figure 6B). These results imply that PPI sites with less prominent hydrophobic cores are less likely to be predicted with high accuracy. Indeed, this implication is confirmed by Figures 7, 8, and 9.
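The confidence-level analysis described above can be sketched as follows: each atom is placed in one of the three bins named in the text (< 0.2, 0.2 to 0.6, ≥ 0.6), the hydrophobic fraction is computed per bin, and a Pearson coefficient measures the linear confidence/burial relationship. The per-atom burial measure, the reuse of the caption's hydrophobic residue set, the function names, and the toy inputs are all assumptions for illustration, not the paper's code or data.

```python
import math
from typing import Dict, Sequence

# Hydrophobic set taken from the Figure 7 caption (an assumption for this sketch).
HYDROPHOBIC = {"ALA", "CYS", "ILE", "LEU", "MET", "PHE", "PRO", "TYR", "TRP", "VAL"}


def confidence_bin(conf: float) -> str:
    """Bin an atom's predicted confidence using the thresholds in the text."""
    if conf >= 0.6:
        return "high"
    if conf >= 0.2:
        return "modest"
    return "low"


def hydrophobic_fraction_by_bin(confs: Sequence[float],
                                residue_names: Sequence[str]) -> Dict[str, float]:
    """Fraction of atoms in each confidence bin that belong to hydrophobic residues."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for c, res in zip(confs, residue_names):
        b = confidence_bin(c)
        totals[b] = totals.get(b, 0) + 1
        if res.upper() in HYDROPHOBIC:
            hits[b] = hits.get(b, 0) + 1
    return {b: hits.get(b, 0) / n for b, n in totals.items()}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between confidence and a burial measure."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy atoms (hypothetical): confidence, owning residue, burial value.
print(hydrophobic_fraction_by_bin([0.7, 0.65, 0.3, 0.1], ["LEU", "SER", "ALA", "LYS"]))
print(pearson_r([0.1, 0.5, 0.9], [2.0, 4.0, 6.0]))
```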
