Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the PPI sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
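
For readers unfamiliar with the residue-based measures quoted in the abstract, the following is a minimal sketch of how the Matthews correlation coefficient, accuracy, precision, sensitivity, and specificity follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fn: PPI-site residues predicted as site / non-site.
    tn/fp: non-site residues predicted as non-site / site.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),   # also called recall
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, for illustration only (not taken from the paper).
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it less sensitive to the imbalance between interface and non-interface residues on a protein surface.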


pone-0037706-g008: Ranking of the attributes derived from PDMs. Each of the surface atoms i in the S142 dataset has a confidence level for the prediction that the atom is in a PPI site. This prediction confidence level is correlated to various extents with the 32 attributes (a_{i,j}, j = 1 to 32, as shown in Equation (3)), which were used as inputs for the machine learning predictors in making the predictions. The blue histogram shows the correlations between prediction confidence levels and attributes derived from concentrations of PDMs. The Pearson’s correlation coefficients, which measure the linear correlation between the prediction confidence level and the attributes, are shown on the y-axis. The x-axis shows the feature types (Table 1), each of which corresponds to one of the a_{i,j}. The red histogram shows the Pearson’s correlation coefficients between the positive (1 for PPI site atoms) or negative (0 for non-PPI site atoms) assignments for protein surface atoms and the attribute values for the protein surface atoms.
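
The ranking described in this caption can be sketched as follows: compute a Pearson correlation coefficient between each attribute column and a per-atom target, then sort the attributes by that coefficient. The target may be the prediction confidence level (blue histogram) or the 0/1 PPI-site assignment (red histogram). The function names, the attribute names, and the toy values below are hypothetical, and sorting by the signed coefficient is an assumption made for illustration rather than the authors' stated procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear correlation
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(attributes, target):
    """Rank attributes by their correlation with a per-atom target.

    attributes: dict mapping attribute name -> list of per-atom values.
    target: per-atom confidence levels, or 0/1 PPI-site assignments.
    Returns (name, r) pairs sorted from highest to lowest r.
    """
    scores = [(name, pearson(vals, target)) for name, vals in attributes.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data for four surface atoms (hypothetical, not from the S142 dataset).
confidence = [0.1, 0.2, 0.3, 0.4]
toy_attributes = {
    "attr_a": [1.0, 2.0, 3.0, 4.0],
    "attr_b": [4.0, 3.0, 2.0, 1.0],
    "attr_c": [1.0, 3.0, 2.0, 4.0],
}
for name, r in rank_attributes(toy_attributes, confidence):
    print(f"{name}: r = {r:+.3f}")
```

Passing the 0/1 assignments as `target` instead of the confidence levels gives the second set of coefficients; with a binary target, Pearson's r is equivalent to the point-biserial correlation.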

Mentions: Figure 6 shows that the protein surface atoms predicted with high confidence level are more buried in the actual PPI sites and are mostly from hydrophobic and aromatic residues. Figure 6A shows the linear correlation between the prediction confidence level and the burial level: the higher the prediction confidence level for a surface atom to be in a PPI site, the more buried the atom is in the actual PPI interface. As expected, Figure 6B shows that the residues whose atoms were predicted with confidence level ≥ 0.6 were mostly hydrophobic residues such as Ile, Leu, Met, Phe, Tyr, and Val. The residue atoms predicted with modest confidence level (between 0.2 and 0.6) are not as hydrophobic as those predicted with high confidence level, and are not as hydrophilic as those predicted with confidence level less than 0.2 (Figure 6B). These results imply that PPI sites with less prominent hydrophobic cores are less likely to be predicted with high accuracy. Indeed, this implication is validated in Figures 7, 8, and 9.
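
The three confidence ranges discussed here (≥ 0.6, between 0.2 and 0.6, and below 0.2) can be illustrated with a small sketch that bins surface atoms by confidence level and reports the fraction coming from the hydrophobic residues named in the text. The bin labels, the function names, and the toy atom list are hypothetical; the residue set is taken from the passage, and the rest is an assumption for illustration rather than the authors' analysis.

```python
# Residues the passage lists as dominating the high-confidence predictions.
HYDROPHOBIC = {"ILE", "LEU", "MET", "PHE", "TYR", "VAL"}


def confidence_bin(confidence):
    """Map a prediction confidence level to one of three ranges."""
    if confidence >= 0.6:
        return "high"
    if confidence >= 0.2:
        return "modest"
    return "low"


def hydrophobic_fraction_by_bin(atoms):
    """Fraction of atoms from hydrophobic residues within each bin.

    atoms: iterable of (residue_name, confidence) pairs, one per surface atom.
    Returns a dict bin -> fraction, or None for a bin with no atoms.
    """
    hydrophobic = {"high": 0, "modest": 0, "low": 0}
    total = {"high": 0, "modest": 0, "low": 0}
    for residue, confidence in atoms:
        label = confidence_bin(confidence)
        total[label] += 1
        if residue.upper() in HYDROPHOBIC:
            hydrophobic[label] += 1
    return {
        label: (hydrophobic[label] / total[label] if total[label] else None)
        for label in total
    }


# Toy atoms (hypothetical values, not from the paper).
toy_atoms = [
    ("LEU", 0.9), ("PHE", 0.7), ("LYS", 0.65),
    ("VAL", 0.3), ("SER", 0.2), ("ASP", 0.1),
]
print(hydrophobic_fraction_by_bin(toy_atoms))
```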

