Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
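
For readers less familiar with the residue-based measures quoted in the abstract, the following is a minimal sketch of how they are conventionally computed from a binary confusion matrix (interface residue = positive). The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the paper; the formulas are the standard definitions, not a reproduction of the authors' evaluation code.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    tp/fn: interface residues predicted as interface / non-interface.
    fp/tn: non-interface residues predicted as interface / non-interface.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, for illustration only (not the paper's data).
example = binary_metrics(tp=6, fp=2, tn=10, fn=2)
```

The Matthews correlation coefficient is often preferred for PPI site prediction because interface residues are a minority class, so accuracy alone can look favourable even for a weak predictor.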

Figure 6. Correlations of PPI site prediction confidence level to atomic burial in protein complexes and to amino acid type. (A) The atom-based prediction confidence level range (x-axis) is correlated with the averaged burial level (measured by dSASA, Equation (4)) of the sub-group of atoms in the protein complexes predicted within that confidence level range. The correlation is shown by the diamond symbols, which correspond to the y-axis on the left-hand side of the panel. The distribution of the atom-based predictions, shown by the curve and corresponding to the y-axis on the right-hand side, is plotted against the prediction confidence level range on the x-axis. The data were derived from the independent test with the ANN_BAGGING predictors on the S142 dataset. (B) The histograms in this panel show the distributions of amino acid types in three groups of protein surface residues with different atom-based prediction confidence level ranges. The first group comprises residues with at least one atom predicted at a confidence level ≥ 0.6. The second group comprises residues with at least one atom predicted at a confidence level between 0.2 and 0.6. The third group comprises residues with at least one atom predicted at a confidence level less than 0.2. The percentage of each amino acid type in each of the three groups is shown by a histogram in the panel. The data were derived from the independent test of the best ANN_BAGGING predictors on the S142 dataset.
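
The analysis behind panel (A) can be pictured as binning atoms by prediction confidence and averaging their burial within each bin. The sketch below assumes that each atom is given as a `(confidence, dsasa)` pair with confidence in [0, 1] and that ten equal-width bins are used; the function name, the input layout, and the bin count are illustrative assumptions. The dSASA values are taken as given, because Equation (4) is not reproduced on this page.

```python
from collections import defaultdict


def mean_burial_by_confidence(atoms, n_bins=10):
    """Average burial (dSASA) and atom count per confidence bin.

    atoms: iterable of (confidence, dsasa) pairs, confidence in [0, 1].
    Returns {bin_index: (mean_dsasa, atom_count)} for non-empty bins,
    where bin i covers [i / n_bins, (i + 1) / n_bins); a confidence of
    exactly 1.0 is placed in the last bin.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for confidence, dsasa in atoms:
        index = min(int(confidence * n_bins), n_bins - 1)
        sums[index] += dsasa
        counts[index] += 1
    return {i: (sums[i] / counts[i], counts[i]) for i in sorted(counts)}


# Hypothetical values, for illustration only (not the paper's data).
profile = mean_burial_by_confidence(
    [(0.05, 1.0), (0.08, 3.0), (0.95, 10.0), (1.0, 20.0)]
)
```

The mean per bin corresponds to the diamond symbols and the atom count per bin to the distribution curve described in the caption.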

Mentions: Figure 6 shows that the protein surface atoms predicted with high confidence are more deeply buried in the actual PPI sites and come mostly from hydrophobic and aromatic residues. Figure 6A shows a linear correlation between the prediction confidence level and the burial level: the higher the confidence that a surface atom lies in a PPI site, the more buried that atom is in the actual PPI interface. As expected, Figure 6B shows that the residues whose atoms were predicted with confidence level ≥ 0.6 were mostly hydrophobic residues such as Ile, Leu, Met, Phe, Tyr, and Val. The residue atoms predicted with modest confidence (between 0.2 and 0.6) are less hydrophobic than those predicted with high confidence and less hydrophilic than those predicted with confidence below 0.2 (Figure 6B). These results imply that PPI sites with less prominent hydrophobic cores are less likely to be predicted with high accuracy. Indeed, this implication is validated in Figures 7, 8, and 9.
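
The three-way grouping of residues behind Figure 6B can be sketched as follows. This follows a literal reading of the caption, under which a residue belongs to a group if at least one of its atoms falls in that group's confidence range, so a single residue may be counted in more than one group. Treating 0.2 as part of the middle range, and the names `confidence_group` and `composition_by_group`, are assumptions made for illustration rather than details confirmed by the source.

```python
from collections import Counter

GROUPS = ("high", "medium", "low")


def confidence_group(confidence):
    """Map an atom-based confidence level to one of three ranges."""
    if confidence >= 0.6:
        return "high"
    if confidence >= 0.2:  # assumption: 0.2 itself counts as "medium"
        return "medium"
    return "low"


def composition_by_group(residues):
    """Percentage of each amino acid type within each confidence group.

    residues: iterable of (amino_acid, atom_confidences) pairs.
    A residue is counted once in every group that contains at least
    one of its atoms.
    """
    counts = {group: Counter() for group in GROUPS}
    for amino_acid, confidences in residues:
        for group in {confidence_group(c) for c in confidences}:
            counts[group][amino_acid] += 1
    percentages = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        percentages[group] = {
            amino_acid: 100.0 * n / total for amino_acid, n in counter.items()
        }
    return percentages


# Hypothetical residues, for illustration only (not the paper's data).
composition = composition_by_group([
    ("LEU", [0.70, 0.65]),
    ("PHE", [0.80, 0.10]),
    ("LYS", [0.10, 0.05]),
    ("SER", [0.30]),
])
```

An alternative design would assign each residue to a single group by its highest atom confidence; the caption does not say which convention was used, so the overlapping version is shown only because it matches the wording most directly.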

