Protein-protein interaction site predictions with three-dimensional probability distributions of interacting atoms on protein surfaces.

Chen CT, Peng HP, Jian JW, Tsai KC, Chang JY, Yang EW, Chen JB, Ho SY, Hsu WL, Yang AS - PLoS ONE (2012)

Bottom Line: The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Affiliation: Genomics Research Center, Academia Sinica, Taipei, Taiwan.

ABSTRACT
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools for providing insights into the biological functions of proteins and for developing therapeutics that target PPI sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
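
The abstract reports five residue-based measures (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity). As a minimal sketch of how such measures follow from a binary confusion matrix, the Python below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, which does not list its confusion-matrix counts here.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from counts.

    tp/fn: interface residues predicted as interface / non-interface.
    tn/fp: non-interface residues predicted as non-interface / interface.
    """
    total = tp + fp + tn + fn
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": _ratio(tp + tn, total),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=50, fp=10, tn=30, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Because interface residues are usually a minority of surface residues, accuracy alone can look favorable for a predictor that rarely predicts an interface; the Matthews correlation coefficient uses all four counts, which is presumably why it is reported first.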

Figure 1: Mann-Whitney U-tests for the distributions of numerical attributes around protein surface atoms. The y-axis of the matrix shows the atom type index i (the 30 protein atom types shown in Table 1) and the x-axis shows the index j of the 32 Ai,j features, where j = 1 to 31 represents the 31 interacting atom types shown in Table 1 and the 32nd feature reflects the local geometry of the protein surface. The matrix element (j,i) shows, in color code, the Mann-Whitney U-test p-value for two groups of Ai,j: one group was calculated for attribute type j around surface atoms of type i in the known PPI sites on proteins in the S432 dataset, and the other group was calculated for the same attribute type around non-PPI site atoms of type i in the same dataset. The p-values were calculated with the Mann-Whitney U-test as implemented in the MATLAB function ranksum. The two sets of data were input to the function; the output p-value measures how compatible the data are with the null hypothesis that the two distributions are statistically indistinguishable, so a small p-value indicates a significant difference. A plus (+) sign in a matrix element indicates that the averaged feature value for the PPI site atoms is larger than the averaged feature value for the non-PPI site atoms; a minus (−) sign indicates the opposite. The panel on the right-hand side of the matrix shows the distributions of protein surface atoms in PPI sites (blue) and non-PPI protein surfaces (red) against protein atom type. The data were derived from proteins in S432.

Mentions: Figure 1 demonstrates the validity of the hypothesis above. The physicochemical complementarities around the protein surface atom i were simulated with the PDMs of non-covalent interacting atoms and were described with the 32 numerical features calculated with Equation (2) (i.e., Ai,j for interacting atom type j = 1 to 31 as shown in Table 1; j = 32 is derived from protein surface geometry). The matrix element (j,i) in Figure 1 shows the Mann-Whitney U-test result for two groups of Ai,j: one group was calculated for interacting atom type j around surface atoms of type i in the known PPI sites on proteins in the S432 dataset, and the other group was calculated for the same interacting atom type around non-PPI site atoms of type i in the same dataset. Matrix elements with p-values substantially below the statistical threshold of 0.025 are colored red, with deeper red indicating smaller p-values. These U-test p-values reflect statistically significant differences, in the attributes calculated from the PDMs or surface geometry, between the protein surface atoms in known PPI sites and the atoms outside known PPI sites.
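
The per-element comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the paper used MATLAB's ranksum, whereas this sketch implements the two-sided Mann-Whitney U-test with the normal approximation (with tie correction, no continuity correction), so its p-values may differ from ranksum output, especially for small samples. The function names and the synthetic feature values are hypothetical; only the 0.025 threshold and the +/− sign convention come from the text.

```python
import math
import random


def rank_with_ties(values):
    """Return 1-based average ranks and the tie term sum(t**3 - t)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    return ranks, tie_term


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation).

    Returns (U statistic of x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_term = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def compare_groups(ppi_values, non_ppi_values, threshold=0.025):
    """Summarize one matrix element: p-value, sign, and significance."""
    _, p_value = mann_whitney_u(ppi_values, non_ppi_values)
    ppi_mean = sum(ppi_values) / len(ppi_values)
    non_ppi_mean = sum(non_ppi_values) / len(non_ppi_values)
    sign = "+" if ppi_mean > non_ppi_mean else "-"
    return p_value, sign, p_value < threshold


if __name__ == "__main__":
    # Synthetic A_ij values for one (j, i) pair; not data from the paper.
    rng = random.Random(0)
    ppi = [rng.gauss(1.2, 0.5) for _ in range(200)]
    non_ppi = [rng.gauss(1.0, 0.5) for _ in range(200)]
    print(compare_groups(ppi, non_ppi))
```

Repeating `compare_groups` over every combination of feature index j and atom type i would fill a matrix like the one shown in Figure 1, with the sign recording which group has the larger mean feature value.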

