Limits...
NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues.

Shih ES, Hwang MJ - Biology (Basel) (2015)

Bottom Line: Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance.We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy.We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

View Article: PubMed Central - PubMed

Affiliation: Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan. shihds@gate.sinica.edu.tw.

ABSTRACT
Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

No MeSH data available.


Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered in increasing unbound/bound IRMSD, the best RMSD of interface residues superimposed between the unbound form and the bound form of the complex, with the PDB ID of every 5th complex indicated on the X-axis. Dashed line denotes a number of 300 positive poses. In the top half of the figure are the averages and standard deviations of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, and for negative poses, showed a similar random distribution [64].
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4498300&req=5

biology-04-00282-f004: Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered in increasing unbound/bound IRMSD, the best RMSD of interface residues superimposed between the unbound form and the bound form of the complex, with the PDB ID of every 5th complex indicated on the X-axis. Dashed line denotes a number of 300 positive poses. In the top half of the figure are the averages and standard deviations of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, and for negative poses, showed a similar random distribution [64].

Mentions: Without the ability to handle large conformational change induced by complex formation, PPD methods would perform badly for such complexes [2]. Indeed, both NPPD and IRAD failed to produce a Top100 success hit for those in the benchmark set with the largest unbound/bound IRMSDs, indicative of a significant change in conformation between the unbound and bound form of the complex (Figure 4). However, conformational change is not the only culprit for failures in PPD predictions. Figure 4 shows that if sampling could not produce a sufficient number, say 300, of positive (good) poses as defined by IRMSD < 2.5 Å (see Figure 1b) to score upon, the likelihood for either NPPD or IRAD to succeed was drastically decreased, even for complexes considered as “rigid” [65]. Further analysis indicated that some of these “rigid” complexes had a particularly small interface and hence might be difficult to sample and predict [77]. Since the best current scoring functions all performed similarly (Figure 3), we speculate that the same two factors, conformational change and insufficient sampling of good poses, also limit the success of other PPD methods. Note that while the sampling of good poses among different complexes was unbalanced, the distribution of the attributes used by NPPD was not (Figure 4), suggesting that sampling bias would not significantly affect training of the Bayesian model. While it is not entirely clear to us what gave rise to the apparently poor correlation between the number of good poses sampled and unbound/bound IRMSD as observed in Figure 4, it is notable that NPPD was better than IRAD for a few of those with the smallest unbound/bound IRMSDs and poor sampling, whereas IRAD did much better than NPPD for those ranked next in unbound/bound IRMSD (roughly between complex 1PPE and 2QFW in Figure 4), thereby contributing partly to the high complementarity between the two methods (Table 1). Taken all these results together, we can conclude that while it is still likely to significantly improve PPD performance by combining all the different scoring functions, the main barriers to overcome remain those arising from sampling and conformational change.


NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues.

Shih ES, Hwang MJ - Biology (Basel) (2015)

Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered in increasing unbound/bound IRMSD, the best RMSD of interface residues superimposed between the unbound form and the bound form of the complex, with the PDB ID of every 5th complex indicated on the X-axis. Dashed line denotes a number of 300 positive poses. In the top half of the figure are the averages and standard deviations of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, and for negative poses, showed a similar random distribution [64].
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4498300&req=5

biology-04-00282-f004: Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered in increasing unbound/bound IRMSD, the best RMSD of interface residues superimposed between the unbound form and the bound form of the complex, with the PDB ID of every 5th complex indicated on the X-axis. Dashed line denotes a number of 300 positive poses. In the top half of the figure are the averages and standard deviations of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, and for negative poses, showed a similar random distribution [64].
Mentions: Without the ability to handle large conformational change induced by complex formation, PPD methods would perform badly for such complexes [2]. Indeed, both NPPD and IRAD failed to produce a Top100 success hit for those in the benchmark set with the largest unbound/bound IRMSDs, indicative of a significant change in conformation between the unbound and bound form of the complex (Figure 4). However, conformational change is not the only culprit for failures in PPD predictions. Figure 4 shows that if sampling could not produce a sufficient number, say 300, of positive (good) poses as defined by IRMSD < 2.5 Å (see Figure 1b) to score upon, the likelihood for either NPPD or IRAD to succeed was drastically decreased, even for complexes considered as “rigid” [65]. Further analysis indicated that some of these “rigid” complexes had a particularly small interface and hence might be difficult to sample and predict [77]. Since the best current scoring functions all performed similarly (Figure 3), we speculate that the same two factors, conformational change and insufficient sampling of good poses, also limit the success of other PPD methods. Note that while the sampling of good poses among different complexes was unbalanced, the distribution of the attributes used by NPPD was not (Figure 4), suggesting that sampling bias would not significantly affect training of the Bayesian model. While it is not entirely clear to us what gave rise to the apparently poor correlation between the number of good poses sampled and unbound/bound IRMSD as observed in Figure 4, it is notable that NPPD was better than IRAD for a few of those with the smallest unbound/bound IRMSDs and poor sampling, whereas IRAD did much better than NPPD for those ranked next in unbound/bound IRMSD (roughly between complex 1PPE and 2QFW in Figure 4), thereby contributing partly to the high complementarity between the two methods (Table 1). Taken all these results together, we can conclude that while it is still likely to significantly improve PPD performance by combining all the different scoring functions, the main barriers to overcome remain those arising from sampling and conformational change.

Bottom Line: Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance.We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy.We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

View Article: PubMed Central - PubMed

Affiliation: Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan. shihds@gate.sinica.edu.tw.

ABSTRACT
Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

No MeSH data available.