Predicting DNA-binding proteins and binding residues by complex structure prediction and application to human proteome.

Zhao H, Wang J, Zhou Y, Yang Y - PLoS ONE (2014)

Bottom Line: We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms.


Affiliation: School of Informatics, Indiana University Purdue University, Indianapolis, Indiana, United States of America; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America; QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia.

ABSTRACT
As more and more protein sequences are uncovered from increasingly inexpensive sequencing techniques, an urgent task is to find their functions. This work presents a highly reliable computational technique for predicting DNA-binding function at the level of protein-DNA complex structures, rather than low-resolution two-state prediction of DNA-binding as most existing techniques do. The method first predicts protein-DNA complex structure by utilizing the template-based structure prediction technique HHblits, followed by binding affinity prediction based on a knowledge-based energy function (Distance-scaled finite ideal-gas reference state for protein-DNA interactions). A leave-one-out cross validation of the method based on 179 DNA-binding and 3797 non-binding protein domains achieves a Matthews correlation coefficient (MCC) of 0.77 with high precision (94%) and high sensitivity (65%). We further found 51% sensitivity for 82 newly determined structures of DNA-binding proteins and 56% sensitivity for the human proteome. In addition, the method provides a reasonably accurate prediction of DNA-binding residues in proteins based on predicted DNA-binding complex structures. Its application to human proteome leads to more than 300 novel DNA-binding proteins; some of these predicted structures were validated by known structures of homologous proteins in APO forms. The method [SPOT-Seq (DNA)] is available as an on-line server at http://sparks-lab.org.
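The three figures quoted in the abstract (MCC, precision, sensitivity) all derive from a single confusion matrix. The sketch below shows how they relate. The confusion counts are not reported in this text: they are an assumption, back-calculated from the stated 179 binding and 3797 non-binding domains together with the stated 65% sensitivity and 94% precision, and should be read as illustrative only. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Return (sensitivity, precision, MCC) for a two-class confusion matrix."""
    sensitivity = tp / (tp + fn)   # fraction of true binders that were found
    precision = tp / (tp + fp)     # fraction of predicted binders that are real
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc


# ASSUMED counts, back-calculated from the abstract's percentages
# (179 binding domains, 3797 non-binding domains); not taken from the paper.
tp, fn = 116, 63     # 116 + 63 = 179 DNA-binding domains
fp, tn = 7, 3790     # 7 + 3790 = 3797 non-binding domains

sens, prec, mcc = binary_metrics(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f} precision={prec:.2f} MCC={mcc:.2f}")
```

With these assumed counts the three values round to the figures quoted in the abstract, which illustrates why MCC is a useful summary here: with roughly 21 non-binders per binder, accuracy alone would be dominated by the negative class.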

Figure 2 (pone-0096694-g002): Matthews correlation coefficient for predicted binding residues versus the structural similarity SP-score between predicted and known structures of 116 targets. The correlation coefficient is 0.38.

Mentions: As expected, the quality of predicted binding residues is directly related to the quality of the predicted structures. Figure 2 shows the MCC for binding-residue prediction as a function of predicted structural accuracy, measured as the structural similarity (SP-score) between predicted and actual structures. The trend is that the more accurate the predicted structure, the higher the MCC value; the correlation coefficient is 0.38. We noticed a few cases in which highly accurate structures nevertheless had poorly predicted binding regions (low MCC values). In those cases, the accurately predicted portions of the structure were limited to non-binding regions.
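The analysis described above can be sketched in two steps: compute a residue-level MCC for each target, then correlate those MCC values with the per-target SP-scores. This is a minimal sketch under stated assumptions, not the authors' code. The text does not say which correlation coefficient was used, so Pearson is assumed here; the helper names `residue_mcc` and `pearson` are hypothetical, and all numbers are made-up toy values.

```python
from math import sqrt


def residue_mcc(predicted, actual, length):
    """MCC over residues, given sets of binding-residue indices.

    predicted, actual: sets of residue positions labeled as DNA-binding.
    length: total number of residues in the protein chain.
    """
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = length - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient (ASSUMED; the text does not name the type)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy target: 10 residues, predicted and actual binding sites overlap on 3.
toy_mcc = residue_mcc({1, 2, 3, 4}, {2, 3, 4, 5}, length=10)

# Toy per-target values (illustrative only, not data from the paper).
sp_scores = [0.45, 0.60, 0.72, 0.80, 0.91]
mccs = [0.20, 0.55, 0.30, 0.65, 0.50]
print(f"toy residue MCC = {toy_mcc:.3f}")
print(f"toy correlation = {pearson(sp_scores, mccs):.2f}")
```

A whole-structure similarity score can be high while the binding interface is modeled poorly, which is one way a target ends up with a high SP-score but a low residue-level MCC, as in the exceptions noted above.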

