Phosphate binding sites identification in protein structures.

Parca L, Gherardini PF, Helmer-Citterich M, Ausiello G - Nucleic Acids Res. (2010)

Bottom Line: Pfinder has been tested on a data set of 52 proteins for which both the apo and holo forms were available. We obtained at least one correct prediction in 63% of the holo structures and in 62% of the apo. The ability of Pfinder to recognize a phosphate-binding site in unbound protein structures makes it an ideal tool for functional annotation and for complementing docking and drug design methods.


Affiliation: Department of Biology, Centre for Molecular Bioinformatics, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy.

ABSTRACT
Nearly half of known protein structures interact with phosphate-containing ligands, such as nucleotides and other cofactors. Many methods have been developed for the identification of metal ion-binding sites and some for bigger ligands such as carbohydrates, but none is yet available for the prediction of phosphate-binding sites. Here we describe Pfinder, a method that predicts binding sites for phosphate groups, either in the form of ions or as parts of other non-peptide ligands, in proteins of known structure. Pfinder uses the Query3D local structural comparison algorithm to scan a protein structure for the presence of a number of structural motifs identified for their ability to bind the phosphate chemical group. Pfinder has been tested on a data set of 52 proteins for which both the apo and holo forms were available. We obtained at least one correct prediction in 63% of the holo structures and in 62% of the apo. The ability of Pfinder to recognize a phosphate-binding site in unbound protein structures makes it an ideal tool for functional annotation and for complementing docking and drug design methods. The Pfinder program is available at http://pdbfun.uniroma2.it/pfinder.

Figure 2: Results obtained on the protein structures of the training set. Each bar in the graphs represents the results for a different combination of RMSD and substitution parameters used. The RMSD threshold is reported on the X-axis, while different colors show the BLOSUM62 threshold. (A) Percentage of analyzed structures having at least one correctly predicted PbS. (B) Average number of FP predictions produced per structure. (C) Matthews Correlation Coefficient (MCC). (D) Final score. The score is the fraction of identified proteins divided by the average number of FP predictions per structure.

Mentions: Figure 2 shows the complete results for all the parameter combinations we tested. Using the least stringent RMSD and substitution matrix threshold values (1.1 Å and −1, respectively), the method correctly identified at least one TP phosphate group in 51 of the 59 structures of the training set. The conservation threshold with the highest MCC for this parameter combination was 74.3, and it produced an average of 30.7 ± 2.9 FP per structure. Using the most stringent parameter values (0.7 Å and 1) and a conservation threshold of 70, the method produced far fewer FP predictions (1.0 ± 0.1 per structure), but the number of training-set structures without any correct prediction rises from 1 to 26 out of 59. We determined the set of parameters that yields the maximum percentage of protein structures with at least one correctly predicted phosphate group together with the lowest average number of FP predictions. Figure 2D shows the optimized score for all the parameter combinations tested. The best performance was obtained with an RMSD of 0.7 Å and a residue substitution threshold of 1. Although this set of parameters has the highest score, the fraction of proteins without at least one correct prediction is too high (44%) to make this combination usable. The second-best score was shared by two combinations, 0.7 (RMSD)/0 (substitution) and 0.9/1; the latter, however, gave a higher MCC (0.39 versus 0.35). The 0.9/1 combination allows the method to produce at least one correct prediction in 69% of the proteins, with a conservation threshold of 66, an average of 3.7 ± 0.4 FP predictions per structure and a high area under the curve (AUC) for the conservation threshold (0.81). The TP predictions (i.e. the ones closest to a crystallized phosphate group) made on the 59 proteins of the training set are evenly spread within a 5 Å radius of the crystallized phosphate group.
The distribution of the distances between the crystallographic positions and the best predictions in the training set is reported in Figure 3.
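
The two quantities used above to compare parameter combinations can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, and only the three combinations whose figures are quoted in the text are included. The value 33/59 for the 0.7/1 combination is derived from the statement that 26 of 59 structures had no correct prediction. The final score follows the caption's definition (fraction of identified proteins divided by the average number of FP predictions per structure), and the MCC is the standard formula over confusion-matrix counts, which the text does not report.

```python
from math import sqrt

# (RMSD threshold in Å, BLOSUM62 threshold) ->
#   (fraction of training structures with >= 1 correct prediction,
#    average number of FP predictions per structure)
# Numbers are those quoted in the text; 33/59 is derived (59 - 26 = 33).
RESULTS = {
    (1.1, -1): (51 / 59, 30.7),
    (0.9, 1): (0.69, 3.7),
    (0.7, 1): (33 / 59, 1.0),
}


def final_score(fraction_identified, mean_fp):
    """Final score as defined in the Figure 2 caption."""
    return fraction_identified / mean_fp


def mcc(tp, tn, fp, fn):
    """Standard Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def rank_by_score(results):
    """Return parameter combinations ordered from highest to lowest final score."""
    return sorted(results, key=lambda combo: final_score(*results[combo]), reverse=True)


if __name__ == "__main__":
    for combo in rank_by_score(RESULTS):
        fraction, mean_fp = RESULTS[combo]
        print(combo, round(final_score(fraction, mean_fp), 3))
```

On these three combinations the ranking puts 0.7/1 first, in line with the text. Because that combination leaves 44% of the proteins without a correct prediction, the score alone does not settle the choice, which is why the text also considers coverage and MCC.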

