CRHunter: integrating multifaceted information to predict catalytic residues in enzymes

View Article: PubMed Central - PubMed

ABSTRACT

A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies or considered whether combining them could improve prediction performance. Herein, we developed an integrative algorithm named CRHunter that exploits both the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we built two support vector machine feature predictors, one based on structural information and one on sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homology becomes weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves the best performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.



Figure 5. Similarity scores of the optimal templates for the primary dataset. (A) Distribution of SPscores and HHscores for CSA223. (B) Comparison of SPscores and HHscores for CSA223. A red dot indicates that our structural and sequence template predictors detect the same template for the query protein.
© Copyright Policy - open-access

Mentions: Most previous algorithms have used feature-based strategies to recognize catalytic residues. Herein, we developed a structural template predictor using SPalign and a sequence template predictor using HHblits, both of which were also tested on CSA223. The SPscore and HHscore distributions of the best templates are shown in Fig. 5A. For the structural method, 111 (49.7%) queries retrieve a reliable template with an SPscore greater than 0.6, implying that the enzyme and its template probably share the same SCOP fold. In contrast, the sequence method exhibits a disjunctive distribution, in which 93 (41.7%) queries retrieve a good template with an HHscore greater than 0.9. Fig. 5B further shows that the two methods detect the same template for 64 (28.7%) proteins, most of which closely resemble the template in both structure and sequence profile. Once the best template has been identified, the putative catalytic residues of each query are annotated from the experimentally verified catalytic residues of the template. As shown in Table 1, StrTemplate yields an F1-score of 0.277 and an MCC of 0.274, whereas SeqTemplate achieves an F1-score of 0.292 and an MCC of 0.314. These results confirm that both structural and sequence template predictors can recognize catalytic residues. More interestingly, the sequence predictor outperforms the structural predictor, suggesting that sequence profile similarity is a powerful indicator for finding remote templates of enzymes for which no structures are available.
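
The annotation-transfer step and the two metrics reported in this passage can be illustrated with a small sketch. It is not CRHunter's implementation. The alignment is assumed to be already available as a list of (query index, template index) pairs, such as could be derived from SPalign or HHblits output, and every function name is hypothetical. Requiring the aligned residues to share the same amino acid type is an optional assumption of this sketch, not something the text states.

```python
import math

def transfer_catalytic_residues(aligned_pairs, template_catalytic,
                                query_seq=None, template_seq=None):
    """Map template catalytic positions onto the query through an alignment.

    aligned_pairs: iterable of (query_index, template_index), 0-based.
    template_catalytic: set of template indices with verified catalytic roles.
    If both sequences are given, a position is transferred only when the
    aligned residues are identical (an assumption made for this sketch).
    """
    predicted = set()
    for q_idx, t_idx in aligned_pairs:
        if t_idx not in template_catalytic:
            continue
        if query_seq is not None and template_seq is not None:
            if query_seq[q_idx] != template_seq[t_idx]:
                continue
        predicted.add(q_idx)
    return predicted

def confusion_counts(predicted, actual, length):
    """Return (TP, FP, FN, TN) over all residues of a query of given length."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = length - tp - fp - fn
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example with made-up data: a 10-residue query aligned to a template.
pairs = [(0, 1), (1, 2), (2, 3), (4, 4), (5, 5), (7, 8)]
template_sites = {2, 5, 8}   # hypothetical verified catalytic residues
true_sites = {1, 5, 9}       # hypothetical ground truth for the query

predicted_sites = transfer_catalytic_residues(pairs, template_sites)
counts = confusion_counts(predicted_sites, true_sites, length=10)
tp, fp, fn, tn = counts
f1_value = f1_score(tp, fp, fn)
mcc_value = mcc(tp, fp, fn, tn)
```

Because catalytic residues are a small minority of all residues, MCC and F1 are more informative here than plain accuracy, which is consistent with the choice of metrics reported in Table 1.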

