CRHunter: integrating multifaceted information to predict catalytic residues in enzymes

View Article: PubMed Central - PubMed

ABSTRACT

A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.



© Copyright Policy - open-access

Figure 6: Distribution of the similarity scores of top-ranked templates for the alternative datasets. These six datasets have different levels of structural homology.

We checked our approach using the CSA223 dataset, which is non-redundant at the sequence level. Furthermore, we evaluated our predictors by 10-fold cross-validation on six datasets with different levels of structural homology. For the three datasets (EF series) collected by Youn et al. [13], our feature predictors achieve relatively stable performance, with AUCs greater than 0.915 (Table 2). As shown in Fig. 6, the number of enzymes that can retrieve an effective template decreases markedly as the homology between the entries in a dataset becomes weaker. Accordingly, the template methods yield poor results for the EF_superfamily and EF_fold datasets, which clearly shows the weakness of template-based prediction when effective templates are lacking. As expected, StrHunter and SeqHunter do not outperform the feature-based methods on these two datasets. In contrast, a clear improvement is observed for the EF_family dataset owing to the contribution of our template approaches. For the remaining three datasets, our template predictors retrieve reliable templates for a small fraction of the queries and correctly predict their catalytic residues, resulting in a slight improvement in the AUCs of StrHunter and SeqHunter over the feature-based methods. These results suggest that our algorithm adapts well to the template quality available for each query protein. More importantly, CRHunter shows the best performance among all our predictors on every dataset, with AUCs ranging from 0.926 to 0.952. Therefore, CRHunter is a prediction system that can automatically exploit the advantages of the individual predictors.
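The evaluation protocol described above (10-fold cross-validation scored by AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice to split folds by enzyme rather than by residue, the pooling of held-out scores into a single AUC, and the `fit_and_score` callable are all assumptions made for illustration only.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and split them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels are 1 (catalytic) or 0 (non-catalytic); tied scores receive
    the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def cross_validated_auc(enzymes, fit_and_score, k=10, seed=0):
    """Pool held-out residue scores over k folds and compute one AUC.

    enzymes: list of (per_residue_features, per_residue_labels) pairs.
    fit_and_score: hypothetical callable(train_enzymes, features) that
    returns one score per residue of the held-out enzyme. A real
    implementation would train once per fold rather than once per enzyme.
    """
    pooled_labels, pooled_scores = [], []
    for fold in k_fold_indices(len(enzymes), k, seed):
        held_out = set(fold)
        train = [e for i, e in enumerate(enzymes) if i not in held_out]
        for i in fold:
            features, labels = enzymes[i]
            pooled_scores.extend(fit_and_score(train, features))
            pooled_labels.extend(labels)
    return auc(pooled_labels, pooled_scores)
```

Splitting by enzyme keeps all residues of one protein in the same fold, so a predictor is never scored on residues from an enzyme it has seen during training. Whether the original study pooled scores in this way or averaged per-fold AUCs is not stated in this section.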

