CRHunter: integrating multifaceted information to predict catalytic residues in enzymes

View Article: PubMed Central - PubMed

ABSTRACT

A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no study has systematically compared these two strategies or considered whether combining them could improve prediction performance. Herein, we developed an integrative algorithm named CRHunter that simultaneously exploits the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we built two support vector machine feature predictors, one based on structural and one on sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homology relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves the best performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.



Figure 7: Prediction results obtained for the BioH protein (PDB ID: 1M33_A) by our different predictors. The positions in red in the bars denote the predicted catalytic residues.
© Copyright Policy - open-access

Mentions: Finally, we applied our approach to the SG2332 dataset, which comprises 2332 protein structures with unknown functions. Note that all of the predictors were trained on CSA223, and only three SG2332 entries share more than 30% sequence identity with CSA223. As shown in Supplementary Fig. S6, we obtain at least one positive prediction in 1704 protein structures, for a total of 6746 putative catalytic residues. We also notice that the distribution of putative catalytic residues in SG2332 strongly correlates with the distribution of validated catalytic residues in CSA223 (Pearson’s correlation coefficient = 0.958), suggesting that our predictions are generally reliable. To further illustrate our method, we selected the BioH protein (PDB ID: 1M33_A) as a representative structure. The original reference annotated a putative catalytic triad (Ser82, His235, and Asp207) in BioH by aligning this protein against active site templates with TESS, and experimentally validated that Ser82 probably plays an important role in the enzymatic activity (ref. 31). As revealed in Fig. 7, our component predictors all output several positive predictions in BioH, which generally cover the potential catalytic triad. Both StrTemplate and SeqTemplate retrieve the same template for BioH (SCOP ID: d1ehya_), which has high structural but low sequence similarity to the query (SPscore = 0.97 and sequence identity = 21%). By merging the outputs of the different predictors, CRHunter eliminates potential false positives and returns five possible catalytic residues (Trp22, Ser82, Leu83, Asp207, and His235). The possible catalytic functions of Trp22 and Leu83 are especially worthy of further study. The precompiled results for SG2332 are provided on the dataset page of our server.
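
The distribution comparison mentioned above can be illustrated with a minimal sketch. It assumes the "distribution" is the relative frequency of each of the 20 amino acid types among catalytic residues, which the text does not state explicitly. The helper names (`composition`, `pearson`) are hypothetical, and the two toy inputs are simply the BioH residue lists quoted in the paragraph, not the SG2332 or CSA223 data.

```python
from collections import Counter
from math import sqrt

# Three-letter codes of the 20 standard amino acids (fixed order for the vectors).
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]

def composition(residues):
    """Return the relative frequency of each amino acid type.

    `residues` is a list of labels such as "Ser82"; only the first three
    characters (the residue type) are used. Hypothetical helper.
    """
    counts = Counter(label[:3] for label in residues)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

# Toy inputs taken from the BioH example in the text (illustration only).
predicted = ["Trp22", "Ser82", "Leu83", "Asp207", "His235"]
annotated_triad = ["Ser82", "His235", "Asp207"]

r = pearson(composition(predicted), composition(annotated_triad))
print(f"Pearson r between the two toy compositions: {r:.3f}")
```

With real data, the two lists would hold every putative catalytic residue predicted in SG2332 and every validated catalytic residue in CSA223. The coefficient of 0.958 reported above comes from the authors' own analysis and is not reproduced by this toy input.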

