CRHunter: integrating multifaceted information to predict catalytic residues in enzymes

View Article: PubMed Central - PubMed

ABSTRACT

A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies or considered whether combining them could improve prediction performance. Herein, we developed an integrative algorithm named CRHunter that exploits both the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we built two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homologous relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves the best performance among all our predictors. We also illustrate that our methodology can be applied to predicted enzyme structures. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, applying this algorithm to structural genomics targets sheds light on solved protein structures of unknown function.
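The abstract does not spell out how a Delaunay triangulation becomes residue features. A minimal sketch, assuming each residue is represented by a single point (for example its Cα atom or side-chain centroid; that choice is an assumption) and that the tetrahedra come from an external Delaunay routine, such as the `simplices` array of `scipy.spatial.Delaunay`: two residues are treated as DT neighbors when they share an edge of some tetrahedron. The `max_edge` cutoff and the function name `delaunay_neighbors` are hypothetical; the cutoff illustrates a filter sometimes applied to prune the long edges a Delaunay triangulation produces at the protein surface, not something the source describes.

```python
from itertools import combinations
from math import dist


def delaunay_neighbors(tetrahedra, coords=None, max_edge=None):
    """Return {residue_index: set of DT-neighbor indices} from Delaunay tetrahedra.

    tetrahedra: iterable of 4-index tuples, one per tetrahedron.
    coords: optional list of (x, y, z) points, one per residue; needed only with max_edge.
    max_edge: hypothetical distance cutoff; edges longer than this are dropped.
    """
    if max_edge is not None and coords is None:
        raise ValueError("coords are required when max_edge is set")
    neighbors = {}
    for tet in tetrahedra:
        vertices = sorted({int(v) for v in tet})
        # Register every vertex so residues whose edges are all pruned still appear.
        for v in vertices:
            neighbors.setdefault(v, set())
        # Every pair of vertices in a tetrahedron is joined by one of its edges.
        for i, j in combinations(vertices, 2):
            if max_edge is not None and dist(coords[i], coords[j]) > max_edge:
                continue
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


if __name__ == "__main__":
    # Two tetrahedra sharing the face (1, 2, 3); residues 0 and 4 are not neighbors.
    tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
    print(delaunay_neighbors(tets))
```

Keeping isolated residues as keys with empty neighbor sets lets downstream feature code index every residue without special cases.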



Figure 3. Statistical analysis of novel structural features. (A) Weight coefficients of residue pairs in the DT-based microenvironment of catalytic residues. Catalytic residues are sorted according to the percentages of different residue types. (B) A comparison of the distribution of DT-based features for catalytic and non-catalytic residues. ME: microenvironment score, DG: degree, CL: closeness, BW: betweenness, and CC: clustering coefficient. (C) A comparison of the distribution of LN-based geometric features for catalytic and non-catalytic residues. (D) Characterization of an enzyme structure (SCOP ID: d1qq5a_) by LNs at different scales. The positions marked in orange in the grey bar denote validated catalytic residues.
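Panel B compares four graph descriptors of the DT network alongside the microenvironment score. As an illustration only, the sketch below computes degree (DG), closeness (CL), betweenness (BW), and clustering coefficient (CC) for an unweighted, undirected residue graph given as an adjacency dict of sets. Textbook definitions are assumed (closeness within the residue's connected component, unnormalized betweenness via Brandes' algorithm); the paper may normalize these differently, and `dt_network_features` is a hypothetical helper name.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """(reachable nodes - 1) / sum of hop distances, within v's connected component."""
    d = bfs_distances(adj, v)
    total = sum(d.values())
    return (len(d) - 1) / total if total > 0 else 0.0


def clustering(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
    return 2.0 * links / (k * (k - 1))


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (undirected, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        sigma[s] = 1
        depth = dict.fromkeys(adj, -1)  # hop distance from s (-1 = unvisited)
        depth[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each path is counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


def dt_network_features(adj):
    """Hypothetical helper: DG, CL, BW and CC for every residue in a DT graph."""
    bw = betweenness(adj)
    return {
        v: {"DG": len(adj[v]), "CL": closeness(adj, v), "BW": bw[v], "CC": clustering(adj, v)}
        for v in adj
    }
```

A residue on a path 0-1-2 gets DG = 2, CL = 1.0, BW = 1.0 and CC = 0.0, which matches the intuition that central, bridging residues score high on CL and BW.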

To obtain the DT-based microenvironment score (MEscoreDT), we first computed an overall weight vector WDT from the CSA223 dataset¹⁸; this vector can be reshaped into a 20 × 20 matrix. As shown in Fig. 3A, catalytic residues are enriched in charged and hydrophilic residues (H, D, R, E, K, C, and Y) and depleted in hydrophobic residues. Our cluster analysis shows that neighboring residues have different preferences in the DT-based microenvironment. For instance, the neighbors of the dominant catalytic residue types generally have greater weights, and the neighborhoods of these residue types share more similar patterns. Han et al.¹⁸ recently used a distance-based criterion to define the residue neighborhood; their weight vector is highly correlated with ours (Pearson's correlation coefficient = 0.940). We then calculated MEscoreDT for each residue in CSA223. Figure 3B shows that catalytic residues have significantly higher microenvironment scores than non-catalytic residues (p-value = 9.8E-123), indicating that MEscoreDT can serve as an effective feature.
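The paragraph gives the shape of WDT but not the exact scoring formula. A minimal sketch, assuming the 400 weights are stored row-major with rows indexed by the central residue type and columns by the neighbor type (the alphabetical one-letter ordering is also an assumption), and that MEscoreDT is the sum of W[type(i)][type(j)] over the DT neighbors j of residue i. The names `vector_to_matrix`, `me_score`, and `pearson` are hypothetical; `pearson` shows how two flattened weight vectors, such as a DT-based one and a distance-based one, can be compared with Pearson's correlation coefficient.

```python
from math import sqrt

# Assumed ordering of the 20 standard amino acids (one-letter codes, alphabetical).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def vector_to_matrix(w):
    """Reshape a flat 400-element weight vector into a 20 x 20 matrix (row-major assumed)."""
    if len(w) != 400:
        raise ValueError("expected 400 weights (20 x 20 residue pairs)")
    return [list(w[r * 20:(r + 1) * 20]) for r in range(20)]


def me_score(seq, neighbors, W, i):
    """Hypothetical MEscoreDT: sum of W[type(i)][type(j)] over the DT neighbors j of residue i."""
    row = W[INDEX[seq[i]]]
    return sum(row[INDEX[seq[j]]] for j in neighbors.get(i, ()))


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length weight vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)
```

Summing rewards residues surrounded by many favorably weighted neighbors; dividing by the neighbor count would be an equally plausible variant, and the source does not say which one is used.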
