CRHunter: integrating multifaceted information to predict catalytic residues in enzymes


ABSTRACT

A variety of algorithms have been developed for catalytic residue prediction based on either feature- or template-based methodology. However, no studies have systematically compared these two strategies and further considered whether their combination could improve the prediction performance. Herein, we developed an integrative algorithm named CRHunter by simultaneously using the complementarity between feature- and template-based methodologies and that between structural and sequence information. Several novel structural features were generated by the Delaunay triangulation and Laplacian transformation of enzyme structures. Combining these features with traditional descriptors, we invented two support vector machine feature predictors based on both structural and sequence information. Furthermore, we established two template predictors using structure and profile alignments. Evaluated on datasets with different levels of homology, our feature predictors achieve relatively stable performance, whereas our template predictors yield poor results when the homological relationships become weak. Nevertheless, the hybrid algorithm CRHunter consistently achieves optimal performance among all our predictors. We also illustrate that our methodology can be applied to the predicted structures of enzymes. Compared with state-of-the-art methods, CRHunter yields comparable or better performance on various datasets. Finally, the application of this algorithm to structural genomics targets sheds light on solved protein structures with unknown functions.


Figure 4: Performance of single attributes evaluated on CSA223. PSSM: position-specific scoring matrix, CL: closeness, ME: microenvironment score, LN: Laplacian norm, BW: betweenness, SA: solvent accessibility, DG: degree, PK: pocket, CC: clustering coefficient, CX: protrusion index, HB: hydrogen bonds, BF: B-factor, DPX: depth index, SS: secondary structure, RE: relative entropy, JSD: Jensen-Shannon divergence score, SE: Shannon entropy, VNE: von Neumann entropy, PE: property entropy, PS: predicted structural features, PP: physicochemical properties, CP: catalytic residue propensity, RT: residue type, SP: sequential position, LT: length, and AAC: amino acid composition.
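
Several of the sequence attributes listed in this caption (SE, RE, JSD) are standard information-theoretic conservation scores computed per alignment column. The sketch below illustrates the textbook definitions only; the exact formulas, sequence weighting, pseudocounts, and background distribution used by CRHunter are not given in this excerpt, so the helper names and the optional `pseudocount` parameter are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=0.0):
    """Amino acid frequencies of one alignment column (gaps and unknowns are skipped)."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("column contains no standard amino acids")
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]


def shannon_entropy(p):
    """Shannon entropy in bits; 0 for a fully conserved column."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be > 0 wherever p is."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded (0 to 1 bit) divergence between two distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)
```

In practice `q` would be a background amino acid distribution; a higher divergence from the background then suggests a more strongly conserved position.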

In this section, we first built a group of SVM-based predictors, each using a single feature coupled with the structural or sequence microenvironment, and evaluated them by 5-fold cross-validation on the CSA223 dataset. Figure 4 shows that the PSSM feature achieves the best performance among both the structure- and sequence-based predictors, yielding AUCs of 0.919 and 0.920, respectively. At the sequence level, the well-defined residue conservation scores perform competitively with PSSM. Although our novel descriptors are not as useful as evolutionary conservation features, they rank at the top of all structural features. For instance, the AUCs of DT-based closeness (CL), MEscoreDT (ME), and LN on the global scale (LN(1)) are 0.840, 0.833, and 0.802, respectively. The AUCs of the remaining structural and sequence descriptors generally range from 0.55 to 0.85, indicating that conventional features can detect catalytic signatures at different levels. In addition, we compared DT-based features with their distance-based counterparts. Supplementary Fig. S4 shows that the DT-based method performs better for the degree and closeness measures but slightly worse for the other measures. Overall, DT-based features can thus be considered as alternatives to distance-based features.
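
The AUC values quoted above summarize how well a predictor's scores rank catalytic residues above non-catalytic ones. As a minimal sketch of that metric only (not of the authors' SVM pipeline or cross-validation protocol, which this excerpt does not detail), the function below computes the AUC from per-residue scores and binary labels using the rank-sum identity. The function name and the toy inputs are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U identity.

    labels: 1 for catalytic residues, 0 otherwise.
    Tied scores receive their average rank, so ties count as half a win.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Assign 1-based ranks in ascending score order, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example: 3 of the 4 (catalytic, non-catalytic) pairs are ranked correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the single-feature values above should be read.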

