ResBoost: characterizing and predicting catalytic residues in enzymes.

Alterovitz R, Arvey A, Sankararaman S, Dallett C, Freund Y, Sjölander K - BMC Bioinformatics (2009)

Bottom Line: The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.


Affiliation: Department of Computer Science, University of North Carolina at Chapel Hill, USA. ron@cs.unc.edu

ABSTRACT

Background: Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed.

Results: We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA).

Conclusion: ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
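The idea of combining several rules of thumb into one weighted vote can be illustrated with a minimal sketch of plain AdaBoost over single-feature threshold rules. This is not the authors' implementation and it omits Alternating Decision Trees entirely; the two feature columns (a conservation score and a solvent accessibility value), the toy residues, and all function names are assumptions made for illustration only.

```python
import math
from typing import List, Tuple

# A "rule of thumb": (feature index, threshold, polarity).
# It votes `polarity` when the feature is >= threshold, else -polarity.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def train_adaboost(X: List[List[float]], y: List[int],
                   rounds: int) -> List[Tuple[float, Stump]]:
    """Plain AdaBoost; labels are +1 (catalytic) or -1 (non-catalytic)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    candidates = [(j, thr, pol)
                  for j in range(len(X[0]))
                  for thr in sorted({x[j] for x in X})
                  for pol in (1, -1)]
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        def weighted_error(s: Stump) -> float:
            return sum(wi for wi, xi, yi in zip(w, X, y)
                       if stump_predict(s, xi) != yi)

        best = min(candidates, key=weighted_error)
        # Clamp to avoid division by zero when a rule is perfect.
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        if err >= 0.5:  # no rule is better than chance; stop early
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model: List[Tuple[float, Stump]], x: List[float]) -> float:
    """Weighted vote of the selected rules; positive means 'catalytic'."""
    return sum(alpha * stump_predict(s, x) for alpha, s in model)


# Toy data (made up): columns are [conservation, solvent accessibility].
X = [[0.9, 0.2], [0.8, 0.6], [0.3, 0.5], [0.2, 0.1]]
y = [1, 1, -1, -1]
model = train_adaboost(X, y, rounds=3)
```

Each entry of `model` pairs a selected rule with its vote weight, which is what keeps a boosted combination of this kind readable.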



Figure 1: We demonstrate ResBoost's control over the sensitivity/specificity trade-off using the enzyme 7,8-dihydroneopterin aldolase, a bacterial and plant enzyme needed for folate production that is an important target for antibiotics [28]. ResBoost predictions for this enzyme from Staphylococcus aureus (PDB ID: 2dhn) for two values of the trade-off parameter k, k = 256 (top) and k = 350 (bottom), are shown. ResBoost detected the main reaction center E22 and K100. In addition, at k = 350, ResBoost detected another cleft that includes Y54, a newly discovered catalytic residue not yet in the CSA that has been found to be important in orienting the substrate and stabilizing the intermediate.

Mentions: All quantitative protein data are inherently noisy, so any method for predicting catalytic residues is subject to a trade-off between sensitivity (the number of correct catalytic residue predictions relative to the total number of catalytic residues) and specificity (the number of residues correctly identified as non-catalytic relative to the total number of non-catalytic residues). We provide the user with control over this trade-off: the user specifies an input parameter k and the method maximizes sensitivity while maintaining the desired specificity (or, equivalently, the desired false positive rate, FPR). The result for an example enzyme, 7,8-dihydroneopterin aldolase from Staphylococcus aureus (PDB ID: 2dhn), is shown in Figure 1.
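The two definitions above, and the idea of maximizing sensitivity under a cap on the false positive rate, can be made concrete with a small sketch. It assumes each residue already has a real-valued score and a known label; the `max_fpr` argument is a hypothetical stand-in for the desired operating point and is not the paper's parameter k, whose exact meaning is not given in this excerpt.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int],
                            preds: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = catalytic, 0 = not."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def best_threshold(labels: List[int], scores: List[float],
                   max_fpr: float) -> Tuple[float, float, float]:
    """Pick the score threshold with the highest sensitivity whose
    false positive rate (1 - specificity) does not exceed max_fpr.
    Returns (threshold, sensitivity, fpr)."""
    best = (float("inf"), 0.0, 0.0)  # fallback: predict nothing
    for t in sorted(set(scores), reverse=True):
        preds = [1 if s >= t else 0 for s in scores]
        sens, spec = sensitivity_specificity(labels, preds)
        fpr = 1.0 - spec
        if fpr <= max_fpr and sens > best[1]:
            best = (t, sens, fpr)
    return best


# Toy example (made-up scores): two catalytic and four other residues.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1]
strict = best_threshold(labels, scores, max_fpr=0.0)
relaxed = best_threshold(labels, scores, max_fpr=0.25)
```

In this toy case, allowing one false positive among the four non-catalytic residues recovers the second catalytic residue, which mirrors the kind of trade-off illustrated in Figure 1.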

