Structure-based predictive models for allosteric hot spots.

Demerdash ON, Daily MD, Mitchell JC - PLoS Comput. Biol. (2009)

Bottom Line: Each residue had an associated set of calculated features. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models.


Affiliation: Biophysics Program, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

ABSTRACT
In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used, one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. For the independent data set, our top 10 models using Feature Set 1 recalled 68-81% of known hotspots, and among total hotspot predictions, 58-67% were actual hotspots. Hence, these models have precision P = 58-67% and recall R = 68-81%. The corresponding models for Feature Set 2 had P = 55-59% and R = 81-92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Moreover, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery, and is consistent with previous work describing sparse networks of allosterically important residues.
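The precision and recall figures quoted in the abstract follow the usual definitions: recall is the fraction of known hotspots that a model recovers, and precision is the fraction of predicted hotspots that are actual hotspots. The Python sketch below illustrates the arithmetic, along with the F1 score (the harmonic mean of the two). The function name and the toy labels are assumptions made for illustration; they are not taken from the paper's data or code.

```python
def precision_recall_f1(actual, predicted):
    """Return (precision, recall, F1) for two equal-length boolean sequences.

    True marks a hotspot residue; False marks a non-hotspot.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)

    # Guard against empty denominators (no predictions, or no known hotspots).
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (invented labels): 4 known hotspots among 10 residues.
actual = [True] * 4 + [False] * 6
predicted = [True, True, True, False, True, False, False, False, False, False]
p, r, f1 = precision_recall_f1(actual, predicted)
print(f"P = {p:.0%}, R = {r:.0%}, F1 = {f1:.2f}")
```

In this toy case three of the four known hotspots are recovered and one of the four predictions is wrong, so both precision and recall come out at 75%.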



pcbi-1000531-g002: Feature usage in the top 300 SVM models using Feature Set 2. For each feature, the number of models (frequency) in the top 300, as ranked by F1 performance on the training data, that used that particular feature was tabulated.

Mentions: Identifying the features that were used most frequently in the top 300 feature/kernel degree combinations can yield insights into properties that may, when taken together, indicate signatures of an allosteric hotspot residue. Dominant features in the top 300 feature combinations of Set 1 were mean squared fluctuation in the inactive and active conformers; difference in atomic density between inactive and active conformers; deformation energy of the inactive state; difference in the number of hydrogen bonds between inactive and active states; B-factor in the active state; difference in B-factor between the inactive and active states; and local structural entropy (Figure 1). Features from Set 2 that were dominant when considering the top 300 different combinations were as follows: alpha-carbon displacement; total residue solvent-accessible surface area in the inactive and active states, and the average of the two; side-chain solvent-accessible surface area in both states, and the average; backbone solvent-accessible surface area in the active state, and the average of this value in the inactive and active states (Figure 2).
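The tabulation described above (rank the feature/kernel degree combinations by F1 on the training data, keep the top 300, and count how many of them use each feature) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the record layout, the function name feature_usage, the feature identifiers, and the F1 values are all hypothetical.

```python
from collections import Counter


def feature_usage(models, top_n=300):
    """Count how many of the top_n models (ranked by F1) use each feature.

    Each model is a dict with hypothetical keys:
      "features": iterable of feature names used by the model
      "degree":   polynomial kernel degree (carried along, not used here)
      "f1":       F1 score on the training data
    """
    ranked = sorted(models, key=lambda m: m["f1"], reverse=True)
    counts = Counter()
    for model in ranked[:top_n]:
        # set() ensures a feature is counted at most once per model.
        counts.update(set(model["features"]))
    return counts


# Toy input with invented scores; feature names loosely echo Set 2.
models = [
    {"features": ("ca_displacement", "total_sasa_active"), "degree": 2, "f1": 0.70},
    {"features": ("ca_displacement", "sidechain_sasa_avg"), "degree": 3, "f1": 0.65},
    {"features": ("backbone_sasa_active",), "degree": 1, "f1": 0.40},
]

usage = feature_usage(models, top_n=2)
for name, freq in usage.most_common():
    print(name, freq)
```

With top_n=2 the lowest-scoring model is excluded, so only the features of the two best combinations contribute to the frequencies.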

