Structure-based predictive models for allosteric hot spots.

Demerdash ON, Daily MD, Mitchell JC - PLoS Comput. Biol. (2009)

Bottom Line: Each residue had an associated set of calculated features. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models.

Affiliation: Biophysics Program, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.

ABSTRACT
In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used: one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. On the independent data set, our top 10 models using Feature Set 1 recalled 68-81% of known hotspots, and 58-67% of the residues they predicted as hotspots were actual hotspots. Hence, these models have precision P = 58-67% and recall R = 68-81%. The corresponding models for Feature Set 2 had P = 55-59% and R = 81-92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73-81% and P = 64-71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Furthermore, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery and is consistent with previous work describing sparse networks of allosterically important residues.
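The precision and recall values above follow the standard definitions: P is the fraction of predicted hotspots that are true hotspots, and R is the fraction of known hotspots that the model predicts. The F1 score used in the feature-addition analysis is their harmonic mean. The Python sketch below computes all three from per-residue labels; the helper names are illustrative, and the labels are hypothetical toy values, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for the hotspot (positive) class on a labelled residue set."""
    tp: int  # true hotspots predicted as hotspots
    fp: int  # non-hotspots predicted as hotspots
    fn: int  # true hotspots the model missed


def precision(c: Confusion) -> float:
    """P: fraction of predicted hotspots that are actual hotspots."""
    predicted = c.tp + c.fp
    return c.tp / predicted if predicted else 0.0


def recall(c: Confusion) -> float:
    """R: fraction of known hotspots that the model recovers."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f1(c: Confusion) -> float:
    """F1: harmonic mean of precision and recall (0 when both are 0)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred) -> Confusion:
    """Tally per-residue labels (1 = hotspot, 0 = non-hotspot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return Confusion(tp, fp, fn)


# Hypothetical toy labels for eight residues, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]
c = confusion_from_labels(y_true, y_pred)
print(round(precision(c), 2), round(recall(c), 2), round(f1(c), 2))
```

With these toy labels there are 3 true positives, 2 false positives, and 1 false negative, so P = 0.6, R = 0.75, and F1 is about 0.67. Note that a model can raise R at the cost of P (as the Feature Set 2 models do relative to Feature Set 1), which is why a single summary such as F1 is convenient for ranking models.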

Figure 3 (pcbi-1000531-g003). Improvement of F1 upon successive feature addition. The bar on the far right represents a feature combination from the top 10 models. Each preceding bar represents a feature combination containing one feature fewer than the bar to its right.

Mentions: To assess how much each feature contributes to the predictive ability of a given feature/kernel degree combination, we selected one of the top 300 feature/kernel degree combinations that also performed well on the independent data set and analyzed the effect of successive feature addition. The analysis starts from a single feature of that combination and adds one feature at a time, giving a 1-feature model, then a 2-feature model, and so on (Figure 3). The greatest improvement in F1 occurred with the combination of two features (mean-squared fluctuation in the active and inactive states), followed by a modest improvement after the addition of a third feature. Additional features did not appreciably improve the F1 score. This suggests that the mean-squared fluctuations in the two states are “anchor” features for this particular model, and that the remaining features fine-tune its performance.
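The successive-addition analysis amounts to scoring nested subsets of a fixed feature ordering and reading off the F1 gain contributed by each new feature. The Python sketch below is a minimal illustration under stated assumptions: `successive_addition_curve` and `marginal_gains` are hypothetical helpers, the feature names are placeholders (only the two mean-squared fluctuation features are named in the text), and `toy_f1` stands in for training and evaluating a model. Its F1 values are invented and are not the values plotted in Figure 3.

```python
from __future__ import annotations

from typing import Callable, Sequence


def successive_addition_curve(
    ordered_features: Sequence[str],
    evaluate_f1: Callable[[tuple[str, ...]], float],
) -> list[tuple[tuple[str, ...], float]]:
    """Score nested subsets: the first feature, the first two, ..., all of them."""
    curve = []
    for k in range(1, len(ordered_features) + 1):
        subset = tuple(ordered_features[:k])
        curve.append((subset, evaluate_f1(subset)))
    return curve


def marginal_gains(
    curve: list[tuple[tuple[str, ...], float]],
) -> list[tuple[str, float]]:
    """F1 change contributed by each newly added feature (first bar measured from 0)."""
    gains = []
    previous = 0.0
    for subset, score in curve:
        gains.append((subset[-1], score - previous))
        previous = score
    return gains


# Hypothetical feature order and F1 values, for illustration only.
features = ["msf_active", "msf_inactive", "feature_3", "feature_4"]
_TOY_SCORES = {1: 0.25, 2: 0.60, 3: 0.65, 4: 0.66}


def toy_f1(subset: tuple[str, ...]) -> float:
    """Stand-in for 'train a model on these features and return its F1'."""
    return _TOY_SCORES[len(subset)]


curve = successive_addition_curve(features, toy_f1)
for name, gain in marginal_gains(curve):
    print(f"+{name}: dF1 = {gain:+.2f}")
```

In a real run, `evaluate_f1` would wrap model training and scoring at a fixed kernel degree, for example with scikit-learn's `SVC(kernel="poly", degree=d)` and `f1_score`; that pairing is an assumption, since this section does not state which SVM implementation or evaluation protocol was used. In the toy data the largest gain comes from adding the second feature, mirroring the “anchor” pattern described above.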

