Optimizing structural modeling for a specific protein scaffold: knottins or inhibitor cystine knots.

Gracy J, Chiche L - BMC Bioinformatics (2010)

Bottom Line: This large variability is likely to arise from the highly diverse loops which connect the successive knotted cysteines. These average model deviations represent an improvement of between 0.74 and 1.17 Å over basic homology modeling from a single template. In particular, we have shown that the accuracy of the models constructed at a low level of sequence identity can be improved by 1) a careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.


Affiliation: CNRS, UMR5048, Université Montpellier 1 et 2, Centre de Biochimie Structurale, 34090 Montpellier, France. Jerome.Gracy@cbs.cnrs.fr

ABSTRACT

Background: Knottins are small, diverse and stable proteins with important drug design potential. They can be classified into 30 families which cover a wide range of sequences (1621 sequenced), three-dimensional structures (155 solved) and functions (> 10). Inter-knottin similarity lies mainly between 15% and 40% sequence identity and 1.5 to 4.5 Å backbone deviation, although all knottins share a tightly knotted disulfide core. This large variability is likely to arise from the highly diverse loops which connect the successive knotted cysteines. The prediction of structural models for all knottin sequences would open new directions for the analysis of interaction sites and provide a better understanding of the structural and functional organization of proteins sharing this scaffold.

Results: We have designed an automated modeling procedure for predicting the three-dimensional structure of knottins. The different steps of the homology modeling pipeline were carefully optimized relative to a test set of knottins with known structures: template selection and alignment, extraction of structural constraints and model building, model evaluation and refinement. After optimization, the accuracy of predicted models was shown to lie between 1.50 and 1.96 Å from native structures at 50% and 10% maximum sequence identity levels, respectively. These average model deviations represent an improvement of between 0.74 and 1.17 Å over basic homology modeling from a single template. A database of 1621 structural models for all known knottin sequences was generated and is freely accessible from our web server at http://knottin.cbs.cnrs.fr. Models can also be interactively constructed from any knottin sequence using the structure prediction module Knoter1D3D available from our protein analysis toolkit PAT at http://pat.cbs.cnrs.fr.
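
To make the notion of a "maximum sequence identity level" concrete, the sketch below computes percent identity for a pairwise alignment and keeps only the templates at or below a given cap (for example 50% or 10%). It is an illustration under stated assumptions, not the procedure used in the paper: identity is counted over aligned (non-gap) columns, which is only one of several common conventions, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment,
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def templates_below_cap(query: str, templates: Dict[str, str],
                        max_identity: float) -> List[str]:
    """Names of templates whose identity to the query is <= max_identity.

    Assumption: every template string is already aligned to the query.
    """
    return [name for name, seq in templates.items()
            if percent_identity(query, seq) <= max_identity]


# Toy aligned sequences (hypothetical, not real knottins).
toy_templates = {"t1": "CAKCGC", "t2": "CWWWGC"}
allowed = templates_below_cap("CAKCGC", toy_templates, max_identity=50.0)
```

With such a filter, a benchmark at the 10% level would simply discard every template more similar to the query than 10%, which is why accuracy is expected to degrade as the cap is lowered.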

Conclusions: This work explores different directions for the systematic homology modeling of a diverse family of protein sequences. In particular, we have shown that the accuracy of the models constructed at a low level of sequence identity can be improved by 1) a careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.

Figure 11: Median variation of the model-native main chain RMSD when scoring the models with MM_GBSA instead of SC3, plotted against the model-native main chain RMSD.

Mentions: Figure 11 displays the variation of the model-native RMSD when the models are energy-minimized using the Amber suite and then selected using the MM_GBSA energy as the evaluation criterion. A recent study has shown that, for some proteins, energy minimization with implicit solvent (GBSA) provides greater improvement than a knowledge-based potential [38]. Unfortunately, on our data set, this refinement and evaluation method requires more computing time and is globally slightly less accurate than the SC3 criterion, with an RMSD variation below 0.1 Å between the two criteria. It is nevertheless worth noting that the MM_GBSA criterion is slightly better than SC3 when models are close to the native structure (RMSD < 1.5 Å) but worse than SC3 when models are farther from the native structure (Figure 11). This result tends to indicate that physics-based force fields with implicit solvation (MM_GBSA) are better at assessing the quality of models close to the native state, while knowledge-based potentials are more accurate predictors when deformations are larger. This tendency is consistent with the preferential use of statistical potentials for threading or folding prediction at low sequence identity and of physics-based force fields for the refinement of models close to native conformations. This dichotomy suggests that model selection could be improved if we could predict which criterion to use: MM_GBSA for models closer than ~1.5 Å to the native structure, or SC3 for more distant models. However, such a close/distant model classifier would need to be quite accurate, since misclassifications would rapidly cancel the small gain obtained using MM_GBSA for close models.
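
The switching rule suggested above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Model` record, the `predicted_rmsd` argument (standing in for the output of the hypothetical close/distant classifier) and the convention that a lower value is better for both SC3 and MM_GBSA are all assumptions made for the example. Only the ~1.5 Å cutoff comes from the text.

```python
from typing import List, NamedTuple

CLOSE_CUTOFF = 1.5  # Å; close/distant boundary quoted in the text


class Model(NamedTuple):
    name: str
    sc3: float      # SC3 score (assumption: lower is better)
    mm_gbsa: float  # MM_GBSA energy (assumption: lower is better)


def select_model(models: List[Model], predicted_rmsd: float) -> Model:
    """Pick one model, choosing the scoring criterion from predicted closeness.

    predicted_rmsd is a hypothetical estimate of how far the candidate
    models are from the native structure; no such predictor is given
    in the source.
    """
    if not models:
        raise ValueError("no candidate models")
    if predicted_rmsd < CLOSE_CUTOFF:
        # Models expected to be near native: rank by the physics-based energy.
        return min(models, key=lambda m: m.mm_gbsa)
    # More distant models: rank by the knowledge-based score.
    return min(models, key=lambda m: m.sc3)


# Toy candidates with made-up scores, for illustration only.
candidates = [
    Model("m1", sc3=-10.0, mm_gbsa=-250.0),
    Model("m2", sc3=-12.0, mm_gbsa=-240.0),
]
close_pick = select_model(candidates, predicted_rmsd=1.0)
distant_pick = select_model(candidates, predicted_rmsd=2.5)
```

In this toy case the two criteria disagree, so the chosen model depends entirely on the closeness estimate. This is the sensitivity noted in the text: a wrong close/distant call selects with the less suitable criterion and can erase the small gain from MM_GBSA.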

