Optimizing structural modeling for a specific protein scaffold: knottins or inhibitor cystine knots.

Gracy J, Chiche L - BMC Bioinformatics (2010)

Bottom Line: This important variability is likely to arise from the highly diverse loops which connect the successive knotted cysteines. These average model deviations represent an improvement varying between 0.74 and 1.17 Å over a basic homology modeling derived from a unique template. In particular, we have shown that the accuracy of the models constructed at a low level of sequence identity can be improved by 1) a careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.


Affiliation: CNRS, UMR5048, Université Montpellier 1 et 2, Centre de Biochimie Structurale, 34090 Montpellier, France. Jerome.Gracy@cbs.cnrs.fr

ABSTRACT

Background: Knottins are small, diverse and stable proteins with important drug design potential. They can be classified into 30 families which cover a wide range of sequences (1621 sequenced), three-dimensional structures (155 solved) and functions (> 10). Inter-knottin similarity lies mainly between 15% and 40% sequence identity and between 1.5 and 4.5 Å backbone deviation, although all knottins share a tightly knotted disulfide core. This important variability is likely to arise from the highly diverse loops which connect the successive knotted cysteines. Predicting structural models for all knottin sequences would open new directions for the analysis of interaction sites and provide a better understanding of the structural and functional organization of proteins sharing this scaffold.

Results: We have designed an automated modeling procedure for predicting the three-dimensional structure of knottins. The successive steps of the homology modeling pipeline were carefully optimized against a test set of knottins with known structures: template selection and alignment, extraction of structural constraints and model building, model evaluation and refinement. After optimization, the accuracy of predicted models was shown to lie between 1.50 and 1.96 Å from native structures at 50% and 10% maximum sequence identity levels, respectively. These average model deviations represent an improvement of between 0.74 and 1.17 Å over basic homology modeling from a single template. A database of 1621 structural models for all known knottin sequences was generated and is freely accessible from our web server at http://knottin.cbs.cnrs.fr. Models can also be constructed interactively from any knottin sequence using the structure prediction module Knoter1D3D, available from our protein analysis toolkit PAT at http://pat.cbs.cnrs.fr.

Conclusions: This work explores different directions for the systematic homology modeling of a diverse family of protein sequences. In particular, we have shown that the accuracy of models constructed at a low level of sequence identity can be improved by 1) a careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.

Figure 5: Median query - model main chain RMSD for different modeling methods. The grey end of each horizontal bar indicates the RMSD of the model closest to the native structure, while the black end of the bar indicates the RMSD of the best model according to SC3. Each test is described by five concatenated fields on the vertical axis: the 1st field indicates the maximum allowed query - template sequence identity percentage, the 2nd field indicates which template selection criterion was used (PID, DC4 or RMS), the 3rd field indicates which query - templates alignment method was used (KNT or TMA), the 4th field indicates how many templates were used, and the 5th field indicates how many models were generated at each Modeller run.
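The five-field test labels described in the caption can be decoded mechanically. The sketch below is only an illustration: it assumes the fields are dot-separated, as in the labels PID.KNT.T01.M01 and RMS.TMA.T20.M05 quoted in the text, and that the identity field, when present, is a leading integer percentage (that exact form is not shown in the source). The names `RunLabel` and `parse_run_label` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Values listed in the Figure 5 caption.
SELECTION_CRITERIA = {"PID", "DC4", "RMS"}
ALIGNMENT_METHODS = {"KNT", "TMA"}


@dataclass(frozen=True)
class RunLabel:
    max_identity: Optional[int]  # percent; None when the label omits this field
    selection: str               # template selection criterion
    alignment: str               # query - templates alignment method
    n_templates: int             # number of templates used
    n_models: int                # models generated per Modeller run


def parse_run_label(label: str) -> RunLabel:
    """Decode a label such as 'PID.KNT.T01.M01' or '10.RMS.TMA.T20.M05'.

    The leading identity field and the dot separator are assumptions.
    """
    fields = label.split(".")
    if len(fields) == 5:
        max_identity = int(fields[0].rstrip("%"))
        fields = fields[1:]
    elif len(fields) == 4:
        max_identity = None
    else:
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {label!r}")

    selection, alignment, templates, models = fields
    if selection not in SELECTION_CRITERIA:
        raise ValueError(f"unknown selection criterion: {selection!r}")
    if alignment not in ALIGNMENT_METHODS:
        raise ValueError(f"unknown alignment method: {alignment!r}")
    if not templates.startswith("T") or not models.startswith("M"):
        raise ValueError(f"malformed template/model counts in {label!r}")

    return RunLabel(max_identity, selection, alignment,
                    int(templates[1:]), int(models[1:]))
```

With this reading, `parse_run_label("RMS.TMA.T20.M05")` describes a run that selects 20 templates by the RMS criterion, aligns them with TMA, and builds 5 models per Modeller run.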

Mentions: Figure 5 displays the median RMSD between native knottin queries and their corresponding best model, built using Modeller and selected using SC3, the optimal linear combination of evaluation scores. As in Figure 4, the median query - model RMSD improves as templates are selected using the 1) PID, 2) DC4 and 3) RMS criteria. RMSD is further improved when the template sequences are multiply aligned using TMA rather than KNT. RMSD is also reduced when more templates are selected and when more models are produced by Modeller. The overall gain between the worst (PID.KNT.T01.M01) and best (RMS.TMA.T20.M05) modeling procedures varies from 1.18 Å to 0.70 Å of median RMSD when the selected templates share less than 10% and 50% sequence identity with the query knottin, respectively. These gains in query/model RMSD are slightly higher than those observed in query/template RMSD (Figure 4). This spectacular model improvement indicates that the basic but frequently used modeling procedure, which uses a single template selected by percent identity to the query sequence, is far from optimal and could be greatly improved by combining multiple structural templates and by optimizing template selection and alignment.
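To make the two ends of each bar in Figure 5 concrete, the following minimal sketch computes them for a single modeling procedure. It rests on stated assumptions: each query has a list of candidate models given as (score, RMSD) pairs, and a lower score is taken to mean a better model, which the source does not specify for SC3. The function name and the toy numbers are hypothetical and are not results from the paper.

```python
from statistics import median


def bar_ends(models_per_query):
    """Return (grey_end, black_end) for one modeling procedure.

    models_per_query maps a query name to a list of (score, rmsd) pairs.
    grey_end:  median over queries of the lowest RMSD among the models,
               i.e. the model closest to the native structure.
    black_end: median over queries of the RMSD of the model picked by the
               score (assumption: lower score = better model).
    """
    closest = [min(rmsd for _, rmsd in models)
               for models in models_per_query.values()]
    selected = [min(models, key=lambda m: m[0])[1]
                for models in models_per_query.values()]
    return median(closest), median(selected)


# Toy data only: (score, RMSD in Å) for three hypothetical queries.
toy = {
    "query_a": [(0.2, 1.9), (0.5, 1.6), (0.9, 2.4)],
    "query_b": [(0.1, 1.4), (0.3, 1.5)],
    "query_c": [(0.7, 2.2), (0.4, 2.0), (0.6, 2.6)],
}
grey, black = bar_ends(toy)
print(f"closest model: {grey} Å, score-selected model: {black} Å")
```

The score-selected RMSD can never be lower than the closest-model RMSD for the same query, so the gap between the two ends reflects how often the evaluation score fails to pick the best available model. The gain between two procedures is then the difference between their score-selected medians.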

