Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models.

Terwilliger TC, Grosse-Kunstleve RW, Afonine PV, Adams PD, Moriarty NW, Zwart P, Read RJ, Turk D, Hung LW - Acta Crystallogr. D Biol. Crystallogr. (2007)

Bottom Line: Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures, and the range of models obtained by rebuilding a single starting model was examined. The group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.


Affiliation: Los Alamos National Laboratory, Mailstop M888, Los Alamos, NM 87545, USA. terwilliger@lanl.gov

ABSTRACT
Automation of iterative model building, density modification and refinement in macromolecular crystallography has made it feasible to carry out this entire process multiple times. By using different random seeds in the process, a number of different models compatible with experimental data can be created. Sets of models were generated in this way using real data for ten protein structures from the Protein Data Bank and using synthetic data generated at various resolutions. Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Possible interpretations of the variation among models created by repetitive rebuilding were investigated. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures and the range of models obtained by rebuilding a single starting model was examined. The standard deviations of coordinates in models obtained by repetitive rebuilding at high resolution are small, while those obtained for the same synthetic crystal structure at low resolution are large, so that the diversity within a group of models cannot generally be a quantitative reflection of the actual structures in a crystal. Instead, the group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.
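The central quantity in the abstract, the spread of atomic coordinates across an ensemble produced by repeated rebuilding, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each model lists the same atoms in the same order and that all models share one crystal frame (so no superposition is applied), and it summarizes each atom's spread as the 3-D r.m.s. deviation about the ensemble mean with an n - 1 denominator; the paper's exact definition may differ. The function name `per_atom_coordinate_sd` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_atom_coordinate_sd(models: Sequence[Sequence[Coord]]) -> List[float]:
    """Hypothetical helper: per-atom spread of positions across an ensemble.

    Assumes every model lists the same atoms in the same order and that all
    models share one crystal frame, so no superposition is applied.
    """
    n_models = len(models)
    if n_models < 2:
        raise ValueError("need at least two models to estimate a spread")
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("all models must contain the same atoms")

    sds: List[float] = []
    for i in range(n_atoms):
        # Ensemble-mean position of atom i.
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Squared 3-D distance of atom i in each model from that mean.
        sq = [sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models]
        # Sample (n - 1) r.m.s. deviation about the mean.
        sds.append(math.sqrt(sum(sq) / (n_models - 1)))
    return sds


# Three toy models of a two-atom "structure": atom 0 never moves,
# atom 1 varies along z.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]
print(per_atom_coordinate_sd(ensemble))  # [0.0, 1.0]
```

Following the abstract, a spread computed this way is best read as a lower bound on the coordinate uncertainty of any individual model, not as a description of real conformational heterogeneity in the crystal.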


Fig. 8. Comparison of free R values of models built at varying resolutions with models built at a resolution of 1.75 Å. The open diamonds indicate the mean of the free R values of the models built at varying resolutions as described in Table 3. The closed diamonds show the mean of the free R values of the models built at a resolution of 1.75 Å, but only using data to the indicated resolutions to calculate the free R values. The error bars are ±1 SD. The open circles indicate the free R value of composite models constructed as described in the text from the ensemble of models built at varying resolutions.
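The closed diamonds in Fig. 8 rest on evaluating a model's free R value against only the data to a chosen resolution. A minimal sketch of that calculation follows, assuming each reflection already carries its d-spacing, |Fo|, |Fc| and a free-set flag, and using a single least-squares overall scale fitted on the working reflections inside the cutoff. Real refinement programs also model bulk solvent and anisotropic scaling, so their values would differ. `Reflection`, `free_r` and `mean_sd` are hypothetical names; `mean_sd` gives the mean ± 1 SD summary plotted for each ensemble.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Reflection:
    d_spacing: float  # resolution of this reflection, in Å
    f_obs: float      # observed amplitude |Fo|
    f_calc: float     # model amplitude |Fc|
    is_free: bool     # True if the reflection is in the free (test) set


def free_r(reflections: Iterable[Reflection], d_min: float) -> float:
    """Free R using only reflections with d_spacing >= d_min (hypothetical helper)."""
    data = [r for r in reflections if r.d_spacing >= d_min]
    work = [r for r in data if not r.is_free]
    test = [r for r in data if r.is_free]
    if not work or not test:
        raise ValueError("need working and free reflections within the cutoff")
    # Overall scale k minimizing sum((Fo - k*Fc)^2) over the working set.
    denom = sum(r.f_calc ** 2 for r in work)
    if denom == 0:
        raise ValueError("model amplitudes are all zero")
    k = sum(r.f_obs * r.f_calc for r in work) / denom
    # R = sum|Fo - k*Fc| / sum|Fo|, evaluated on the free set only.
    return sum(abs(r.f_obs - k * r.f_calc) for r in test) / sum(r.f_obs for r in test)


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, i.e. a point with ±1 SD error bars."""
    return statistics.mean(values), statistics.stdev(values)


refl: List[Reflection] = [
    Reflection(3.0, 10.0, 5.0, False),
    Reflection(3.0, 20.0, 10.0, False),
    Reflection(5.0, 8.0, 4.0, False),
    Reflection(2.0, 5.0, 100.0, False),  # excluded by both cutoffs below
    Reflection(3.0, 10.0, 6.0, True),
    Reflection(5.0, 10.0, 5.0, True),
    Reflection(2.0, 10.0, 0.0, True),    # excluded by both cutoffs below
]
print(free_r(refl, d_min=2.5))  # 0.1
print(free_r(refl, d_min=4.0))  # 0.0
```

Calling `free_r` with the same model but a larger `d_min` mirrors the closed diamonds: the model is fixed, and only the set of reflections used to score it shrinks.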

Mentions: The precision of a set of models does not, however, necessarily have anything to do with the accuracy of a model that can be produced by some other procedure based on the same data. To illustrate this point, we analyzed the ensembles of models produced from synthetic data (Table 3). Fig. 8 compares the free R values of models built using data truncated at various resolutions with those of models built using all the data but scored using only the data to each of these resolutions when calculating the free R value. Fig. 8 shows that the models built at a resolution of 1.75 Å have much better free R values at low resolution than the models that were built at low resolution. This is not particularly surprising, as it is well known that the fit to X-ray data at moderate resolution can be improved by obtaining higher resolution data and using it to improve the model and its fit at moderate (as well as high) resolution (Lattman, 1996). Fig. 8 confirms, however, that the models obtained using data to low resolution (e.g. 4.0 Å) are not the best possible models that could be obtained from these data: considering only the data to 4.0 Å, every 1.75 Å model has a lower free R value than any of the 4.0 Å models. The quality of the models in the ensemble generated at a resolution of 4.0 Å is therefore partly a sampling problem: the model-building algorithm is not able to test all possible models, and some of the best ones are never examined. None of the 1.75 Å models was ever considered during the generation of the 4.0 Å models; had they been, they would have been identified (based on R or free R values) as clearly superior to the 4.0 Å models that the procedure generated. We examined this point further by determining whether the ensembles of models generated using data to various resolutions contained accurately placed atoms that simply never occurred together in the same model.
For each ensemble represented in Fig. 8, we created a composite ‘structure’ by breaking each structure in the ensemble into segments five residues long and choosing, for each segment, the version with the lowest r.m.s.d. to the mean true structure. The dotted line in Fig. 8 shows that the free R values of these composite models are consistently somewhat lower than the mean free R values of the individual models in the ensembles. This suggests that the sampling problem might be partially overcome by recombination among multiple models of a structure, provided a method for choosing the best example of each segment can be developed.
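The segment-recombination test described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: every model and the reference are lists of residues with identical atoms in identical order, all in the same crystal frame (so r.m.s.d. is computed without superposition), and the reference stands in for the mean true structure, which is only available for synthetic data. A practical method would need a selection criterion that does not rely on the true structure. `segment_rmsd` and `composite_model` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Residue = Sequence[Coord]  # atoms of one residue, same order in every model
Model = Sequence[Residue]


def segment_rmsd(model: Model, reference: Model, start: int, stop: int) -> float:
    """r.m.s.d. over all atoms of residues start..stop-1, without superposition."""
    sq, n = 0.0, 0
    for res_m, res_r in zip(model[start:stop], reference[start:stop]):
        for a, b in zip(res_m, res_r):
            sq += sum((a[k] - b[k]) ** 2 for k in range(3))
            n += 1
    return math.sqrt(sq / n)


def composite_model(models: Sequence[Model], reference: Model,
                    seg_len: int = 5) -> List[Residue]:
    """Assemble a model by taking, for each segment, the closest ensemble member."""
    n_res = len(reference)
    if any(len(m) != n_res for m in models):
        raise ValueError("all models must have the same residues as the reference")
    composite: List[Residue] = []
    for start in range(0, n_res, seg_len):
        stop = min(start + seg_len, n_res)
        # Pick the model whose copy of this segment is closest to the reference.
        best = min(models, key=lambda m: segment_rmsd(m, reference, start, stop))
        composite.extend(best[start:stop])
    return composite


# Toy example: ten one-atom residues. Model A is right in residues 0-4,
# model B in residues 5-9; the composite recovers the reference exactly.
reference = [[(float(i), 0.0, 0.0)] for i in range(10)]
model_a = [[(float(i), 0.0 if i < 5 else 1.0, 0.0)] for i in range(10)]
model_b = [[(float(i), 1.0 if i < 5 else 0.0, 0.0)] for i in range(10)]
print(composite_model([model_a, model_b], reference) == reference)  # True
```

The toy case shows the point made in the text: neither model is correct everywhere, yet the correctly placed atoms are present in the ensemble and can be brought together in one model.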

