Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures.

Pražnikar J, Turk D - Acta Crystallogr. D Biol. Crystallogr. (2014)



Affiliation: Department of Biochemistry and Molecular and Structural Biology, Institute Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia.

ABSTRACT
The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. The commonly used target in the refinement of macromolecular structures is the maximum-likelihood (ML) function, which relies on the assessment of model errors. The current ML functions rely on cross-validation. They utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement from simulating the model errors via the random displacement of atomic coordinates. It is called ML free-kick refinement as it uses the ML formulation of the target function and is based on the idea of freeing the model from the model bias imposed by the chemical energy restraints used in refinement. This approach for the calculation of error estimates is superior to the cross-validation approach: it reduces the phase error and increases the accuracy of molecular models, is more robust, provides clearer maps and may use a smaller portion of data for the test set for the calculation of Rfree or may leave it out completely.
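The random displacement of atomic coordinates mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kick_coordinates`, the uniform per-axis distribution and the `max_shift` parameter are assumptions made only for illustration, since the abstract does not specify how the displacements are drawn.

```python
import math
import random


def kick_coordinates(coords, max_shift, rng):
    """Return a copy of coords with every atom randomly displaced.

    Each Cartesian component is shifted by a value drawn uniformly from
    [-max_shift, +max_shift]. The uniform distribution is an assumption;
    the source only says that the displacement is random.
    """
    return [
        tuple(c + rng.uniform(-max_shift, max_shift) for c in atom)
        for atom in coords
    ]


def max_displacement(coords_a, coords_b):
    """Largest distance between corresponding atoms of two coordinate sets."""
    return max(math.dist(p, q) for p, q in zip(coords_a, coords_b))


# Hypothetical three-atom model, coordinates in angstroms.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
kicked = kick_coordinates(model, max_shift=0.2, rng=rng)
```

In a scheme of this kind, structure factors computed from the kicked model would be compared with those of the unkicked model to obtain an error estimate from the work set; that step depends on the refinement program and is not shown here.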


Figure 2: Electron-density analysis of crambin refined at truncated resolution. The 2mFo − DFc electron density at the 1.0σ contour level of polyalanine models around residues Val15, Cys16, Arg17, Leu18 and Cys26 of the deposited structure of crambin. The map R factor along the crambin chain was calculated residue-by-residue between the Fc map of the final deposited model and the 2mFo − DFc map of the refined polyalanine model. The electron-density R factor ranges from 0.28 (blue) to 0.67 (red). (a) ML CV electron-density map using a 5% test set. (b) ML FK electron-density map using a 1% test set. (c) Residue-by-residue map R factor of ML CV using a 5% test set. (d) Residue-by-residue map R factor of ML FK using a 1% test set.
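A residue-by-residue map R factor of the kind described in this caption could be computed roughly as follows. The caption does not give the formula, so the sketch assumes the commonly used real-space R factor, R = Σ|ρ1 − ρ2| / Σ|ρ1 + ρ2|, summed over the grid points assigned to each residue. The function names and the density values are hypothetical.

```python
def map_r_factor(rho_ref, rho_test):
    """Real-space R factor between two sets of density values.

    Assumed definition: sum|rho_ref - rho_test| / sum|rho_ref + rho_test|,
    evaluated over matching grid points.
    """
    if len(rho_ref) != len(rho_test):
        raise ValueError("density samples must be paired")
    numerator = sum(abs(a - b) for a, b in zip(rho_ref, rho_test))
    denominator = sum(abs(a + b) for a, b in zip(rho_ref, rho_test))
    if denominator == 0:
        raise ValueError("density sum is zero; R factor is undefined")
    return numerator / denominator


def per_residue_map_r(ref_by_residue, test_by_residue):
    """Map R factor for every residue present in the reference map."""
    return {
        residue: map_r_factor(ref_by_residue[residue], test_by_residue[residue])
        for residue in ref_by_residue
    }


# Hypothetical density samples around two residues (arbitrary units).
fc_map = {"Val15": [1.0, 2.0, 3.0], "Cys26": [1.0, 1.0]}
refined_map = {"Val15": [1.0, 2.0, 3.0], "Cys26": [0.5, 1.5]}
r_by_residue = per_residue_map_r(fc_map, refined_map)
```

Values computed this way lie between 0 (identical density) and 1, and could be mapped onto a colour ramp such as the blue-to-red scale used in panels (c) and (d).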

Mentions: To analyze the accuracy of refinement of the three target functions, we compared the displacement of Cα atoms of the refined structures with those of the true structure. For this example, we chose the crambin structure PDB entry 1ejg (Jelsch et al., 2000) refined at a resolution of 0.54 Å, which makes it the macromolecule with the highest resolution in the entire PDB. The crambin amino-acid chain (Figs. 1a and 1c) and its polyalanine model (Figs. 1b and 1d) were refined using the ML CV, ML noCV and ML FK target functions against data truncated to 2.0 Å resolution with different fractions of test data. An overview of the coordinate (Figs. 1a and 1b) and phase (Figs. 1c and 1d) errors demonstrates that the ML CV target function strongly depends on the size of the test portion of data and that the lowest deviations from the reference structure are exhibited by the structures refined using the ML FK target. Coordinate errors were calculated as the root-mean-square distance (r.m.s.d.), whereas the phase errors were calculated by comparing the structure factors from the reference structure with those of the refined models. Among the structures refined using the ML CV target, the smallest coordinate and phase errors were obtained when the test portion contained at least 15% of the data (Figs. 1a and 1c). In contrast, the ML FK target does not exhibit such a strong test-set size dependence. When the whole crambin model (Fig. 1a) was tested, the ML FK refinement yielded an r.m.s.d. on Cα atoms of between 0.12 and 0.14 Å, whereas the deviations of the ML CV target ranged from 0.15 to 0.42 Å. For the polyalanine model, all target functions performed worse than for the correct polypeptide sequence (Figs. 1b and 1d). The ML FK refinements yielded an r.m.s.d. that ranged between 0.29 and 0.34 Å and phase errors that ranged from 47 to 48°. With the ML CV target, the refined structures yielded r.m.s.d.s from 0.32 to 0.36 Å and the phase errors ranged from 46 to 58°. Additionally, refinement of the polyalanine model shows the highest deviations for the ML noCV target. Interestingly, in this experiment the lowest coordinate error does not fully coincide with the lowest phase error; nevertheless, we felt that we should use the phase error in further analysis owing to its widespread use.

To make the numerical analysis understandable in terms of the three-dimensional structure, two σA-weighted electron-density maps were calculated with the polyalanine model around residues Val15, Cys16, Arg17, Leu18 and Cys26 and were displayed on the background of the deposited structure of crambin (Figs. 2a and 2b). The chain trace of crambin is shown with the colour-coded real-space R factor of the maps (Figs. 2c and 2d). Evidently, the maps resulting from ML FK refinement and ML FK phase-error estimates for the weights of the structure factors in the maps are better connected, less noisy and have a lower real-space R factor, as indicated by the blue shift of Fig. 2(d) in comparison to Fig. 2(c).
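The two error measures used in this comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the r.m.s.d. is taken over already superposed, paired Cα coordinates, and the phase error is an unweighted mean of absolute phase differences (the text does not say whether any amplitude weighting was applied). The function names and all numbers are hypothetical.

```python
import cmath
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between paired atoms (no superposition)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def mean_phase_error(f_ref, f_model):
    """Unweighted mean absolute phase difference in degrees (0 to 180).

    f_ref and f_model are paired complex structure factors for the same
    reflections. Differences are wrapped so that 350 deg vs 10 deg counts
    as 20 deg, not 340 deg.
    """
    if len(f_ref) != len(f_model) or not f_ref:
        raise ValueError("need two non-empty structure-factor lists of equal length")
    errors = []
    for a, b in zip(f_ref, f_model):
        delta = cmath.phase(b) - cmath.phase(a)
        delta = (delta + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
        errors.append(abs(math.degrees(delta)))
    return sum(errors) / len(errors)


# Hypothetical C-alpha positions (angstroms) for a reference and a refined model.
reference_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
refined_ca = [(0.3, 0.0, 0.0), (3.8, 0.4, 0.0)]
ca_rmsd = rmsd(reference_ca, refined_ca)

# Hypothetical structure factors given as (amplitude, phase in degrees).
f_reference = [cmath.rect(10.0, math.radians(10.0)), cmath.rect(5.0, math.radians(350.0))]
f_refined = [cmath.rect(9.0, math.radians(40.0)), cmath.rect(6.0, math.radians(20.0))]
phase_error = mean_phase_error(f_reference, f_refined)
```

Because the two measures weight different aspects of the model, the lowest r.m.s.d. need not coincide with the lowest phase error, which is consistent with the observation reported in the text.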

