Free kick instead of cross-validation in maximum-likelihood refinement of macromolecular crystal structures.

Pražnikar J, Turk D - Acta Crystallogr. D Biol. Crystallogr. (2014)

Bottom Line: The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. Current maximum-likelihood (ML) target functions utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement by simulating the model errors via the random displacement of atomic coordinates.


Affiliation: Department of Biochemistry and Molecular and Structural Biology, Institute Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia.

ABSTRACT
The refinement of a molecular model is a computational procedure by which the atomic model is fitted to the diffraction data. The commonly used target in the refinement of macromolecular structures is the maximum-likelihood (ML) function, which relies on the assessment of model errors. The current ML functions rely on cross-validation. They utilize phase-error estimates that are calculated from a small fraction of diffraction data, called the test set, that are not used to fit the model. An approach has been developed that uses the work set to calculate the phase-error estimates in the ML refinement from simulating the model errors via the random displacement of atomic coordinates. It is called ML free-kick refinement as it uses the ML formulation of the target function and is based on the idea of freeing the model from the model bias imposed by the chemical energy restraints used in refinement. This approach for the calculation of error estimates is superior to the cross-validation approach: it reduces the phase error and increases the accuracy of molecular models, is more robust, provides clearer maps and may use a smaller portion of data for the test set for the calculation of Rfree or may leave it out completely.
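
The random displacement of atomic coordinates mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the displacement distribution or magnitude, so the isotropic Gaussian shifts, the rescaling to a fixed root-mean-square (RMS) displacement, the value of 0.4 Å and the names `kick_coordinates` and `rms_deviation` are all assumptions made for illustration.

```python
import math
import random


def kick_coordinates(coords, rms_displacement, rng):
    """Return new coordinates with every atom randomly displaced.

    Assumption: shifts are isotropic Gaussian vectors, rescaled so that the
    RMS displacement over all atoms equals rms_displacement (same length
    unit as coords, e.g. angstroms).
    """
    shifts = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in coords]
    mean_sq = sum(dx * dx + dy * dy + dz * dz for dx, dy, dz in shifts) / len(shifts)
    scale = rms_displacement / math.sqrt(mean_sq)
    return [
        (x + scale * dx, y + scale * dy, z + scale * dz)
        for (x, y, z), (dx, dy, dz) in zip(coords, shifts)
    ]


def rms_deviation(a, b):
    """RMS distance between two coordinate lists of equal length."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


# Hypothetical model: 200 atoms placed at random in a 50 angstrom box.
rng = random.Random(0)
coords = [tuple(rng.uniform(0.0, 50.0) for _ in range(3)) for _ in range(200)]
kicked = kick_coordinates(coords, rms_displacement=0.4, rng=rng)
```

In a refinement setting, structure factors calculated from such a displaced model would be compared with those of the undisplaced model over the work set to obtain the error estimate; that step depends on crystallographic software and is not shown here.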


Figure 4. Distribution of phase errors and of Rwork. The graphs show the distribution of phase errors and of Rwork after refinement at 3.0 Å resolution for 31 different test sets. Red dashed lines show the starting phase error of the model. ML CV (a–e) and ML FK (f–j) refinement target functions were used. The test-set sizes are 1, 2, 5 and 10%. On the graphs they are denoted T1, T2, T5 and T10, respectively. The cases used are cherry allergen (PDB entry 2ahn) (a, f), stefin B tetramer (2oct) (b, g), cathepsin H (8pch) (c, h), ammodytin L (3dih) (d, i) and choline acetyltransferase (2fy2) (e, j).
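
The test-set layout described in the caption (31 different test sets at each of four sizes) could be generated along the following lines. This is a hedged sketch only: the excerpt does not say how the authors selected their test sets, so uniform random selection, the use of seeds 0 to 30, the reflection count of 20000 and the function name `split_work_test` are assumptions.

```python
import random


def split_work_test(n_reflections, test_fraction, seed):
    """Split reflection indices into a work list and a test list.

    Assumption: test reflections are chosen uniformly at random; the seed
    makes each of the repeated test sets reproducible.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_reflections * test_fraction))
    test = set(rng.sample(range(n_reflections), n_test))
    work = [i for i in range(n_reflections) if i not in test]
    return work, sorted(test)


# Labels follow the caption: T1, T2, T5 and T10 are 1, 2, 5 and 10% test sets.
fractions = {"T1": 0.01, "T2": 0.02, "T5": 0.05, "T10": 0.10}

# 31 independent splits per test-set size, for a hypothetical data set
# of 20000 unique reflections.
splits = {
    label: [split_work_test(20000, fraction, seed) for seed in range(31)]
    for label, fraction in fractions.items()
}
```

Each `(work, test)` pair would then drive one independent refinement run, giving 31 final phase errors and Rwork values per test-set size.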

To analyze the robustness and convergence of the target functions in refinement, we chose five cases starting with molecular-replacement solutions. Analysis of the phase errors of the refined molecular-replacement models shows that the phase errors and variability of structures refined with the ML FK approach are lower in all cases (Fig. 4). Fig. 4 also reveals the general trend of the ML FK function: the size of the work set negatively correlates with the phase error. This relationship is not evident for the ML CV approach, where a test-set size of 10% resulted in the lowest phase error in one instance (Fig. 4d). Concerning the distribution of the final phase errors, the small size of the test set, on which the scaling of the ML CV approach depends, evidently produces much variation. Comparison of Fig. 4 with Table 1 indicates that, in the ML CV approach, the spread of phase errors is larger when the test set contains fewer data. This comparison also makes evident that the spreads of the phase errors for the largest test sets (10% of the data) are notably larger for the ML CV cases than for the ML FK cases. The narrowest spread of phase errors, for the 2fy2 case with the largest test-set sizes, also reflects the fact that in this case the starting molecular-replacement model was most similar in structure and sequence to the final structure.
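
The two quantities compared in this paragraph, the phase error of a refined model and its spread across repeated runs, can be sketched as follows. The excerpt does not give the authors' exact definitions, so the unweighted mean absolute phase difference, the use of range and sample standard deviation as measures of spread, and the names `mean_phase_error` and `spread` are assumptions.

```python
import statistics


def mean_phase_error(phases_model, phases_ref):
    """Mean absolute phase difference in degrees.

    Each difference is wrapped into [0, 180] so that, for example, 350 and
    10 degrees differ by 20 degrees rather than 340. Assumption: all
    reflections are weighted equally.
    """
    diffs = []
    for pm, pr in zip(phases_model, phases_ref):
        d = abs(pm - pr) % 360.0
        diffs.append(min(d, 360.0 - d))
    return sum(diffs) / len(diffs)


def spread(values):
    """Return (range, sample standard deviation) of a list of phase errors."""
    return max(values) - min(values), statistics.stdev(values)


# Toy phases in degrees for two reflections, then toy per-run phase errors.
example_error = mean_phase_error([350.0, 10.0], [10.0, 350.0])
example_range, example_sd = spread([40.0, 42.0, 44.0])
```

Applied to the 31 final phase errors obtained for one test-set size, `spread` gives a single number per target function that can be compared between the ML CV and ML FK runs.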

