Limits...
Improving consensus structure by eliminating averaging artifacts.

Dukka BK - BMC Struct. Biol. (2009)

Bottom Line: Finally, compared to PULCHRA 1, our approach produces representative structure of similar RMSD quality, but with much fewer clashes.The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community.Namely, structure averaging is also commonly performed in RNA secondary prediction 2, which could also benefit from our approach.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA. dbahadur@uncc.edu

ABSTRACT

Background: Common structural biology methods (i.e., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles.

Results: Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to C alpha models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 A compared to the average RMSD of the averaged structures from the native structure (3.28 A for refined structures and 3.36 A for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA 1, our approach produces representative structure of similar RMSD quality, but with much fewer clashes.

Conclusion: The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. Namely, structure averaging is also commonly performed in RNA secondary prediction 2, which could also benefit from our approach.

Show MeSH
Comparison of COMBO models and MCORE models. (a) Scatter plot of the Cα RMSD of combo models and respective MCORE refined models for a set of 1364 proteins compared to corresponding native structure (b) Density plot of distribution of RMSD deviations between MCORE structures and the corresponding COMBO structures. c) Scatter plot of the fraction of atoms involved clashes in COMBO models and corresponding refined models for the same set.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2662860&req=5

Figure 3: Comparison of COMBO models and MCORE models. (a) Scatter plot of the Cα RMSD of combo models and respective MCORE refined models for a set of 1364 proteins compared to corresponding native structure (b) Density plot of distribution of RMSD deviations between MCORE structures and the corresponding COMBO structures. c) Scatter plot of the fraction of atoms involved clashes in COMBO models and corresponding refined models for the same set.

Mentions: Upon application of the MCORE algorithm to the refine the COMBO models, it is found that the refined representative structures have similar RMSD values to native structures, but with far fewer unphysical characteristics. Figure 3(a) plots the RMSD values of the MCORE to native comparisons versus the COMBO to native comparisons, which demonstrates a strong linear correlation between the two methods. The RMSD values of the COMBO models are only slightly better than those from MCORE. This point is reinforced by Figure 3(b), which plots the density of RMSD differences between the methods. The majority of RMSD differences are slightly less than 0.5 Å. In addition, 38 MCORE refined structures had even better RMSD than their corresponding COMBO model. Figure 3(c) plots fraction of clashes within the MCORE vs. COMBO models. Clearly, the MCORE models have far fewer clashes than their COMBO counterparts. The average percentage of clashes in MCORE refined models is 1.09%, whereas the average percentage of clashes in COMBO models is 63.0%.


Improving consensus structure by eliminating averaging artifacts.

Dukka BK - BMC Struct. Biol. (2009)

Comparison of COMBO models and MCORE models. (a) Scatter plot of the Cα RMSD of combo models and respective MCORE refined models for a set of 1364 proteins compared to corresponding native structure (b) Density plot of distribution of RMSD deviations between MCORE structures and the corresponding COMBO structures. c) Scatter plot of the fraction of atoms involved clashes in COMBO models and corresponding refined models for the same set.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2662860&req=5

Figure 3: Comparison of COMBO models and MCORE models. (a) Scatter plot of the Cα RMSD of combo models and respective MCORE refined models for a set of 1364 proteins compared to corresponding native structure (b) Density plot of distribution of RMSD deviations between MCORE structures and the corresponding COMBO structures. c) Scatter plot of the fraction of atoms involved clashes in COMBO models and corresponding refined models for the same set.
Mentions: Upon application of the MCORE algorithm to the refine the COMBO models, it is found that the refined representative structures have similar RMSD values to native structures, but with far fewer unphysical characteristics. Figure 3(a) plots the RMSD values of the MCORE to native comparisons versus the COMBO to native comparisons, which demonstrates a strong linear correlation between the two methods. The RMSD values of the COMBO models are only slightly better than those from MCORE. This point is reinforced by Figure 3(b), which plots the density of RMSD differences between the methods. The majority of RMSD differences are slightly less than 0.5 Å. In addition, 38 MCORE refined structures had even better RMSD than their corresponding COMBO model. Figure 3(c) plots fraction of clashes within the MCORE vs. COMBO models. Clearly, the MCORE models have far fewer clashes than their COMBO counterparts. The average percentage of clashes in MCORE refined models is 1.09%, whereas the average percentage of clashes in COMBO models is 63.0%.

Bottom Line: Finally, compared to PULCHRA 1, our approach produces representative structure of similar RMSD quality, but with much fewer clashes.The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community.Namely, structure averaging is also commonly performed in RNA secondary prediction 2, which could also benefit from our approach.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA. dbahadur@uncc.edu

ABSTRACT

Background: Common structural biology methods (i.e., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles.

Results: Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to C alpha models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 A compared to the average RMSD of the averaged structures from the native structure (3.28 A for refined structures and 3.36 A for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA 1, our approach produces representative structure of similar RMSD quality, but with much fewer clashes.

Conclusion: The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. Namely, structure averaging is also commonly performed in RNA secondary prediction 2, which could also benefit from our approach.

Show MeSH