Coarse-graining protein structures with local multivariate features from molecular dynamics.

Zhang Z, Wriggers W - J Phys Chem B (2008)

Affiliation: School of Health Information Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

ABSTRACT
A multivariate statistical theory, local feature analysis (LFA), extracts functionally relevant domains from molecular dynamics (MD) trajectories. The LFA representations, like those of principal component analysis (PCA), are low dimensional and provide a reduced basis set for collective motions of simulated proteins, but the local features are sparsely distributed and spatially localized, in contrast to global PCA modes. One key problem in the assignment of local features is the coarse-graining of redundant LFA output functions by means of seed atoms. One can solve the combinatorial problem by adding seed atoms one after another to a growing set, minimizing a reconstruction error at each addition. This allows for an efficient implementation, but the sequential algorithm does not guarantee the optimal mutual correlation of the sequentially assigned features. Here, we present a novel coarse-graining algorithm for proteins that directly minimizes the mutual correlation of seed atoms by Monte Carlo (MC) simulations. Tests on MD trajectories of two biological systems, bacteriophage T4 lysozyme and myosin II motor domain S1, demonstrate that the new algorithm provides statistically reproducible results and describes functionally relevant dynamics. The well-known undersampling of large-scale motion by short MD simulations is apparent also in our model, but the new coarse-graining offers a major advantage over PCA; converged features are invariant across multiple windows of the trajectory, dividing the protein into converged regions and a smaller number of localized, undersampled regions. In addition to its use in structure classification, the proposed coarse-graining thus provides a localized measure of MD sampling efficiency.
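
The Monte Carlo idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cost function (sum of squared pairwise correlations among the seeds), the single-swap move, the Metropolis temperature, and the names `mutual_correlation` and `mc_select_seeds` are all assumptions made for illustration, and the toy block-diagonal matrix stands in for the correlation matrix of LFA output functions.

```python
import numpy as np

def mutual_correlation(corr, seeds):
    """Assumed cost: sum of squared correlations over all distinct seed pairs."""
    sub = corr[np.ix_(seeds, seeds)]
    off = sub - np.diag(np.diag(sub))      # drop self-correlations
    return 0.5 * float(np.sum(off ** 2))   # each pair is counted twice in `off`

def mc_select_seeds(corr, n_seeds, n_steps=5000, temperature=0.01, rng=None):
    """Metropolis search for n_seeds atoms with low mutual correlation."""
    rng = np.random.default_rng(rng)
    n_atoms = corr.shape[0]
    if not 0 < n_seeds < n_atoms:
        raise ValueError("n_seeds must be between 1 and n_atoms - 1")
    seeds = rng.choice(n_atoms, size=n_seeds, replace=False)
    energy = mutual_correlation(corr, seeds)
    best_seeds, best_energy = seeds.copy(), energy
    for _ in range(n_steps):
        # Trial move: swap one seed for an atom that is not in the current set.
        trial = seeds.copy()
        candidates = np.setdiff1d(np.arange(n_atoms), seeds)
        trial[rng.integers(n_seeds)] = rng.choice(candidates)
        trial_energy = mutual_correlation(corr, trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < np.exp(-delta / temperature):
            seeds, energy = trial, trial_energy
            if energy < best_energy:
                best_seeds, best_energy = seeds.copy(), energy
    return np.sort(best_seeds), best_energy

# Toy input: 12 atoms in 3 uncorrelated blocks of 4 strongly correlated atoms.
toy_corr = np.kron(np.eye(3), np.full((4, 4), 0.9))
np.fill_diagonal(toy_corr, 1.0)
toy_seeds, toy_energy = mc_select_seeds(toy_corr, n_seeds=3, rng=0)
print(toy_seeds, toy_energy)
```

Unlike a sequential scheme that fixes each seed once it has been added, every seed remains free to move here, so the set as a whole is optimized against the chosen cost.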

Figure 4: Locations and dynamics of the eight local features (n = 8) in T4L during the MD simulation. (a) Initial structure of the simulation (t = 0 ns), (b) t = 4 ns, and (c) t = 8.25 ns. The local dynamic domains are colored as in Figure 3. Seed atoms are shown as spheres, and the protein is shown in white cartoon representation. Molecular graphics renderings were created with VMD.(36)

Mentions: In Figure 4, the eight dynamic domains for n = 8 are visualized in the initial structure and in two selected snapshots of the simulation. The two terminal dynamic domains, Cα-1 and Cα-159, reflect the well-known flexibility of the open-ended chains. T4L has two major structural domains, which are connected by a long α-helix (Figure 4). Both experimental and theoretical studies reveal that T4L exhibits prominent open-close and twist motions between these two structural domains (29−32). On the basis of RMSF values and output correlation peaks, three of the nonterminal dynamic domains, Cα-71 (the linking α-helix), Cα-90, and Cα-116, are less significant (termed minor) than the other three, Cα-22, Cα-52, and Cα-109 (termed major). Two of the major dynamic domains, Cα-22 and Cα-109, include the cross-domain active site of T4L, which is the substrate-binding pocket. The protein has two hinge-bending regions, one on each side of the long α-helix (Cα-71). The major dynamic domain Cα-52 includes the hinge-bending region near the N-terminus, and the minor dynamic domain Cα-90 comprises the hinge-bending region near the C-terminus. Cα-52 is biologically more significant than Cα-90 because the known twist motion of the structural N-terminal domain relative to the structural C-terminal domain (29−32) originates near Cα-52.
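
For readers unfamiliar with the RMSF values used above to rank the dynamic domains, the following is a minimal sketch of the standard per-atom root-mean-square fluctuation. It assumes the trajectory frames have already been superimposed on a common reference; the function name `rmsf` and the two-frame toy trajectory are illustrative only and are not taken from the paper.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF for coords of shape (n_frames, n_atoms, 3).

    Assumes the frames are already aligned to a common reference structure.
    """
    mean_pos = coords.mean(axis=0)                        # (n_atoms, 3)
    sq_disp = np.sum((coords - mean_pos) ** 2, axis=2)    # (n_frames, n_atoms)
    return np.sqrt(sq_disp.mean(axis=0))                  # (n_atoms,)

# Toy trajectory: atom 0 is static, atom 1 oscillates by +/-1 along x.
toy_traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
])
toy_rmsf = rmsf(toy_traj)
print(toy_rmsf)
```

In the toy case the static atom has zero fluctuation and the oscillating atom has a fluctuation equal to its displacement amplitude.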

