QAARM: quasi-anharmonic autoregressive model reveals molecular recognition pathways in ubiquitin.

Savol AJ, Burger VM, Agarwal PK, Ramanathan A, Chennubhotla CS - Bioinformatics (2011)

Bottom Line: Molecular dynamics (MD) simulations have dramatically improved the atomistic understanding of protein motions, energetics and function. Observing that such events give rise to long-tailed spatial distributions, we recently developed a higher-order statistics based dimensionality reduction method, called quasi-anharmonic analysis (QAA), for identifying biophysically-relevant reaction coordinates and substates within MD simulations. We show the learned model can be extrapolated to synthesize trajectories of arbitrary length.


Affiliation: Joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology, Department of Computational and Systems Biology, University of Pittsburgh, PA 15260, USA.

ABSTRACT

Motivation: Molecular dynamics (MD) simulations have dramatically improved the atomistic understanding of protein motions, energetics and function. These growing datasets have necessitated a corresponding emphasis on trajectory analysis methods for characterizing simulation data, particularly since functional protein motions and transitions are often rare and/or intricate events. Observing that such events give rise to long-tailed spatial distributions, we recently developed a higher-order statistics based dimensionality reduction method, called quasi-anharmonic analysis (QAA), for identifying biophysically-relevant reaction coordinates and substates within MD simulations. Further characterization of conformation space should consider the temporal dynamics specific to each identified substate.

Results: Our model uses hierarchical clustering to learn energetically coherent substates and dynamic modes of motion from a 0.5 μs ubiquitin simulation. Autoregressive (AR) modeling within and between states enables a compact and generative description of the conformational landscape as it relates to functional transitions between binding poses. Lacking a predictive component, QAA is extended here within a general AR model appreciative of the trajectory's temporal dependencies and the specific, local dynamics accessible to a protein within identified energy wells. These metastable states and their transition rates are extracted within a QAA-derived subspace using hierarchical Markov clustering to provide parameter sets for the second-order AR model. We show the learned model can be extrapolated to synthesize trajectories of arbitrary length.

Contact: ramanathana@ornl.gov; chakracs@pitt.edu.


Figure 4: Representative transition matrices are highly diagonal: (A) A1 and A2 for the most populated cluster, cluster 2, which contained 29 108 structures or 5.84% of the entire 0.5 μs simulation. Cross correlations between QAA modes are highly reduced, yielding low off-diagonal elements. Distinctions between A1 and A2 indicate the constituent structures (from cluster 2) carried dynamic information across multiple frames. The lower two panels show less strongly diagonal transition matrices for a less populated cluster, 67, which contained 627 structures. Elements of A1 and A2 range from −0.84 to 0.72 over all clusters (−0.33 to 0.5597 over clusters 2 and 67). (B) Cluster memberships for MD training data (black) and AR-synthesized (red) ubiquitin conformations, 10 000 frames each.

Mentions: The dynamical model, Equation (7), exploits our knowledge of past states (conformations) to propose a future state. Before we can compute the transition matrices A1 and A2 from training data, we first project the ubiquitin simulation into the embedded 30-dimensional QAA-space to yield the training states (Equation 8), where the columns of X are 3N-vectors carrying the protein's coordinates (for N residues). Following the derivation put forward in Hyndman (2007), the AR model is defined sequentially over the weights (Equation 9), with unknowns A1 and A2. We concatenate the state vectors and the transition matrices, writing A ≡ [A1 A2], to express the system in matrix form (Equation 10). The total squared error between the true states and the predicted states is minimized under the Frobenius norm $\|\cdot\|_F$ (Equation 11). Generally the dimensionality of the state subspace is much smaller than the number of observations (training simulation frames), so $W_{i,j}$ is rarely square. The solution to Equation (11) then follows (Equation 12), where $F^{*} \equiv F^{T}(FF^{T})^{-1}$ denotes the pseudo-inverse of a matrix F. Representative A1 and A2 matrices are shown in Figure 4A. The stochastic term represents those dynamics that are inadequately captured by the second-order linear model, and is drawn from a Gaussian distribution with covariance equal to that of the prediction error averaged over the training sequence (Equation 13). Interpreted physically, each A1 and A2 pair encodes the local, time-invariant dynamics. The eigen-decomposition yields the exponential decay constants for these local dynamics, where $\lambda_m < 1$ denotes any positive eigenvalue (Fig. 4B).
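The equation bodies referenced in the paragraph above did not survive in this excerpt. A plausible reading of Equations (9) to (13), based only on the surrounding prose, is sketched below. The symbols $w_t$ (the QAA-space state at frame $t$), $v_t$ (the stochastic term), $Q$ (its covariance), $Z$ (the stacked matrix of lagged states) and $T$ (the number of training frames) are assumptions introduced for illustration; only $A_1$, $A_2$, $A$, $W_{i,j}$ and the pseudo-inverse definition come from the text. Equation (8), the projection into QAA-space, is left out because its symbols cannot be recovered.

```latex
% Assumed notation: w_t is the 30-dimensional state at frame t,
% W_{i,j} = [w_i, ..., w_j] collects states i through j as columns.
\begin{align}
  w_t &= A_1 w_{t-1} + A_2 w_{t-2} + v_t,
      \qquad v_t \sim \mathcal{N}(0, Q) \\
  W_{3,T} &\approx A Z,
      \qquad A \equiv [A_1 \;\; A_2],
      \qquad Z \equiv \begin{bmatrix} W_{2,T-1} \\ W_{1,T-2} \end{bmatrix} \\
  \hat{A} &= \arg\min_{A} \left\| W_{3,T} - A Z \right\|_F^{2}
           = W_{3,T} Z^{*},
      \qquad Z^{*} \equiv Z^{T} (Z Z^{T})^{-1} \\
  \hat{Q} &= \frac{1}{T-2} \sum_{t=3}^{T} e_t e_t^{T},
      \qquad e_t = w_t - \hat{A}_1 w_{t-1} - \hat{A}_2 w_{t-2}
\end{align}
```

With 30-dimensional states, $Z$ has 60 rows and $T-2$ columns, so $Z Z^{T}$ is a 60 × 60 matrix and $\hat{A}$ is 30 × 60, which matches the remark that the state matrices are rarely square.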

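The least-squares fit and the synthesis step described in the Mentions paragraph can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_ar2`, `synthesize` and `companion_eigenvalues` are hypothetical, the noise covariance is taken as the mean outer product of the one-step residuals, and the companion-matrix eigenvalues are one standard way to inspect the stability of a second-order AR model (the surviving text does not say which matrix the authors decomposed).

```python
import numpy as np


def fit_ar2(W):
    """Fit w_t = A1 w_{t-1} + A2 w_{t-2} + v_t by least squares.

    W is a (d, T) array whose columns are consecutive states.
    Returns A1, A2 (each d x d) and the residual covariance Q (d x d).
    """
    d = W.shape[0]
    target = W[:, 2:]                        # states 3..T
    Z = np.vstack([W[:, 1:-1], W[:, :-2]])   # lag-1 states stacked on lag-2 states
    A = target @ np.linalg.pinv(Z)           # pseudo-inverse solution, A = [A1 A2]
    resid = target - A @ Z                   # one-step prediction errors
    Q = resid @ resid.T / resid.shape[1]     # assumed estimator: mean outer product
    return A[:, :d], A[:, d:], Q


def synthesize(A1, A2, Q, w0, w1, n_steps, rng):
    """Roll the fitted model forward for n_steps frames after the seeds w0, w1."""
    d = len(w0)
    out = np.empty((d, n_steps + 2))
    out[:, 0], out[:, 1] = w0, w1
    noise = rng.multivariate_normal(np.zeros(d), Q, size=n_steps).T
    for t in range(2, n_steps + 2):
        out[:, t] = A1 @ out[:, t - 1] + A2 @ out[:, t - 2] + noise[:, t - 2]
    return out


def companion_eigenvalues(A1, A2):
    """Eigenvalues of the block companion matrix of the AR(2) model."""
    d = A1.shape[0]
    C = np.block([[A1, A2], [np.eye(d), np.zeros((d, d))]])
    return np.linalg.eigvals(C)
```

In the setting described above, `W` would hold the 30-dimensional QAA-space states of the frames assigned to one cluster, giving one `(A1, A2, Q)` parameter set per cluster. Eigenvalues with modulus below 1 indicate that the deterministic part of the model decays rather than diverges, so a synthesized trajectory of arbitrary length stays bounded in distribution.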
