Automatic Selection of Order Parameters in the Analysis of Large Scale Molecular Dynamics Simulations.

Sultan MM, Kiss G, Shukla D, Pande VS - J Chem Theory Comput (2014)

Bottom Line: We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.


Affiliation: Department of Chemistry, Stanford University, 318 Campus Drive, Stanford, California 94305, United States.

ABSTRACT

Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever increasing time scales. By virtue of the high-dimensional conformational state space that is explored, the interpretation of large-scale simulations faces difficulties not unlike those in the big data community. We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.



Figure 3: A) Results from building a random forest classifier on a two-state hidden Markov model of human ubiquitin. The fifth and 12th dihedrals correspond to up and down conformations of the loop, and the error bars are from the different decision trees (DTs) in the ensemble. B) Two-state behavior of the fifth ψ dihedral in the two states. C) The two H-bonds that stabilize the loop in the up (red) state. D) Histogram showing the length of the H-bond between Glu33 and Lys10.

Mentions: We tested the performance of CB-FS in the context of a biological system and analyzed the dynamics of human ubiquitin, a signaling hub protein that connects multiple cellular pathways.[28,29] Its misregulation has been implicated in numerous pathologies, including neurodegeneration and tumor progression. A two-state model was generated from an aggregate 100 μs of simulation data using a hidden Markov model formalism.[30] It discerns two distinct conformations of a functionally selective loop (Figure 3a inset) and provides us with insights into the degrees of freedom that correspond to this conformational change. 800 structures were randomly pulled from the two states and further analyzed with CB-FS. Two different vectorized representations, dihedral angles and hydrogen bond networks, were used to break down the states. Two random forest classifiers with 40 trees each were trained, one with a maximum depth of 4 for the dihedral features and one with a maximum depth of 7 for the hydrogen bond networks. The results are shown in Figure 3. The H-bond random forest revealed two important interactions. A backbone hydrogen bond between K10 and T6 breaks as the system switches to state 2 (orange). The H-bond network (Figure 3c) also revealed the functionally important interaction between the side chains of K10 and E33. The finding is in line with previous work that experimentally validated the significance of the K10-E33 contact.[29] The mutation of K10 into a neutral residue gives a markedly increased pKa of E33. Further work by Wickliffe et al.[31] and Bremm et al.[32] showed that this noncovalent interaction is important for orienting K10 in a position suitable for selection by the Ube2s enzyme via substrate-assisted catalysis.
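
The classification step described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn's `RandomForestClassifier` as the random forest implementation, uses synthetic data in place of real dihedral angles, and reads "800 structures" as 400 per state. The helper names `rank_features` and `make_synthetic_states` are hypothetical. The forest uses 40 trees with a maximum depth of 4, as quoted for the dihedral features, and the spread of importances across the individual trees plays the role of the error bars in Figure 3a.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, n_trees=40, max_depth=4, seed=0):
    """Fit a random forest on state labels and return, for every feature,
    the mean and standard deviation of its importance across the trees."""
    forest = RandomForestClassifier(
        n_estimators=n_trees, max_depth=max_depth, random_state=seed
    )
    forest.fit(X, y)
    # One row of importances per decision tree in the ensemble.
    per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


def make_synthetic_states(n_per_state=400, n_features=20,
                          informative=(4, 11), seed=0):
    """Hypothetical stand-in for dihedral features of two states.

    All features are uniform noise except the `informative` columns
    (zero-based indices 4 and 11, i.e. the fifth and 12th features),
    which take different values in the two states.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-np.pi, np.pi, size=(2 * n_per_state, n_features))
    y = np.repeat([0, 1], n_per_state)  # state labels, e.g. from an HMM/MSM
    for j in informative:
        X[y == 0, j] = rng.normal(-1.0, 0.3, n_per_state)
        X[y == 1, j] = rng.normal(1.0, 0.3, n_per_state)
    return X, y


X, y = make_synthetic_states()
mean_imp, std_imp = rank_features(X, y)

# Indices of the two features that best separate the states.
top = np.argsort(mean_imp)[::-1][:2]
for j in top:
    print(f"feature {j}: importance {mean_imp[j]:.3f} +/- {std_imp[j]:.3f}")
```

In a real analysis, `X` would hold one row per sampled structure and one column per dihedral angle or hydrogen bond descriptor, and `y` would hold the state assignment from the hidden Markov model. The shallow depth keeps each tree to a handful of splits, which tends to concentrate importance on a few easily interpreted features.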

