Automatic Selection of Order Parameters in the Analysis of Large Scale Molecular Dynamics Simulations.

Sultan MM, Kiss G, Shukla D, Pande VS - J Chem Theory Comput (2014)

View Article: PubMed Central - PubMed

Affiliation: Department of Chemistry, Stanford University, 318 Campus Drive, Stanford, California 94305, United States.

ABSTRACT

Given the large number of crystal structures and NMR ensembles that have been solved to date, classical molecular dynamics (MD) simulations have become powerful tools in the atomistic study of the kinetics and thermodynamics of biomolecular systems on ever increasing time scales. By virtue of the high-dimensional conformational state space that is explored, the interpretation of large-scale simulations faces difficulties not unlike those in the big data community. We address this challenge by introducing a method called clustering based feature selection (CB-FS) that employs a posterior analysis approach. It combines supervised machine learning (SML) and feature selection with Markov state models to automatically identify the relevant degrees of freedom that separate conformational states. We highlight the utility of the method in the evaluation of large-scale simulations and show that it can be used for the rapid and automated identification of relevant order parameters involved in the functional transitions of two exemplary cell-signaling proteins central to human disease states.



fig1: Toy example that highlights the advantage of using a decision tree over correlation metrics (mutual information (MI) and Pearson correlation (r)). The image on the left depicts a two-state model (red and blue dots) with the learned decision boundaries in black and white. To the right are the equations that define the two comparison metrics. H(F) is the entropy of a given feature, H(S) is the state entropy, and H(F,S) is the joint entropy. N, F̅, and S̅ represent the total number of examples, the mean value of the feature, and the mean value of the state, respectively. The table lists the values obtained from both the Pearson correlation and the mutual information on each of those features with the corresponding state. The details of these calculations are given in the Supporting Information.
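The two comparison metrics in the caption can be sketched in a few lines of Python (a minimal illustration, not the authors' code; the XOR-style toy data below is a hypothetical stand-in for the figure's two-state model). MI is computed from the caption's entropies as MI(F,S) = H(F) + H(S) − H(F,S), and Pearson's r from the means F̅, S̅ over the N examples:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H (in bits) of a discrete sequence."""
    total = len(values)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(values).values())

def mutual_information(feature, state):
    """MI(F, S) = H(F) + H(S) - H(F, S) for discretized values."""
    return (entropy(feature) + entropy(state)
            - entropy(list(zip(feature, state))))

def pearson_r(feature, state):
    """Pearson correlation r between a feature and the state labels."""
    n = len(feature)
    fm, sm = sum(feature) / n, sum(state) / n
    num = sum((f - fm) * (s - sm) for f, s in zip(feature, state))
    den = math.sqrt(sum((f - fm) ** 2 for f in feature)
                    * sum((s - sm) ** 2 for s in state))
    return num / den

# Hypothetical XOR-style data: the state depends on both features jointly,
# so each feature on its own shows neither correlation nor information.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
state = [0, 1, 1, 0]

print(pearson_r(f1, state))                          # 0.0
print(mutual_information(f1, state))                 # 0.0
print(mutual_information(list(zip(f1, f2)), state))  # 1.0 bit jointly
```

This mirrors the figure's point: the per-feature Pearson correlation and mutual information both vanish even though the two features jointly determine the state exactly, which is the kind of additive effect a decision tree can still capture.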

Mentions: This procedure is repeated recursively until either all the data has been divided into completely pure samples or only a single sample is left at the leaf node. Alternatively, the process can be terminated upon satisfaction of a threshold criterion. The information gain in eq 4 is equivalent to an entropy reduction in the target state given the potential split and is comparable to the mutual information (MI) between the target variable and the feature under investigation. In contrast to MI calculations and due to a more complex objective function (eq 1), DTs are capable of modeling additive effects (Figure 1 and the Supporting Information).
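The recursive, information-gain-driven splitting described here can be sketched from scratch in a few dozen lines (a simplified illustration under the definitions above, not the paper's CB-FS implementation): each node picks the feature and threshold whose split most reduces the entropy of the state labels, and recursion stops at pure leaves.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the state labels at a node."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(X, y):
    """Return (gain, feature_index, threshold) of the highest-gain split."""
    best = (-1.0, None, None)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # split does not actually divide the data
            gain = entropy(y) - (len(left) / len(y) * entropy(left)
                                 + len(right) / len(y) * entropy(right))
            if gain > best[0]:
                best = (gain, j, t)
    return best

def grow(X, y):
    """Recursively split until each leaf holds a single pure state."""
    if len(set(y)) == 1:
        return y[0]                             # pure leaf: predict its state
    gain, j, t = best_split(X, y)
    if j is None:
        return Counter(y).most_common(1)[0][0]  # no usable split left
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (j, t,
            grow([r for r, _ in left], [yi for _, yi in left]),
            grow([r for r, _ in right], [yi for _, yi in right]))

def predict(node, row):
    while isinstance(node, tuple):              # descend to a leaf
        j, t, lo, hi = node
        node = lo if row[j] <= t else hi
    return node

# XOR-like toy data: neither feature alone separates the two states,
# yet the recursive tree captures the joint (additive) effect exactly.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
tree = grow(X, y)
print([predict(tree, row) for row in X])  # [0, 1, 1, 0]
```

Note the contrast with a single MI or Pearson screen: the first split here gains no entropy on its own, but it enables the second-level splits that make every leaf pure, which is exactly why the tree's more complex objective can model additive effects.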

