Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

Stein RR, Marks DS, Sander C - PLoS Comput. Biol. (2015)

Bottom Line: Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.


Affiliation: Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.

ABSTRACT
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.



pcbi.1004182.g001: Reaction system reconstruction and protein contact prediction. Association results of correlation-based and maximum-entropy methods on biological data from an in silico reaction system (A) and protein contacts (B). (A) Analysis by Pearson’s correlation yields interactions associating all three compounds A, B, and C, in contrast to the partial correlation approach, which omits the “false” link between A and C. (Fig 1A based on [21].) (B) Protein contact prediction for the human RAS protein using the correlation-based mutual information, MI (blue), and the maximum-entropy-based direct information, DI (red). The 150 highest-scoring contacts from both methods are plotted on the protein contacts from the experimentally determined structure (gray). (Fig 1B based on [6].)

Mentions: The latter equivalence by Cramer’s rule holds if the empirical covariance matrix is invertible. Krumsiek et al. [21] studied the Pearson correlations and partial correlations in data generated by an in silico reaction system consisting of three components A, B, and C, with reactions between A and B and between B and C (Fig 1A). A graphical comparison of Pearson’s correlations, r_AB, r_AC, r_BC, with the corresponding partial correlations, r_AB·C, r_AC·B, r_BC·A, shows that variables A and C appear to be correlated when Pearson’s correlation is used as a dependency measure, because both are highly correlated with variable B. This results in a falsely inferred reaction between A and C. The strength of the incorrectly inferred interaction can be numerically large, and therefore particularly misleading, if there are multiple intermediate variables B [22]. The partial correlation analysis removes the effect of the mediating variable(s) B and correctly recovers the underlying interaction structure. This is always true for variables following a multivariate Gaussian distribution, but it also seems to work empirically on realistic systems, as Krumsiek et al. [21] have shown for more complex reaction structures than the example presented here.
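
The contrast between Pearson and partial correlations described above can be illustrated with a minimal sketch. It assumes a linear Gaussian chain A → B → C as a stand-in for the three-component system; it is not the reaction system simulated by Krumsiek et al. [21], and the coupling strength (0.8), noise level (0.6), sample size, and the function names `simulate_chain` and `partial_correlations` are hypothetical choices made for illustration. Partial correlations are computed from the inverse of the empirical covariance matrix, which therefore has to be invertible.

```python
import numpy as np


def simulate_chain(n_samples=5000, seed=0):
    """Draw samples from an assumed linear Gaussian chain A -> B -> C.

    A and C are coupled only through B; there is no direct A-C link.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_samples)
    b = 0.8 * a + 0.6 * rng.normal(size=n_samples)
    c = 0.8 * b + 0.6 * rng.normal(size=n_samples)
    return np.column_stack([a, b, c])


def partial_correlations(data):
    """Partial correlation of each variable pair given all remaining variables.

    data has shape (n_samples, n_variables). The result is obtained by
    rescaling the negated inverse of the empirical covariance matrix.
    """
    cov = np.cov(data, rowvar=False)       # empirical covariance matrix
    precision = np.linalg.inv(cov)         # raises LinAlgError if singular
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


data = simulate_chain()
pearson = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

labels = ["A", "B", "C"]
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"{labels[i]}-{labels[j]}: Pearson = {pearson[i, j]:+.3f}, "
          f"partial = {partial[i, j]:+.3f}")
```

Under these assumptions the A-C Pearson correlation comes out clearly nonzero, while the A-C partial correlation given B stays close to zero, which mirrors the removal of the "false" link in Fig 1A. The A-B and B-C entries remain large in both measures.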

