Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

Stein RR, Marks DS, Sander C - PLoS Comput. Biol. (2015)

Bottom Line: Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.


Affiliation: Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.

ABSTRACT
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.




pcbi.1004182.g003: Scheme of pairwise maximum-entropy probability models. The maximum-entropy probability distribution with pairwise constraints for continuous random variables is the multivariate Gaussian distribution (left column). For the maximum-entropy probability distribution in the categorical variable case (right column), various approximate solutions exist, e.g., the mean-field, the sparse maximum-likelihood, and the pseudolikelihood maximization solutions. The mean-field and the sparse maximum-likelihood results can be derived from the Gaussian approximation of binarized categorical variables (thin arrow). The pair scoring functions for the continuous case are the partial correlations (left column). For the categorical variable case, the direct information, the Frobenius norm, and the average product-corrected Frobenius norm are used to score pair couplings from the inferred parameters (right column).
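
For the continuous case in the left column, the partial correlations can be read off the inverse of the covariance matrix. The following is a minimal sketch of that step, not the authors' implementation: the function name `partial_correlations` is hypothetical, the data layout (one sample per row) is an assumption, and the plain sample covariance is assumed to be invertible, which typically requires more samples than variables or some form of regularization.

```python
import numpy as np


def partial_correlations(X):
    """Partial correlations from the inverse covariance (precision) matrix.

    X is assumed to have shape (n_samples, n_variables).
    Entry (i, j) of the result is the correlation between variables i and j
    after conditioning on all remaining variables.
    """
    cov = np.cov(X, rowvar=False)          # sample covariance, variables in columns
    precision = np.linalg.inv(cov)         # assumes cov is non-singular
    scale = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(scale, scale)
    np.fill_diagonal(rho, 1.0)             # by convention, rho_ii = 1
    return rho
```

In a chain such as x -> y -> z, the ordinary correlation between x and z is large, whereas their partial correlation given y should be close to zero, which is the sense in which this score separates direct from indirect effects.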

Mentions: Maximum entropy-based inference methods can help in estimating interactions underlying biological data. This class of models, combined with suitable methods for inferring their numerical parameters, has been shown to reveal, to a reasonable approximation, the direct interactions in many biological applications, such as gene expression or protein residue-residue coevolution studies. In this review, we have presented maximum-entropy models for the continuous and categorical random variable cases. Both approaches can be integrated into a framework that allows the use of solutions obtained for continuous variables as approximations for the categorical random variable case (Fig 3).
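
Once couplings have been inferred for the categorical case, Fig 3 names the average product-corrected Frobenius norm as one way to score pairs. The sketch below illustrates that scoring step under stated assumptions: the function name `apc_frobenius_scores` is hypothetical, the couplings are assumed to be stored as an array of shape (L, L, q, q) for L positions with q states each, and each q x q block is centered (a zero-sum gauge) before the norm is taken so that the score does not depend on the gauge of the input. It also assumes that at least some off-diagonal couplings are non-zero.

```python
import numpy as np


def apc_frobenius_scores(J):
    """Average product-corrected Frobenius norm of pairwise couplings.

    J is assumed to have shape (L, L, q, q), where J[i, j] is the q x q
    coupling matrix between positions i and j.
    """
    L = J.shape[0]
    # Center every q x q block so that its rows and columns sum to zero.
    Jc = (J
          - J.mean(axis=2, keepdims=True)
          - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    # Frobenius norm of each block; self-pairs are excluded.
    F = np.sqrt((Jc ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    # Average product correction: subtract the background expected from
    # the per-position mean norms.
    row_mean = F.sum(axis=1) / (L - 1)
    total_mean = F.sum() / (L * (L - 1))
    scores = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(scores, 0.0)
    return scores
```

Pairs are then typically ranked by score, with the highest-scoring pairs taken as candidate direct interactions.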

