Inferring Pairwise Interactions from Biological Data Using Maximum-Entropy Probability Models.

Stein RR, Marks DS, Sander C - PLoS Comput. Biol. (2015)

Bottom Line: Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. The inferred model parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.

Affiliation: Computational Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.

ABSTRACT
Maximum entropy-based inference methods have been successfully used to infer direct interactions from biological datasets such as gene expression data or sequence ensembles. Here, we review undirected pairwise maximum-entropy probability models in two categories of data types, those with continuous and categorical random variables. As a concrete example, we present recently developed inference methods from the field of protein contact prediction and show that a basic set of assumptions leads to similar solution strategies for inferring the model parameters in both variable types. These parameters reflect interactive couplings between observables, which can be used to predict global properties of the biological system. Such methods are applicable to the important problems of protein 3-D structure prediction and association of gene-gene networks, and they enable potential applications to the analysis of gene alteration patterns and to protein design.
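For the continuous case, a standard result is that the maximum-entropy distribution constrained to match given means and pairwise covariances is a multivariate Gaussian, $P(x) \propto \exp\big(-\tfrac12 (x-\mu)^\top \Sigma^{-1} (x-\mu)\big)$, so the pairwise couplings are the off-diagonal entries of the precision matrix $J = \Sigma^{-1}$. The sketch below is illustrative and not the authors' code. It estimates $\Sigma$ from samples, inverts it, and reports partial correlations $\rho_{ij} = -J_{ij}/\sqrt{J_{ii}J_{jj}}$. The function names are hypothetical, plain Python lists stand in for a numerics library, and the inversion assumes $\Sigma$ is well conditioned. With fewer samples than variables, $\Sigma$ is singular and some form of regularization would be needed.

```python
import math


def sample_covariance(data):
    """Empirical covariance (1/N normalization) of a list of equal-length samples."""
    n, d = len(data), len(data[0])
    means = [sum(row[k] for row in data) / n for k in range(d)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / n for b in range(d)]
        for a in range(d)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    # Augment [A | I]; reducing the left half to I turns the right half into A^-1.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or ill-conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """rho_ij = -J_ij / sqrt(J_ii * J_jj), where J is the inverse covariance."""
    prec = invert(cov)
    n = len(prec)
    return [
        [1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Chain x0 - x1 - x2: x0 and x2 are correlated only through x1.
    cov = [[0.75, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.75]]
    pc = partial_correlations(cov)
    print("marginal corr(x0, x2):", cov[0][2] / math.sqrt(cov[0][0] * cov[2][2]))
    print("partial corr(x0, x2):", pc[0][2], "partial corr(x0, x1):", pc[0][1])
```

The example covariance is the inverse of the tridiagonal precision matrix with 2 on the diagonal and -1 on the first off-diagonals. The marginal correlation between x0 and x2 is therefore 1/3, while their partial correlation is zero up to rounding. This illustrates how couplings separate direct from indirect dependencies.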

Fig 2 (pcbi.1004182.g002). Illustration of binary embedding. The binary embedding $\mathbb{1}_\sigma: \Omega \to \{0, 1\}^{Lq}$ maps each vector of categorical random variables, $x \in \Omega^L$, here represented by a sequence of amino acids from the amino acid alphabet (containing the 20 amino acids and one gap element, so that $q = |\Omega| = 21$), $\Omega = \{A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, -\}$, onto a unique binary representation, $x(\sigma) \in \{0, 1\}^{Lq}$.
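Here is a minimal sketch of this embedding. It assumes the residue order listed for $\Omega$ in the caption and uses an ASCII '-' for the gap symbol. Site $i$ of a length-$L$ sequence occupies the block of $q = 21$ entries starting at position $iq$, and exactly one entry in each block is set to 1. The names `embed` and `decode` are hypothetical and used only for illustration.

```python
# Residue order follows the alphabet listed in the caption; '-' stands for the gap.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
Q = len(ALPHABET)  # q = 21
INDEX = {residue: k for k, residue in enumerate(ALPHABET)}


def embed(sequence):
    """Map a length-L sequence onto a flat 0/1 vector of length L*q.

    Entry i*q + k is 1 exactly when residue i equals ALPHABET[k].
    Raises KeyError for symbols outside the alphabet.
    """
    vec = [0] * (len(sequence) * Q)
    for i, residue in enumerate(sequence):
        vec[i * Q + INDEX[residue]] = 1
    return vec


def decode(vec):
    """Invert embed(): each block of q entries holds exactly one 1."""
    length = len(vec) // Q
    return "".join(ALPHABET[vec[i * Q:(i + 1) * Q].index(1)] for i in range(length))


if __name__ == "__main__":
    v = embed("AC-")
    print(len(v), sum(v))
    print(decode(v))
```

Each block contains a single 1, so the map is injective. This is what makes the binary representation unique, and `decode` recovers the original sequence from it.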

Mentions: To formalize the derivation of the pairwise maximum-entropy probability distribution on categorical variables, we use the approach of [8,30,48] and, as depicted in Fig 2, replace each categorical variable $x_i$ by an indicator function of the amino acid $\sigma \in \Omega$,

$$\mathbb{1}_\sigma: \Omega \to \{0, 1\}^q, \qquad x_i \mapsto x_i(\sigma) :\equiv \mathbb{1}_\sigma(x_i) = \begin{cases} 1 & \text{if } x_i = \sigma, \\ 0 & \text{otherwise.} \end{cases}$$
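Written with indicators, the single-site and pairwise amino acid frequencies that a pairwise maximum-entropy model is constrained to reproduce become plain averages over the $N$ sequences of an alignment. Writing $x_i^{(n)}$ for site $i$ of sequence $n$ (a notation introduced here only for illustration), these are $f_i(\sigma) = \frac{1}{N}\sum_n x_i^{(n)}(\sigma)$ and $f_{ij}(\sigma,\omega) = \frac{1}{N}\sum_n x_i^{(n)}(\sigma)\,x_j^{(n)}(\omega)$. The corresponding connected correlations are $C_{ij}(\sigma,\omega) = f_{ij}(\sigma,\omega) - f_i(\sigma) f_j(\omega)$. The sketch below computes these quantities directly from the indicator function. It is illustrative only, the function names are hypothetical, and it omits the sequence reweighting and pseudocounts that practical contact-prediction methods commonly apply.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # q = 21: 20 amino acids plus gap (assumed ASCII '-')


def indicator(sigma, x_i):
    """1_sigma(x_i): 1 if x_i equals sigma, 0 otherwise."""
    return 1 if x_i == sigma else 0


def single_site_frequencies(msa, i, alphabet=ALPHABET):
    """f_i(sigma): average of 1_sigma(x_i) over the aligned sequences."""
    n = len(msa)
    return {s: sum(indicator(s, seq[i]) for seq in msa) / n for s in alphabet}


def pair_frequencies(msa, i, j, alphabet=ALPHABET):
    """f_ij(sigma, omega): average of 1_sigma(x_i) * 1_omega(x_j)."""
    n = len(msa)
    return {
        (s, t): sum(indicator(s, seq[i]) * indicator(t, seq[j]) for seq in msa) / n
        for s in alphabet
        for t in alphabet
    }


def connected_correlation(msa, i, j, alphabet=ALPHABET):
    """C_ij(sigma, omega) = f_ij(sigma, omega) - f_i(sigma) * f_j(omega)."""
    fi = single_site_frequencies(msa, i, alphabet)
    fj = single_site_frequencies(msa, j, alphabet)
    fij = pair_frequencies(msa, i, j, alphabet)
    return {(s, t): fij[(s, t)] - fi[s] * fj[t] for s in alphabet for t in alphabet}


if __name__ == "__main__":
    # Toy alignment in which columns 0 and 1 co-vary perfectly (A with C, D with E).
    msa = ["AC", "AC", "DE", "DE"]
    c = connected_correlation(msa, 0, 1)
    print(c[("A", "C")], c[("A", "E")])
```

In the toy alignment, A at site 0 always co-occurs with C at site 1. This yields a positive connected correlation for (A, C) and a negative one for (A, E). Inferring couplings from such correlations is the task of the inference methods discussed in the text.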

