A combinatorial approach to detect coevolved amino acid networks in protein families of variable divergence.

Baussand J, Carbone A - PLoS Comput. Biol. (2009)

Bottom Line: We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The method drops the requirement for high sequence divergence that limits the range of applicability of previously proposed statistical approaches. We apply the method to four protein families, where we show accurate detection of functional networks and the ability to treat sets of protein sequences of variable divergence.


Affiliation: Génomique Analytique, Université Pierre et Marie Curie, Paris, France.

ABSTRACT
Communication between distant sites often defines the biological role of a protein: long-range amino acid interactions are as important for binding specificity, allosteric regulation and conformational change as residues that directly contact the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires that these residues coevolve. Networks of interactions between coevolved residues can be reconstructed, and these networks may yield insights into the functional mechanisms of the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of every pair of coevolved residues is quantified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method drops the requirement for high sequence divergence that limits the range of applicability of previously proposed statistical approaches. We apply the method to four protein families, where we show accurate detection of functional networks and the ability to treat sets of protein sequences of variable divergence.
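The abstract does not spell out the dedicated clustering algorithm, so the sketch below is only a generic stand-in for the final step, going from pairwise coevolution scores to networks: it keeps position pairs whose score reaches a threshold and returns the connected components of the resulting graph. The function name `coevolution_networks`, the threshold, and the example positions are hypothetical and do not reproduce the authors' method.

```python
from __future__ import annotations

from collections import defaultdict


def coevolution_networks(
    scores: dict[tuple[int, int], float], threshold: float
) -> list[list[int]]:
    """Hypothetical stand-in for network reconstruction: connected
    components of the graph whose edges are pairs scoring >= threshold."""
    graph: dict[int, set[int]] = defaultdict(set)
    for (p, q), score in scores.items():
        if score >= threshold:
            graph[p].add(q)
            graph[q].add(p)

    seen: set[int] = set()
    networks: list[list[int]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        networks.append(sorted(component))
    return networks
```

For example, with scores `{(10, 25): 0.9, (25, 40): 0.8, (40, 70): 0.2, (70, 90): 0.85}` and a threshold of 0.5, the function returns `[[10, 25, 40], [70, 90]]`, because the weak 40–70 pair does not link the two groups.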

Figure 6 (pcbi-1000488-g006). Tree analysis of the residue distribution over two positions and the associated correspondence matrix. Each position is occupied by two residues, with ,  and ,  essentially mirroring each other (see tree, top left). Correspondence scores calculated for inner trees ( (1),  (2),  (3) and  (4)) are reported in the correspondence matrix (bottom left). Within an inner tree defined for a pair of residues, nodes (leaves excluded) conserving both residues are represented by filled black squares, and all others by unfilled squares. In this example, each correspondence score reduces to the ratio of the number of filled squares to the total number of squares (see the formal definition in the text).
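The ratio described in the caption can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' formal definition: a node is assumed to "conserve both residues" when every leaf beneath it carries residue `a` at the first position and residue `b` at the second, and the inner tree for the pair is taken as given by the caller. The `Node` class and the functions below are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of the distance tree (hypothetical representation)."""
    children: list[Node] = field(default_factory=list)
    # Leaves only: residues observed at the two alignment positions.
    residues: Optional[tuple[str, str]] = None


def leaf_residues(node: Node) -> list[tuple[str, str]]:
    """Collect the residue pairs of all leaves at or below `node`."""
    if not node.children:
        return [node.residues]
    pairs: list[tuple[str, str]] = []
    for child in node.children:
        pairs.extend(leaf_residues(child))
    return pairs


def internal_nodes(node: Node) -> list[Node]:
    """Return every non-leaf node of the subtree rooted at `node`."""
    if not node.children:
        return []
    nodes = [node]
    for child in node.children:
        nodes.extend(internal_nodes(child))
    return nodes


def correspondence_score(inner_root: Node, a: str, b: str) -> float:
    """Filled squares / all squares: the fraction of internal nodes whose
    leaves all carry `a` at the first position and `b` at the second."""
    nodes = internal_nodes(inner_root)
    if not nodes:
        return 0.0
    filled = sum(
        1 for n in nodes if all(pair == (a, b) for pair in leaf_residues(n))
    )
    return filled / len(nodes)
```

For a root with two internal children, one whose leaves are both `("A", "B")` and one whose leaves are `("A", "B")` and `("A", "C")`, only one of the three internal nodes is "filled" for the pair (A, B), so the score is 1/3.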

Mentions: Coupling describes perfect coevolution between two positions. Since perfect coupling is unlikely to be observed in real sequence data, the evaluation of coevolution between pairs of positions cannot be reduced to a simple check for the presence or absence of a perfect identity matrix. In particular, even for a pair of positions whose MSTs overlap well, noise in the data, such as a single residue that breaks the maximality of the tree, can yield a diagonal matrix that is not an identity matrix. See Figure 6. We therefore define a coevolution score between two seed positions by evaluating the "distance" between an ideal identity matrix (coupling) and the actual correspondence matrix, which is less regular (owing to a possible combination of multi-overlapping and multi-inclusion), over all residues associated with the two positions.
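The text does not give the exact distance used, so the following is a hedged illustration only: it assumes a square correspondence matrix whose rows and columns are already ordered so that the coupled residue pairs lie on the diagonal, and it measures the Euclidean (Frobenius) distance to the identity matrix. The function name `identity_distance` and the choice of norm are assumptions, not the paper's score.

```python
import math


def identity_distance(matrix: list[list[float]]) -> float:
    """Frobenius distance between a square correspondence matrix and the
    identity matrix (0.0 means perfect coupling). Assumes coupled residue
    pairs are already placed on the diagonal."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("expected a square correspondence matrix")
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            target = 1.0 if i == j else 0.0
            total += (value - target) ** 2
    return math.sqrt(total)
```

Perfect coupling, `[[1, 0], [0, 1]]`, gives 0.0. The noisy case mentioned above, a diagonal matrix that is not an identity matrix such as `[[1, 0], [0, 0.75]]`, gives 0.25, so it is penalised less than a matrix that also spreads weight off the diagonal.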