A combinatorial approach to detect coevolved amino acid networks in protein families of variable divergence.

Baussand J, Carbone A - PLoS Comput. Biol. (2009)

Bottom Line: We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The method drops the requirement of high sequence divergence that limits the range of applicability of previously proposed statistical approaches. We apply the method to four protein families, where we show accurate detection of functional networks and the possibility of treating sets of protein sequences of variable divergence.


Affiliation: Génomique Analytique, Université Pierre et Marie Curie, Paris, France.

ABSTRACT
Communication between distant sites often defines the biological role of a protein: long-range amino acid interactions are as important in binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires that these residues coevolve. Networks of interaction between coevolved residues can be reconstructed, and from these networks one can potentially derive insights into the functional mechanisms of the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of all pairs of coevolved residues is quantified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method drops the requirement of high sequence divergence that limits the range of applicability of previously proposed statistical approaches. We apply the method to four protein families, where we show accurate detection of functional networks and the possibility of treating sets of protein sequences of variable divergence.
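The step from pairwise coevolution scores to networks can be illustrated with a small sketch. This is not the dedicated clustering algorithm of the paper, which is not described in this excerpt; it swaps in a much simpler technique, connected components over a thresholded score matrix, only to show what "reconstructing networks from pairwise scores" can look like. The function name `networks_from_scores`, the threshold and the toy matrix are all hypothetical.

```python
from itertools import combinations


def networks_from_scores(scores, threshold):
    """Group positions into networks (hypothetical helper, not the paper's method).

    Two positions are linked when their pairwise score is >= threshold;
    a network is a connected component with at least two positions.
    """
    n = len(scores)
    parent = list(range(n))  # union-find forest, one node per position

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if scores[i][j] >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Singletons are positions with no partner above the threshold.
    return sorted(g for g in groups.values() if len(g) > 1)


# Toy symmetric score matrix for five alignment positions (invented values).
toy_scores = [
    [1.0, 0.9, 0.1, 0.0, 0.2],
    [0.9, 1.0, 0.2, 0.1, 0.0],
    [0.1, 0.2, 1.0, 0.8, 0.1],
    [0.0, 0.1, 0.8, 1.0, 0.3],
    [0.2, 0.0, 0.1, 0.3, 1.0],
]
print(networks_from_scores(toy_scores, threshold=0.7))  # [[0, 1], [2, 3]]
```

Lowering the threshold merges weaker signals into existing groups, which is one way to see why noisy scores make network boundaries hard to draw.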

pcbi-1000488-g012: Leucine dehydrogenases. A: Matrix of relative coevolution scores for the leucine dehydrogenase family. The five networks were selected manually on the matrix. The detection signals are noisy and errors in clustering positions are likely; judging by its red scores, the last position of the matrix, for instance, seems misplaced and would cluster better with positions appearing earlier in the matrix. Despite the intrinsic difficulty of detection, the strong differences in signal among the networks justify all five overall. The first and third networks display similar signals (see the red scores along the associated columns and rows), but each of them shares different signals with the second network. The same is observed for the fourth and fifth networks with respect to the third one. B, C: Coevolved residue networks mapped on the Bacillus sphaericus leucine dehydrogenase structure 1LEH (chain B). The catalytic site is shown from the front (left) and from the side (right). B: network associated with the catalytic function (red, first in A) and network associated with ligand specificity (blue, second in A). C: third (green), fourth (orange) and fifth (yellow) networks detected in A.

Mentions: Among the 580 alignment positions of the 571 sequences of the amino acid dehydrogenase family, 169 (29% of the alignment positions) were selected as seed positions. The MST method applied to this family led to the (manual) identification of 5 networks on the relative coevolution score matrix (Figure 12A). Positions identified in the networks represent 22% of the residues in the structure 1LEH chain B. Note that noisy interference is observed between the different networks; it appears as red dots in the strip just below the squares delimiting the networks.
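As a quick check of the seed-position figure quoted above, the short sketch below recomputes the fraction from the two counts given in the text. The variable names are illustrative only.

```python
# Counts taken from the text: 169 seed positions out of 580 alignment positions.
alignment_positions = 580
seed_positions = 169

seed_fraction = seed_positions / alignment_positions
print(f"{seed_fraction:.1%}")  # 29.1%, reported as 29% in the text
```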

