A combinatorial approach to detect coevolved amino acid networks in protein families of variable divergence.

Baussand J, Carbone A - PLoS Comput. Biol. (2009)

Bottom Line: We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The method removes the requirement for high sequence divergence that limits the applicability of previously proposed statistical approaches. We apply the method to four protein families, showing accurate detection of functional networks and the ability to handle sets of protein sequences of variable divergence.

Affiliation: Génomique Analytique, Université Pierre et Marie Curie, Paris, France.

ABSTRACT
Communication between distant sites often defines the biological role of a protein: residues involved in long-range amino acid interactions are as important for binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires that these residues coevolve. Networks of interactions between coevolved residues can be reconstructed, and these networks may provide insights into the functional mechanisms of the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of every pair of coevolved residues is quantified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method removes the requirement for high sequence divergence that limits the applicability of previously proposed statistical approaches. We apply the method to four protein families, showing accurate detection of functional networks and the ability to handle sets of protein sequences of variable divergence.
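
The abstract names the ingredients of the method (an alignment, its distance tree, and the subtrees of that tree) but not the scoring formula. As a minimal sketch under explicit assumptions, the Python code below enumerates every clade of a small hypothetical tree, records which alignment columns are fully conserved within each clade, and scores a pair of columns by the fraction of clades conserving either column that conserve both. This Jaccard-style ratio is a simplified stand-in chosen for illustration, not the authors' coevolution score; the alignment, the tree and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy alignment (6 sequences x 5 columns); not data from the paper.
ALIGNMENT = {
    "s1": "ACDKG",
    "s2": "ACDKG",
    "s3": "ACERG",
    "s4": "ACERG",
    "s5": "TCDKA",
    "s6": "TCERA",
}

# Hypothetical rooted distance tree as nested tuples; leaves are sequence names.
TREE = ((("s1", "s2"), ("s3", "s4")), ("s5", "s6"))


def leaves(node):
    """Return the sequence names under a tree node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades(node):
    """Yield the leaf set of every internal node, i.e. every subtree with >= 2 leaves."""
    if isinstance(node, str):
        return
    yield frozenset(leaves(node))
    for child in node:
        yield from clades(child)


def conserved_columns(alignment, names):
    """Return the column indices whose residue is identical in all sequences in `names`."""
    seqs = [alignment[name] for name in names]
    return {i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) == 1}


def pair_scores(alignment, tree):
    """Score each pair of variable columns by joint conservation across subtrees.

    Simplified stand-in: (# subtrees conserving both) / (# subtrees conserving either).
    Columns invariant over the whole alignment are skipped (no covariation signal).
    """
    ncols = len(next(iter(alignment.values())))
    invariant = conserved_columns(alignment, alignment.keys())
    variable = [c for c in range(ncols) if c not in invariant]
    per_clade = [conserved_columns(alignment, clade) for clade in clades(tree)]
    scores = {}
    for i, j in combinations(variable, 2):
        both = sum(1 for cons in per_clade if i in cons and j in cons)
        either = sum(1 for cons in per_clade if i in cons or j in cons)
        scores[(i, j)] = both / either if either else 0.0
    return scores


if __name__ == "__main__":
    for pair, score in sorted(pair_scores(ALIGNMENT, TREE).items()):
        print(pair, round(score, 2))
```

In this toy alignment, columns 2 and 3 switch together (D with K, E with R) and score 1.0, whereas column 1 is identical in every sequence and is therefore excluded from scoring.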

pcbi-1000488-g011: Serine proteases. A: Matrix of relative coevolution scores for the serine protease family. Three networks of coevolved residues were selected manually from the matrix and are indicated by black boxes. B, C, D: Networks of coevolved residues detected for the serine protease family, shown in van der Waals representation on the bovine trypsin structure (two faces of chain A of PDB entry 1AUJ). The catalytic triad is shown as a yellow wireframe. The L1 and L2 loops supporting the S1 site are indicated. Position 172 on the L3 loop (orange) and position 189 on the L1 loop (yellow) are shown in van der Waals representation. A substrate analog (inhibitor) occupying the ligand site is shown in green. B: network associated with the catalytic site (red); the catalytic triad also belongs to this network. C: network with a potential structural role (blue). D: network associated with ligand specificity (brown). E: Global view of the networks of coevolved residues.
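
The caption notes that the three networks were selected manually from the score matrix, while the abstract mentions a dedicated clustering algorithm whose details are not given here. A crude automated stand-in, sketched below in Python, keeps every pair of positions whose score reaches a threshold and reports the connected components of the resulting graph (a single-linkage cut). The threshold, the example scores and the function name are hypothetical; this is not the paper's clustering procedure.

```python
from collections import defaultdict


def networks(scores, threshold):
    """Group positions into candidate networks.

    Builds a graph with an edge for every pair whose score is >= threshold and
    returns its connected components as sorted lists of positions. Positions
    with no edge at or above the threshold are left out.
    """
    graph = defaultdict(set)
    for (i, j), score in scores.items():
        if score >= threshold:
            graph[i].add(j)
            graph[j].add(i)

    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    # Hypothetical pairwise scores between alignment positions 0..4.
    example = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.2, (3, 4): 0.95, (2, 3): 0.1}
    print(networks(example, threshold=0.5))  # [[0, 1, 2], [3, 4]]
```

Note that positions 0 and 2 end up in the same network through position 1 even though their own score is low; this chaining is the usual weakness of single-linkage grouping and one reason a more dedicated clustering step can be preferable.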

Mentions: Serine proteases are enzymes that use a catalytic triad to cleave peptide bonds. Different serine proteases are distinguished by their ligand specificity. For instance, trypsins are specific for peptide bonds involving lysine or arginine, whereas chymotrypsins are specific for bonds involving hydrophobic or aromatic residues (preferentially phenylalanine) [30],[31]. A major determinant of ligand specificity is the S1 pocket, which interacts with the specific residue of the ligand. A negative charge (Asp189) at the bottom of the S1 pocket of trypsin suggests a local electrostatic mechanism for the specific recognition of positively charged ligand residues. However, converting a serine protease from trypsin to chymotrypsin specificity requires mutating several positions in the S1 pocket and on the surface loops L1, L2 and L3 close to the S1 pocket [30] (indicated in Figure 11B, left). This implies that a group of residues acts cooperatively to determine the ligand specificity of serine proteases.
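
The specificity rules described above can be made concrete with a simplified in-silico digest. The Python sketch below assumes that trypsin cuts after lysine (K) or arginine (R) and, as a simplification of "hydrophobic or aromatic", that chymotrypsin cuts only after the aromatic residues F, Y and W; it ignores secondary preferences and known exceptions such as reduced cleavage before proline. The peptide, the rule table and the function names are hypothetical.

```python
# Simplified P1 rules (assumption): trypsin cuts after K or R; chymotrypsin is
# restricted here to the aromatic residues F, Y and W.
P1_RULES = {
    "trypsin": frozenset("KR"),
    "chymotrypsin": frozenset("FYW"),
}


def cleavage_sites(sequence, enzyme):
    """Return indices i such that the bond sequence[i]-sequence[i + 1] is predicted to be cut."""
    p1 = P1_RULES[enzyme]
    return [i for i, residue in enumerate(sequence[:-1]) if residue in p1]


def digest(sequence, enzyme):
    """Split `sequence` into fragments at the predicted cleavage sites."""
    fragments, start = [], 0
    for i in cleavage_sites(sequence, enzyme):
        fragments.append(sequence[start:i + 1])
        start = i + 1
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    peptide = "GAKFLRWSV"  # hypothetical peptide
    print(digest(peptide, "trypsin"))       # ['GAK', 'FLR', 'WSV']
    print(digest(peptide, "chymotrypsin"))  # ['GAKF', 'LRW', 'SV']
```

In this toy model, switching specificity only means swapping one residue set; in the real enzymes, as noted above, the conversion requires mutating several positions in the S1 pocket and the L1, L2 and L3 loops.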

