Identification of coevolving residues and coevolution potentials emphasizing structure, bond formation and catalytic coordination in protein evolution.

Little DY, Chen L - PLoS ONE (2009)

Bottom Line: The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.

Affiliation: Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America.

ABSTRACT
The structure and function of a protein depend on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood of being identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.


Figure 1. Measuring coevolution without bias. (A) MI scores are correlated with random information scores (RI) in which all coevolutionary and phylogenetic relationships have been removed by random perturbations (RI is an average over 300 randomizations). This demonstrates that MI suffers from a non-phylogenetic bias. (B) The percentage of tested residue pairs that have coevolution measures higher than their average random measure (standard deviations over 300 randomizations are plotted but are too small to be visualized). Phylogenetic biases induce high MI and MI/Hi,j scores, which are unobtainable from randomized results. (C) MI is correlated with Hi,j. (D) MI/Hi,j is correlated with its randomized values (the same MI/Hi,j measure but with all coevolutionary and phylogenetic relationships removed from sites by random perturbation of amino acids). MI/Hi,j is therefore subject to non-phylogenetic biases. (E) A colorimetric representation of MI scores between pairs of residues in the 2nd PDZ domain of the human Erbin protein. The striated appearance highlights a large variation in basal MI values between sites. Residue positions are aligned from the N-terminus to the C-terminus. Red = high MI, Blue = low MI, Darkest Blue = untested (>20% gaps). (F) MI is correlated to . (G) Res is not correlated with its randomized values. (H) Positions are ranked in order of increasing variance in Res scores (the red line indicates the deviation of Res scores) and the distribution of Res scores is plotted. Higher variation at a site increases the likelihood of false identification of coevolution at that site [5]. (I) ZRes scores are calculated as the product of the z-scores of a Res value relative to its distribution across each site. Light red points represent residue pairs where both z-scores were negative. The ZRes score for such pairs is taken as the negative of the product of the z-scores (dark red points). The negative of the lower bound of ZRes (gray lines) is a cutoff for choosing coevolving residues (green points).
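
The ZRes step described in panel (I) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `zres_scores` and `select_coevolving`, the use of the population standard deviation, the treatment of the "lower bound" as the minimum observed score, and the toy `res` matrix are all assumptions made for illustration. The input is taken to be a symmetric matrix of Res values whose diagonal is ignored.

```python
import statistics

def zres_scores(res):
    """Product of the two per-site z-scores of each Res value.

    res is a symmetric list-of-lists; res[i][j] is the Res score of sites i and j.
    Pairs whose two z-scores are both negative get the negated product,
    so that they are not mistaken for strong positive signals.
    """
    n = len(res)
    # Mean and (population) standard deviation of Res across each site's partners.
    stats = []
    for i in range(n):
        row = [res[i][k] for k in range(n) if k != i]
        stats.append((statistics.mean(row), statistics.pstdev(row)))

    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            mean_i, sd_i = stats[i]
            mean_j, sd_j = stats[j]
            if sd_i == 0 or sd_j == 0:
                continue  # z-score undefined for a site with no variation
            z_i = (res[i][j] - mean_i) / sd_i
            z_j = (res[i][j] - mean_j) / sd_j
            product = z_i * z_j
            scores[(i, j)] = -product if (z_i < 0 and z_j < 0) else product
    return scores

def select_coevolving(scores):
    """Keep pairs scoring above the negative of the lowest observed score."""
    cutoff = -min(scores.values())
    return [pair for pair, score in scores.items() if score > cutoff]

# Hypothetical 4-site Res matrix: only sites 0 and 1 stand out together.
res = [
    [0.0, 3.0, 0.1, -0.2],
    [3.0, 0.0, -0.1, 0.2],
    [0.1, -0.1, 0.0, 0.0],
    [-0.2, 0.2, 0.0, 0.0],
]
scores = zres_scores(res)
selected = select_coevolving(scores)
```

Standardizing against each site's own distribution is what addresses the heteroscedasticity noted in panel (H): a Res value only scores highly if it is unusual for both sites in the pair.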

Mentions: To calculate a reliable numerical estimate of MI, many instances of the random variables are necessary (i.e. many copies of a protein evolving independently but under the same selective pressures). We approximated this by considering the sequences of an MSA as instances of our random variable. The sequences of an MSA, however, fail to meet the assumption of independent evolution. While the stabilization of a mutation in an ancestral protein represents only one evolutionary event, MI analysis would count it as an independent event for each descendant of that ancestor in the MSA. This treatment of a single event as multiple independent events should act as a source of bias that increases the mutual information among residues. By independently shuffling the amino acids at each site among the sequences of an MSA, we can calculate random mutual information (RI) scores in which all coevolutionary signals and potential phylogenetic biases have been removed. As an example, we plotted the MI scores for each pair of amino acid sites in the Pfam full alignment of 5612 PDZ domains against their average RI scores from 300 randomizations (Figure 1A; Pfam ID: PF00595 [21]). The RI score for two sites was almost never higher than their MI score (Figure 1B; fewer than 1 residue pair out of all 2193 pairs per randomization). Since we would expect most residues to have strong evolutionary interactions with only a limited number of sites [6], the increased mutual information scores of the unperturbed MSA relative to the randomized MSA likely represent the influence of phylogenetic relationships.
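
As a rough illustration of the randomization described above, the following Python sketch computes MI for one pair of alignment columns and an RI score averaged over independent shuffles of each column. It is a sketch under stated assumptions rather than the authors' code: the function names, the base-2 logarithm, the plug-in frequency estimate, the absence of any gap handling, and the toy columns are all hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(col_i, col_j):
    """Plug-in MI estimate (in bits) between two equal-length alignment columns."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi

def random_information(col_i, col_j, n_rand=300, seed=0):
    """Average MI after shuffling each column independently among the sequences.

    Shuffling keeps each site's amino acid frequencies but destroys both
    coevolutionary and phylogenetic associations between the two sites.
    """
    rng = random.Random(seed)
    shuffled_i, shuffled_j = list(col_i), list(col_j)
    total = 0.0
    for _ in range(n_rand):
        rng.shuffle(shuffled_i)
        rng.shuffle(shuffled_j)
        total += mutual_information(shuffled_i, shuffled_j)
    return total / n_rand

# Toy columns (hypothetical): the two sites covary perfectly across 8 sequences.
col_i = list("AAADDDAD")
col_j = list("KKKEEEKE")
mi = mutual_information(col_i, col_j)
ri = random_information(col_i, col_j)
```

Even for shuffled columns the average RI stays above zero in a finite alignment, which is the kind of non-coevolutionary, non-phylogenetic baseline that the comparison of MI against RI is meant to expose.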

