Identification of coevolving residues and coevolution potentials emphasizing structure, bond formation and catalytic coordination in protein evolution.

Little DY, Chen L - PLoS ONE (2009)

Bottom Line: The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.


Affiliation: Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America.

ABSTRACT
The structure and function of a protein are dependent on coordinated interactions between its residues. The selective pressures associated with a mutation at one site should therefore depend on the amino acid identity of interacting sites. Mutual information has previously been applied to multiple sequence alignments as a means of detecting coevolutionary interactions. Here, we introduce a refinement of the mutual information method that: 1) removes a significant, non-coevolutionary bias and 2) accounts for heteroscedasticity. Using a large, non-overlapping database of protein alignments, we demonstrate that predicted coevolving residue-pairs tend to lie in close physical proximity. We introduce coevolution potentials as a novel measure of the propensity for the 20 amino acids to pair amongst predicted coevolutionary interactions. Ionic, hydrogen, and disulfide bond-forming pairs exhibited the highest potentials. Finally, we demonstrate that pairs of catalytic residues have a significantly increased likelihood to be identified as coevolving. These correlations to distinct protein features verify the accuracy of our algorithm and are consistent with a model of coevolution in which selective pressures towards preserving residue interactions act to shape the mutational landscape of a protein by restricting the set of admissible neutral mutations.
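For orientation, the plain mutual information between two alignment columns can be sketched as below. This is only the unrefined baseline that the abstract says earlier work applied to alignments; it is not the ZRes measure, and it includes neither the bias removal nor the heteroscedasticity correction described above. The function name, the toy columns, and the choice to treat every character (including gaps) as an ordinary symbol are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plain mutual information (in bits) between two alignment columns.

    col_a and col_b hold one character per sequence, in the same sequence
    order. Every character is treated as an ordinary symbol (an assumption;
    gap handling is not specified here).
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")

    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy columns (four sequences each), not data from the paper.
covarying = mutual_information(list("AAGG"), list("DDEE"))
independent = mutual_information(list("AAGG"), list("DEDE"))
conserved = mutual_information(list("AAAA"), list("DDEE"))
```

In this toy case the perfectly covarying pair scores one bit, while the independent pair and the pair involving a fully conserved column both score zero.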

Figure 9: Comparison of ZRes to other measures of coevolution. (A) To ease processing load, calculations were limited to the 424 alignments with representative structures for which the product of the protein sequence length and alignment size was less than or equal to 100,000. Following the analysis performed previously [5], all residue pairs were ranked from highest to lowest ZRes score. For ranks 1 up to 100, the fraction of residue pairs at or higher than each rank lying within 6 Å of each other was calculated. The average of this contact accuracy across all alignments was then plotted (blue). The process was repeated with the Res (green), OMES (brown), McBASC (magenta), MIp (red), and MI (black) measures. (B) As in A, but utilizing all 1240 alignments with representative crystal structures. The results from one randomization of residue pair rankings are plotted in black. Statistical significance was assessed by Friedman's nonparametric 2-way ANOVA for measure effects on selectivity after factoring out rank effects. All pair-wise comparisons in both A and B were significant except between MIp and Res.

Mentions: To compare our algorithm to these previously developed methods, we used contact prediction accuracy as an approximate correlate of coevolution prediction accuracy. Since none of these algorithms utilize structural data (including primary sequence order) and since none of them are based on known signals for contact prediction, any correlation with structural data should arise from their ability to recognize coevolving sites combined with a tendency for coevolving sites to be close together (or for close residues to be coevolving). Contact prediction therefore is a reasonable approximation of algorithm accuracy. In order to make the comparisons, each measure was used to rank all tested site pairs for each analyzed protein family, and the percentage of the top ranking site pairs contacting in their representative structures was calculated. Our ZRes measure out-performed both OMES and McBASC (p < 1×10^−16, Friedman's nonparametric two-way ANOVA; Figure 9A). Furthermore, whereas MIp and Res performed equally well, they both under-performed ZRes, showing that our controls for heteroscedasticity significantly improved the measure (p < 1×10^−16, Friedman's nonparametric two-way ANOVA; Figures 9A and 9B). Since shorter protein sequences have a large fraction of residue pairs in contact with each other (Figure S3B), we repeated the analysis adjusting for sequence length by normalizing the number of top scoring site pairs chosen for each protein family by the length of the protein sequence (Figure S6). Again, ZRes performed significantly better than all other measures (p < 0.05 for 1% protein sequence length down to p < 1×10^−5 for 32% protein sequence length, K-S test).
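The ranking-based evaluation described above can be sketched roughly as follows: sort site pairs by score, then, for each rank k, report the fraction of the top k pairs whose residues lie within the distance cutoff. This is an illustrative sketch, not the authors' code. The function names, the dictionary-based inputs, the use of a single representative coordinate per residue, and the inclusive `<=` comparison at 6 Å are all assumptions.

```python
from math import dist


def contact_accuracy(scores, coords, max_rank=100, cutoff=6.0):
    """Cumulative contact accuracy for the top-ranked site pairs.

    scores: dict mapping a site pair (i, j) to its coevolution score.
    coords: dict mapping a site index to one (x, y, z) coordinate in
            angstroms (one point per residue is an assumption).
    Returns a list whose entry k-1 is the fraction of the k highest-scoring
    pairs that lie within `cutoff` of each other.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:max_rank]
    accuracies = []
    hits = 0
    for k, (i, j) in enumerate(ranked, start=1):
        if dist(coords[i], coords[j]) <= cutoff:
            hits += 1
        accuracies.append(hits / k)
    return accuracies


def mean_curve(curves):
    """Average per-alignment accuracy curves rank by rank.

    Curves are truncated to the shortest one so every rank is averaged
    over all alignments (a simplifying assumption).
    """
    length = min(len(c) for c in curves)
    return [sum(c[k] for c in curves) / len(curves) for k in range(length)]


# Hypothetical toy family: four sites on a line, four scored pairs.
toy_coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0),
              2: (10.0, 0.0, 0.0), 3: (12.0, 0.0, 0.0)}
toy_scores = {(0, 1): 5.0, (0, 2): 4.0, (2, 3): 3.0, (1, 3): 1.0}
toy_curve = contact_accuracy(toy_scores, toy_coords)
```

For the length-adjusted comparison, one option under the same assumptions is to set `max_rank` to a fixed fraction of the sequence length for each family rather than to a constant.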

