Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution.

Wolf MY, Wolf YI, Koonin EV - Biol. Direct (2008)

Bottom Line: The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution.

Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. nskwolf@gmail.com

ABSTRACT

Background: Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein is arguably one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is determined primarily by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and on its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and, unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to selection for the robustness of the protein structure to mistranslation-induced misfolding, which is particularly important for highly expressed proteins and would be the dominant determinant of the sequence evolution rate.

Results: This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level, on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that fused domains within multidomain proteins should, on average, evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude.

Conclusion: Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution.

Figure 6: Correlations between the evolutionary rate differences of domains within multidomain proteins and the same domain pairs from separate proteins. Each data point corresponds to a pair of SCOP superfamilies (S1, S2) that is observed at least once in the same multidomain protein in a genome. X-axis: mean of log10 of the ratios of evolution rates of all combinations of domains (Di1, Dj2) where Di1 ∈ S1 and Dj2 ∈ S2. Y-axis: mean of log10 of the ratios of evolution rates of all combinations of domains (Di1, Dj2) where Di1 ∈ S1, Dj2 ∈ S2 and (Di1, Dj2) belong to the same multidomain protein. A – Human proteins. B – Randomized human protein sequences. C – Arabidopsis proteins. D – Randomized Arabidopsis protein sequences.
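
The axis definitions in the caption can be illustrated with a minimal Python sketch. The toy rates, the protein identifiers and the function name `axis_values` are hypothetical, and the orientation of each ratio (S1 rate divided by S2 rate) is an assumption; the caption does not specify how the ratios are ordered or signed.

```python
import math
from itertools import product
from statistics import mean

# Hypothetical toy data: each domain is (protein_id, evolutionary_rate).
s1_domains = [("P1", 0.8), ("P2", 0.4)]  # domains of superfamily S1
s2_domains = [("P1", 0.4), ("P3", 0.1)]  # domains of superfamily S2


def axis_values(s1, s2):
    """Return (x, y) for one superfamily pair.

    x: mean log10 rate ratio over all S1 x S2 domain combinations.
    y: the same mean restricted to combinations found in one protein,
       or None if the superfamilies never share a protein.
    """
    all_logs, fused_logs = [], []
    for (prot1, rate1), (prot2, rate2) in product(s1, s2):
        log_ratio = math.log10(rate1 / rate2)  # assumed orientation: S1 / S2
        all_logs.append(log_ratio)
        if prot1 == prot2:  # both domains sit in the same multidomain protein
            fused_logs.append(log_ratio)
    x = mean(all_logs)
    y = mean(fused_logs) if fused_logs else None
    return x, y


x, y = axis_values(s1_domains, s2_domains)
print(f"x = {x:.3f}, y = {y:.3f}")
```

Taking the mean of log10 ratios is equivalent to taking the log of the geometric mean of the ratios, which is why the accompanying text refers to geometric means.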

Mentions: We then examined the correlations between the mean rate differences of domain pairs that are conjoined in multidomain proteins and the same domain pairs found in different proteins. If fusion within a multidomain protein has, on average, no effect on the evolutionary rates of the domains involved, the geometric mean of the ratio of the rates for the given pair of domains should be equal to that for all combinations of these domains (up to the sampling error), so the slope of the regression line is expected to be equal to 1. Conversely, if the evolution rates of the constituent domains in multidomain proteins are completely homogenized, all rate differences for domain pairs in multidomain proteins should be close to 0 (again, subject to sampling error), and the slope of the regression line would be equal to 0 as well. The results show that both in human (Figure 6A) and in Arabidopsis (Figure 6C), there is a limited but statistically highly significant positive correlation between the rate differences of domains in the two classes of domain pairs. This correlation was in sharp contrast with the results of similar comparisons performed with sequences of multidomain proteins that were randomized over their entire lengths, with the rates of evolution then compared for regions within the original domain boundaries (compare Figure 6A with 6B, and Figure 6C with 6D). The slope of the linear trendline in the log-log scale was ~0.38 for the 963 domain pairs in the human genome and ~0.64 for the 355 domain pairs in the Arabidopsis genome. For instance, if the mean evolution rates of two domains in human proteins differ by a factor of 2, then an ~1.3-fold difference in rates can be expected when these domains are fused within a single multidomain protein; similarly, in the case of Arabidopsis, an ~1.6-fold difference can be expected.
These results indicate that the contributions of translation-rate-related factors and intrinsic structural-functional constraints to the rate of protein sequence evolution are comparable.
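
The fold differences quoted above follow from the regression slope: on a log-log scale with slope b, a rate ratio r between two domains in separate proteins corresponds to an expected ratio of r^b when the domains are fused. The short Python sketch below reproduces that arithmetic. The function name `fused_fold_difference` is illustrative, and a zero intercept for the trendline is assumed, which the text does not state.

```python
def fused_fold_difference(separate_ratio: float, slope: float) -> float:
    """Expected rate ratio of two fused domains.

    separate_ratio: rate ratio of the same domains in separate proteins.
    slope: slope of the log-log trendline (~0.38 human, ~0.64 Arabidopsis).
    Assumes the trendline passes through the origin (zero intercept).
    """
    # log10(fused) = slope * log10(separate)  =>  fused = separate ** slope
    return separate_ratio ** slope


human = fused_fold_difference(2.0, 0.38)
arabidopsis = fused_fold_difference(2.0, 0.64)
print(f"human: {human:.2f}-fold, Arabidopsis: {arabidopsis:.2f}-fold")
```

The two limiting cases described in the text fall out of the same formula: a slope of 1 leaves the twofold difference unchanged, and a slope of 0 collapses it to a ratio of 1 (complete homogenization).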

