Evolution of the genetic code: partial optimization of a random code for robustness to translation error in a rugged fitness landscape.

Novozhilov AS, Wolf YI, Koonin EV - Biol. Direct (2007)

Bottom Line: It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors such that translational misreading has the minimal adverse effect. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system.


Affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. novozhil@ncbi.nlm.nih.gov

ABSTRACT

Background: The standard genetic code table has a distinctly non-random structure, with similar amino acids often encoded by codon series that differ by a single nucleotide substitution, typically in the third or the first position of the codon. It has been repeatedly argued that this structure of the code results from selective optimization for robustness to translation errors, such that translational misreading has the minimal adverse effect. Indeed, it has been shown in several studies that the standard code is more robust than a substantial majority of random codes. However, it remains unclear how much evolution the standard code underwent, what its level of optimization is, and what the likely starting point was.

Results: We explored possible evolutionary trajectories of the genetic code within a limited domain of the vast space of possible codes. Only codes that possess the same block structure and the same degree of degeneracy as the standard code were analyzed for robustness to translation error. This choice of a small part of the vast space of possible codes is based on the notion that the block structure of the standard code is a consequence of the structure of the complex between the cognate tRNA and the codon in mRNA, where the third base of the codon plays a minimal role as a specificity determinant. Within this part of the fitness landscape, a simple evolutionary algorithm, with elementary evolutionary steps comprising swaps of four-codon or two-codon series, was employed to investigate the optimization of codes for the maximum attainable robustness. The properties of the standard code were compared to the properties of four sets of codes, namely, purely random codes, random codes that are more robust than the standard code, and two sets of codes that resulted from optimization of the first two sets. The comparison of these sets of codes with the standard code and its locally optimized version showed that, on average, optimization of random codes yielded evolutionary trajectories that converged at the same level of robustness to translation errors as the optimization path of the standard code; however, the standard code required considerably fewer steps to reach that level than an average random code. When evolution starts from random codes whose fitness is comparable to that of the standard code, they typically reach a much higher level of optimization than the standard code, i.e., the standard code is much closer to its local cost minimum (fitness peak) than most of the random codes with similar levels of robustness. Thus, the standard genetic code appears to be a point on an evolutionary trajectory from a random point (code), about halfway to the summit of the local peak. The fitness landscape of code evolution appears to be extremely rugged, containing numerous peaks with a broad distribution of heights, and the standard code is relatively unremarkable, being located on the slope of a moderate-height peak.
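
The optimization procedure described above is simple enough to sketch. The following is a minimal, hypothetical Python rendition of such a swap-based hill climb; the representation of a code as codon series plus an assignment of amino acids, the restriction of swaps to equal-size series (to keep each amino acid's degeneracy fixed), and the greedy first-improvement acceptance rule are all illustrative assumptions, not the authors' exact implementation:

    import itertools

    def local_optimize(blocks, assignment, cost):
        """Greedy hill climb over codes with a fixed block structure.

        blocks     -- list of codon series (tuples of codons); fixed throughout
        assignment -- list: assignment[i] is the amino acid encoded by blocks[i]
        cost       -- function mapping an assignment to its error cost (lower is better)

        Elementary step: swap the amino acids of two codon series. Swaps are
        restricted here to series of equal size so that the degeneracy of
        every amino acid is preserved (our assumption about how the moves
        keep the code within the sampled subspace).
        """
        current = cost(assignment)
        steps = 0
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(len(blocks)), 2):
                if len(blocks[i]) != len(blocks[j]):
                    continue  # only same-size series keep degeneracy intact
                assignment[i], assignment[j] = assignment[j], assignment[i]
                trial = cost(assignment)
                if trial < current:      # accept any improving swap
                    current = trial
                    steps += 1
                    improved = True
                else:                    # reject: undo the swap
                    assignment[i], assignment[j] = assignment[j], assignment[i]
        return assignment, current, steps

Counting `steps` until no swap improves the cost is what permits the comparison reported above: the standard code converges to the common robustness level in far fewer steps than an average random code.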

Conclusion: The standard code appears to be the result of partial optimization of a random code for robustness to errors of translation. The reason the code is not fully optimized could be the trade-off between the beneficial effect of increasing robustness to translation errors and the deleterious effect of codon series reassignment that becomes increasingly severe with growing complexity of the evolving system. Thus, evolution of the code can be represented as a combination of adaptation and frozen accident.



Figure 2: Comparison of the standard code with random alternatives for different amino acid substitution matrices and cost functions (1). Z-score is the distance, measured in standard deviations, between the mean of random code costs and the standard code cost. ϕ1, ϕ2, ϕ3 are the cost functions (1) where f(c) is the frequency of codon c; ϕ4, ϕ5, ϕ6 are the cost functions (1) for f(c) = 1; ϕ7, ϕ8, ϕ9 are the cost functions (1) where f(c) is the respective amino acid frequency. In ϕ1, ϕ4, ϕ7, p(c'|c) = 1 for any c and c' that differ by one nucleotide, and p(c'|c) = 0 otherwise; ϕ2, ϕ5, ϕ8 incorporate the inferred transition-transversion bias, i.e., p(c'|c) = trb if c and c' differ by a transition, and p(c'|c) = 1 if c and c' differ by a transversion (trb = 2 in our calculations); ϕ3, ϕ6, ϕ9 use the scheme (2).
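
As a reading aid for the caption, the sketch below spells out the generic shape shared by ϕ1 through ϕ9: a codon weight f(c), a single-mismatch misreading model p(c'|c), and an amino acid substitution cost d taken from a matrix such as the PRS, PAM, or BLOSUM. The normalization of equation (1), the omission of stop codons, and the flat treatment of codon positions (scheme (2), which additionally weights misreading by position, is not implemented) are assumptions of this sketch; the article's equation (1) is the authority.

    BASES = "UCAG"
    TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "U"), ("U", "C")}

    def neighbors(codon):
        """All codons differing from `codon` by exactly one nucleotide."""
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    yield codon[:pos] + base + codon[pos + 1:], pos, base

    def code_cost(code, f, d, trb=2.0):
        """Generic error cost in the spirit of cost function (1).

        code -- dict codon -> amino acid (stop codons omitted, an assumption)
        f    -- dict codon -> weight: codon frequency (phi1-3), 1 (phi4-6),
                or the frequency of the codon's amino acid (phi7-9)
        d    -- function (aa, aa') -> amino acid substitution cost
        trb  -- transition/transversion weight; trb=1 recovers the
                unweighted phi1/phi4/phi7 misreading scheme
        """
        total = 0.0
        for c, aa in code.items():
            for c2, pos, base in neighbors(c):
                if c2 not in code:
                    continue  # skip misreadings into stop codons (assumption)
                is_transition = (c[pos], base) in TRANSITIONS
                p = trb if is_transition else 1.0
                total += f[c] * p * d(aa, code[c2])
        return total

With `f = {c: 1 for c in code}` and `trb = 1`, this reduces to the simplest measure ϕ4; replacing the flat `p` with position-dependent misreading weights would give the scheme (2) used by ϕ3, ϕ6, ϕ9.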

Mentions: We then estimated the fraction of random codes that outperform the standard code using the different error cost functions described in the preceding section. Briefly, (i) codon frequencies typical of extant genomes seem to be suboptimal with respect to the code robustness (error cost), because taking them into account in (1) decreases the difference between the standard code and the random codes (see Fig. 2, cost functions ϕ1, ϕ2, ϕ3); similar observations have been reported previously [44,45,56]; (ii) for the PRS and the Gilis matrix cost measures, the inclusion of the inferred translation bias [i.e., the differences in misreading frequencies between the nucleotide positions in a codon, together with the positional transition-transversion bias, as represented in (2)] improves the code robustness; by contrast, for the PAM and BLOSUM matrices, the scheme (2) usually has no significant effect or even reduces the code robustness when compared to calculations that take into account only the transition-transversion bias [e.g., cost functions ϕ5, ϕ6, where ϕ5 incorporates the transition-transversion bias only, whereas ϕ6 is the cost function using (2); see also the pairs of cost functions ϕ2, ϕ3 and ϕ8, ϕ9 (Fig. 2)]; (iii) the PAM and BLOSUM matrices show a higher level of robustness for the standard code than the PRS and the Gilis score matrix; (iv) the PAM 74-100 matrix showed results very similar to those obtained with PAM 250 or BLOSUM 80, suggesting that it is equally inadequate as a measure of the cost of amino acid substitutions, as previously proposed by others [48,57]; (v) the inclusion of amino acid frequencies, according to [22], had no effect on the genetic code optimality, in contrast to the results of Gilis et al. [22]; this can be attributed to the fact that, in our algorithm, each amino acid is coded for by the same number of codons as in the standard code [compare the cost functions ϕ7, ϕ8, ϕ9, in which amino acid frequencies are included, with ϕ4, ϕ5, ϕ6, where no differential weights are assigned to codons (Fig. 2)].
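
The comparison driving Fig. 2 can be stated compactly: sample random codes from the same block-structure subspace, score each with a chosen cost function, and report both the fraction that beat the standard code and the Z-score defined in the caption. A minimal sketch follows, assuming a one-argument cost closure over a codon-to-amino-acid dict (e.g., `lambda code: code_cost(code, f, d)` from the sketch above) and assuming random codes are drawn by permuting the series-to-amino-acid assignment within classes of equal series size; these sampling details are our reading of the constraint, not the authors' stated procedure.

    import random
    import statistics

    def random_code(blocks, assignment):
        """Random code with the standard block structure: permute the
        series -> amino acid assignment within classes of equal series
        size, so every amino acid keeps the same number of codons as in
        the standard code (our assumption about the constraint)."""
        by_size = {}
        for i, series in enumerate(blocks):
            by_size.setdefault(len(series), []).append(i)
        new = assignment[:]
        for idxs in by_size.values():
            vals = [assignment[i] for i in idxs]
            random.shuffle(vals)
            for i, v in zip(idxs, vals):
                new[i] = v
        return {codon: new[i] for i, series in enumerate(blocks) for codon in series}

    def compare_to_standard(blocks, std_assignment, cost, n=10_000):
        """Fraction of random codes with lower (better) cost than the
        standard code, plus the Z-score from the caption of Fig. 2:
        (mean random cost - standard cost) / stdev of random costs."""
        std_code = {c: std_assignment[i] for i, s in enumerate(blocks) for c in s}
        std_cost = cost(std_code)
        costs = [cost(random_code(blocks, std_assignment)) for _ in range(n)]
        frac_better = sum(c < std_cost for c in costs) / n
        z = (statistics.mean(costs) - std_cost) / statistics.stdev(costs)
        return frac_better, z

A positive Z-score on this convention means the standard code is cheaper (more robust) than the average random code, which is how the caption's bars should be read.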

