An Interpretation of the Ancestral Codon from Miller's Amino Acids and Nucleotide Correlations in Modern Coding Sequences.

Carels N, Ponce de Leon M - Bioinform Biol Insights (2015)

Bottom Line: We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller's spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins.


Affiliation: Laboratório de Modelagem de Sistemas Biológicos, National Institute for Science and Technology on Innovation in Neglected Diseases (INCT/IDN), Centro de Desenvolvimento Tecnológico em Saúde (CDTS), Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil.

ABSTRACT
Purine bias, which is usually referred to as an "ancestral codon", is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller's spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins.



Correlations between A2 and A1 (panels A, D, G), A2 and T2 (panels B, E, H), A2 and C3 (panels C, F, I) in H. sapiens (Hs, n = 10,892, panels A, B, C), P. falciparum (Pf, n = 6,844, panels D, E, F), and C. reinhardtii (Cr, n = 15,727, panels G, H, I). r stands for the correlation coefficient and P for the statistical significance. Each r coefficient is associated with a P-value <0.001. Gray dots are for UFM-certified CDSs, and black dots are for CDSs homologous to proteins from PDB. (A) r = 0.57, y = 1.16x + 0.70. (B) r = −0.13. (C) r = −0.48, y = −0.35x + 42.39. (D) rUFM = 0.49, rpdb = 0.49, y = 1.2x – 4.6. (E) rUFM = 0.43, rpdb = −0.57, y = −32.48x + 926.73. (F) rUFM = −0.05, rpdb = 0.25. (G) rUFM = 0.48, rpdb = 0.63, y = 1.4x – 2.5. (H) rUFM = 0.18, rpdb = 0.28. (I) rUFM = 0.07, rpdb = 0.19.
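The r values and regression lines quoted in this caption can be illustrated with a minimal sketch. It assumes that each CDS contributes one point (for example its A2 and A1 percentages) and that r is the Pearson coefficient with an ordinary least-squares line y = ax + b. The function name `pearson_and_fit`, the axis assignment, and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
from math import sqrt

def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired samples x and y.

    r is the Pearson correlation coefficient; slope and intercept
    describe the least-squares line y = slope * x + intercept.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical per-CDS percentages (not taken from the paper).
a2_percent = [28.0, 31.5, 35.0, 38.5, 42.0]
a1_percent = [30.0, 33.0, 39.5, 41.0, 47.5]
r, slope, intercept = pearson_and_fit(a2_percent, a1_percent)
print(f"r = {r:.2f}, y = {slope:.2f}x + {intercept:.2f}")
```

Which variable sits on which axis in each panel is not stated in the caption, so the call above is only one possible reading.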

Mentions: Under an unbiased nucleotide distribution, one would expect the guanine frequency to be similar at the three codon positions and to match its average value over CDSs. However, in P. falciparum, G2 (G2 = 10.946, σG2 = 3.268) < G (G = 14.881, σG = 2.842) < G1 (G1 = 24.525, σG1 = 5.924) (Fig. 1A), so the hypotheses G1 = G and G2 = G must be rejected according to Student's t-test (α = 0.05 and 0.01). The same holds in C. reinhardtii, where G2 (G2 = 22.832, σG2 = 5.047) < G (G = 36.026, σG = 3.825) < G1 (G1 = 42.451, σG1 = 6.141) (Fig. 1B). For the sake of completeness, note that the calculation of G also uses G3 = 9.171 (σG3 = 2.746) in P. falciparum and G3 = 42.795 (σG3 = 5.993) in C. reinhardtii (data not shown). Thus, the purine bias is such that the G1 and G2 levels (%) are, respectively, higher and lower than expected both in P. falciparum, which is AT-rich, and in C. reinhardtii, which is by contrast GC-rich (Fig. 1A, C, E), with the consequence that G1 > G2. G2 is negatively correlated with T2 (Fig. 2B, D, F); a negative correlation was also found between C2 and A2 (Table 1). The negative correlations between R2 and Y2 agree with the observation that AT2 is generally larger than GC2. In fact, the average GC2 level in C. reinhardtii is 53.70% (σ = 8.31), which indicates that the AT2 level is 46.30% (σ = 8.31); thus, this highly GC-rich eukaryote has GC2 ≈ AT2 ≈ 50% (Fig. 3). By contrast, the average GC2 level in P. falciparum is 25.51% (σ = 6.84), which indicates that the AT2 level is 74.49% (σ = 6.86), and according to the universal correlation, the average GC2 level in H. sapiens is 42.54% (σ = 6.62), which falls between those of P. falciparum and C. reinhardtii. Thus, one can reasonably state the relationship between GC2 and AT2 for any biological species as W2 ≥ S2. Because A1 is positively correlated with A2 (Fig. 4A, D, G), A2 and T2 tend to increase together on average (Fig. 4B, E, H), and R1 > R2 is generally true in coding frames ≥300 bp.⁴ This situation occurs at the cost of C1 because A1 and C1 are negatively correlated (Table 1). A3 has a strong negative correlation with C3 (r < −0.9), and the second position is connected to the third via the negative correlation between A2 and C3 (Fig. 4C, F, I). Because A1 is positively correlated with A2 (Fig. 4A, D, G), A1 is also negatively correlated with C3 and A3 (Table 1).
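The position-specific quantities used in this passage (G1, G2, G3, GC2, AT2) can be sketched as follows. This is a minimal illustration, assuming each CDS is an in-frame string over A, C, G, T and that frequencies are percentages per codon position. The function names and the toy sequence are hypothetical and are not the authors' pipeline. The final lines check that the reported G values are the mean of the reported G1, G2 and G3, and that AT2 is the complement of GC2.

```python
from collections import Counter

def position_frequencies(cds: str) -> dict[str, float]:
    """Percent frequency of each base at codon positions 1-3.

    Keys look like 'G1' or 'A2'. A trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    freqs: dict[str, float] = {}
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this position
        counts = Counter(bases)
        for base in "ACGT":
            freqs[f"{base}{pos + 1}"] = (
                100.0 * counts[base] / len(bases) if bases else 0.0
            )
    return freqs

def overall_frequency(freqs: dict[str, float], base: str) -> float:
    """Overall percent of a base, as the mean over the three codon positions."""
    return sum(freqs[f"{base}{p}"] for p in (1, 2, 3)) / 3

# Toy in-frame sequence (hypothetical, five codons).
f = position_frequencies("ATGGCAGAAACTTAA")
print(f["G1"], f["G2"], f["G3"], overall_frequency(f, "G"))

# Consistency check on the means reported in the text.
print(round((24.525 + 10.946 + 9.171) / 3, 3))   # P. falciparum G -> 14.881
print(round((42.451 + 22.832 + 42.795) / 3, 3))  # C. reinhardtii G -> 36.026
print(round(100 - 25.51, 2), round(100 - 53.70, 2))  # AT2 from GC2 -> 74.49 46.3
```

Averaging the three positional percentages equals the overall percentage only because every position contributes the same number of bases once partial codons are dropped.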

