Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences.

Derr J, Manapat ML, Rajamani S, Leu K, Xulvi-Brunet R, Joseph I, Nowak MA, Chen IA - Nucleic Acids Res. (2012)

Bottom Line: However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life.


Affiliation: FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.

ABSTRACT
During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life.


Figure 1: Compositional diversity, sequence space and predicted RNA folding energy. (a) Most of sequence space is of high compositional diversity. Histogram of C4 for RNA sequences, computed from random sampling of 10^9 sequences of length 50 (black dots) in silico. The complete histogram for all possible sequences of shorter length is computable and is similar to that of the random sample of 50-mers (length 10 = blue, 12 = pink, 14 = green, 17 = orange). (b) Compositional diversity (C4) and predicted minimum folding energy (Em) for known ribozymes (length 40–60; see Supplementary Data) (45) are shown as blue dots with mean and SD (blue lines). (c) C4 versus Em (black dots) predicted by Viennafold (41) for 2.5 × 10^6 RNA sequences of length 50. To minimize effects from GC content, we restricted the in silico sampling to sequences whose GC content is 40–60%. To avoid sampling artifacts, sequences were assigned to five bins according to C4, and an equal number of unique sequences were analyzed in each bin. The bin averages are shown as the red line (see Supplementary Data for values and SDs).
© Copyright Policy - creative-commons

Mentions: We find Ck to be a useful measure of the diversity of k-mers within sequence s, because values of Ck close to 1 would be desirable in the RNA world for at least two reasons. First, high Ck characterizes the vast majority of sequence space, because the number of different sequences corresponding to a particular composition is greater if the composition is more uniform (43). The total number of possible unique sequences varies approximately exponentially with Ck (Figure 1a). Biases in monomer composition and reactivity would decrease the average Ck and thus restrict the exploration of sequence space. For example, a 10-fold bias in composition decreases the average Ck from 0.94 to 0.43, which represents a severe restriction in sequence space given the exponential dependence (Supplementary Data). While sequence space might contain many potentially structured molecules (44), any search through sequence space for which the average Ck is low would under-represent or omit a large fraction of possible sequences. Therefore, high average Ck is desirable for finding rare, functional molecules.
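The counting argument above (more uniform compositions correspond to more distinct sequences) can be illustrated with a short Python sketch. The multinomial count is standard combinatorics. The function `normalized_kmer_entropy` is only a hypothetical stand-in for a k-mer diversity score: this excerpt does not give the definition of Ck, so the normalized Shannon entropy used here is an assumption and should not be read as the authors' measure. All function names are illustrative.

```python
from collections import Counter
from math import factorial, log


def sequences_with_composition(counts):
    """Count distinct sequences that contain exactly the given number
    of each letter (the multinomial coefficient)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total


def normalized_kmer_entropy(seq, k, alphabet_size=4):
    """Hypothetical diversity proxy (NOT the paper's Ck definition):
    Shannon entropy of the k-mer frequencies in seq, divided by the
    largest entropy attainable for this length and alphabet."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    freqs = Counter(kmers)
    entropy = -sum((c / n) * log(c / n) for c in freqs.values())
    max_entropy = log(min(n, alphabet_size ** k))
    return entropy / max_entropy if max_entropy > 0 else 0.0


# A 20-mer with uniform composition (5 of each letter) versus a
# strongly biased one (17 of one letter, 1 of each of the others).
uniform_count = sequences_with_composition([5, 5, 5, 5])   # 11732745024
biased_count = sequences_with_composition([17, 1, 1, 1])   # 6840

print(uniform_count, biased_count)
print(normalized_kmer_entropy("ACGUACGUACGUACGUACGU", 1))
print(normalized_kmer_entropy("AAAAAAAAAAAAAAAAACGU", 1))
```

Even at length 20, the uniform composition admits over a million times more sequences than the biased one, which is the sense in which a biased pool explores only a small corner of sequence space.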

