Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes.

Buenrostro JD, Araya CL, Chircus LM, Layton CJ, Chang HY, Snyder MP, Greenleaf WJ - Nat. Biotechnol. (2014)

Bottom Line: RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.


Affiliation: 1] Department of Genetics, Stanford University School of Medicine, Stanford, California, USA. [2] Program in Epithelial Biology and the Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, USA. [3] These authors contributed equally to this work.

ABSTRACT
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of a fluorescently labeled protein to >10^7 RNA targets generated on a flow cell surface by in situ transcription and intermolecular tethering of RNA to DNA. Studying the MS2 coat protein, we decompose the binding energy contributions from primary and secondary RNA structure, and observe that differences in affinity are often driven by sequence-specific changes in both association and dissociation rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis and a long-hypothesized, structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNA-MaP) provides generalizable insight into the biophysical basis and evolutionary consequences of sequence-function relationships.


Figure 5: Evolutionary landscapes are highly constrained by biophysical requirements. (a) Tesseracts describe traversal probabilities for the complete set (N=24) of mutational paths between low- and high-affinity variants within 4 mutations. The AUC of the cumulative probability of ranked paths measures evolutionary constraint (EAUC), as modulated by epistasis (ε). (b) Density of cumulative probabilities for the ranked paths of 1,997 measured tesseracts. The fraction of the total path probabilities captured per individual path is shown as a function of path rank in the inset. The cumulative sum of these individual values is integrated to calculate EAUC. (c) Distribution of EAUC scores from observed tesseracts (red), tesseracts with uniform path probabilities (blue) and tesseracts with random affinities (purple) implies a highly structured epistatic landscape. The number of variants significantly constrained (P < 0.01, Benjamini-Hochberg) is indicated for both models. Average evolutionary probability (d) and constraint (e) for paths with changes at each position of the hairpin. (f) Intermediate trajectories for base pair A:U→G:C and U:A→G:C transitions. (g) Probability ratio of evolutionary paths passing through G:U vs. A:C intermediates by base derived from 696 tesseracts with A:U→G:C base pair transformations.
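
The EAUC calculation described in panels (a) and (b) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `eauc` is hypothetical, and the integration rule (the mean of the cumulative curve over path ranks) is an assumption, since the caption does not state how the area is normalized.

```python
def eauc(path_probs):
    """Area under the cumulative curve of ranked path probabilities.

    path_probs: probabilities of the mutational paths of one tesseract
    (24 values for four mutations). Normalization by the number of
    paths is an assumption made for this sketch.
    """
    total = sum(path_probs)
    # Fraction of the total path probability captured by each path,
    # ranked from most to least probable.
    fractions = sorted((p / total for p in path_probs), reverse=True)
    cumulative = []
    running = 0.0
    for f in fractions:
        running += f
        cumulative.append(running)
    # Rectangle-rule integral over path rank, scaled to at most 1.
    return sum(cumulative) / len(cumulative)
```

Under this convention a tesseract whose probability is concentrated in a single path scores 1, while uniform path probabilities give the lowest score, so larger values indicate stronger constraint.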

Mentions: We sought to understand how biophysical properties shape RNA sequence evolution towards higher binding affinity by examining the prevalence of epistasis, or differential mutational path probabilities caused by non-additive affinity gains, in molecular evolution, a question of intense debate (refs. 41,42). Following previous work (refs. 43,44), we reconstructed 1,997 complete sets of mutational paths (tesseracts) describing the probability of evolving through permutations of four mutations from 1,597 low-affinity to 127 high-affinity hairpins. We modeled the probability of mutation, or the traversal from a source to a target node, as the effective probability of MS2 binding to the target over all sequences within one mutation of the source in the tesseract. Mutations can arise in any order, resulting in N = 4! = 24 distinct paths through which mutations may be sequentially acquired (Fig. 5a), with path probabilities defined as the product of probabilities for each mutational step. This model allows us to examine how molecular evolution towards higher affinity could proceed in an RNA-protein interaction, a question separate from the in vivo evolutionary landscape of MS2 sequences where the relationship between affinity and cellular fitness, as well as pleiotropic roles of this sequence in the MS2 genome, define the contours of the fitness landscape.
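
The path model in this paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published code: variants are encoded as 4-bit masks (bit i set means mutation i is present), `weights` is a hypothetical mapping from each of the 16 variants to a binding weight such as a measured fraction bound, and a step probability is taken to be the target's weight divided by the summed weights of all four single-mutation neighbors of the source.

```python
from itertools import permutations

N_SITES = 4  # four mutations define one tesseract (16 variants)


def step_probability(weights, source, target):
    """Probability of moving from source to target (assumed form).

    Normalizes the target's binding weight over every variant in the
    tesseract that is one mutation away from the source.
    """
    neighbors = [source ^ (1 << i) for i in range(N_SITES)]
    return weights[target] / sum(weights[n] for n in neighbors)


def path_probabilities(weights):
    """Return {mutation order: path probability} for all 4! = 24 paths."""
    paths = {}
    for order in permutations(range(N_SITES)):
        node = 0          # low-affinity start: no mutations present
        prob = 1.0
        for site in order:
            nxt = node | (1 << site)   # acquire the next mutation
            prob *= step_probability(weights, node, nxt)
            node = nxt
        paths[order] = prob            # product over the four steps
    return paths
```

Calling `path_probabilities` with equal weights for all 16 variants gives 24 equally likely paths; unequal weights skew the distribution toward orders whose intermediates bind well, which is the signature of epistasis examined in Fig. 5.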

