Estimation of ribosome profiling performance and reproducibility at various levels of resolution.

Diament A, Tuller T - Biol. Direct (2016)

Bottom Line: Indeed, dozens of Ribo-seq studies have included results related to local ribosome densities in different parts of the transcript; nevertheless, the performance of Ribo-seq has yet to be quantitatively evaluated and reported in a large-scale multi-organismal and multi-protocol study of currently available datasets. Our major conclusion is that the ability to infer signals of ribosomal densities at nucleotide scale is considerably lower than previously thought, as signals at this level are not reproduced well in experimental replicates. We believe that our results are important for every researcher studying translation and specifically for researchers analyzing data generated by the Ribo-seq approach.

Affiliation: Biomedical Engineering Department, Tel Aviv University, Tel Aviv-Yafo, Israel.

ABSTRACT

Background: Ribosome profiling (or Ribo-seq) is currently the most popular methodology for studying translation; it has been employed in recent years to decipher various fundamental aspects of gene expression regulation. The main promise of the approach is its ability to detect ribosome densities over an entire transcriptome at the high resolution of single codons. Indeed, dozens of Ribo-seq studies have included results related to local ribosome densities in different parts of the transcript; nevertheless, the performance of Ribo-seq has yet to be quantitatively evaluated and reported in a large-scale multi-organismal and multi-protocol study of currently available datasets.

Results: Here we provide the first objective evaluation of Ribo-seq at single-nucleotide resolution using clear, interpretable measures, based on the analysis of 15 experiments, 6 organisms, and a total of 612,961 transcripts. Our major conclusion is that the ability to infer signals of ribosomal densities at nucleotide scale is considerably lower than previously thought, as signals at this level are not reproduced well in experimental replicates. In addition, we provide various quantitative measures that connect the expected error rate with Ribo-seq analysis resolution.

Conclusions: The analysis of Ribo-seq data at the resolution of codons and nucleotides is a challenging task that calls for task-specific statistical methods and further protocol improvements. We believe that our results are important for every researcher studying translation and specifically for researchers analyzing data generated by the Ribo-seq approach.

Reviewers: This article was reviewed by Dmitrij Frishman, Eugene Koonin and Frank Eisenhaber.


Fig. 2: Local and global reproducibility in RP replicates. The figure presents the inter-replicate variance for a measured nucleotide position in the transcript (blue) and for complete genes (red). The Y-axis is the standard deviation of the fraction of total read counts (RCs) measured in replicate 1 (read count 1, RC1), while the X-axis denotes the total number of read counts at that position in both replicates (RC1 + RC2). Each point (bin) is based on the standard deviation of 1000 positions in the dataset for nt-reads, or 100 positions for gene-reads. The confidence in the measurement increases (the variance decreases) with the total read count, as expected. The difference between the two profiles indicates that additional noise and bias exist at the nucleotide level and are considerably higher than at the gene level. This difference is evident even after the profiles reach a plateau, and its magnitude varies from experiment to experiment. Repeated for: a Ingolia-2009 [10]; b Li-2012 [36]; c Stadler-2011 [26]; d Ingolia-2011 [38]
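
The statistic plotted in this figure can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `binned_fraction_sd` is hypothetical, and sorting positions by total read count before grouping them into fixed-size bins is an assumption (the caption states only that each point summarizes 1000 positions, or 100 genes).

```python
import statistics


def binned_fraction_sd(rc1, rc2, bin_size=1000):
    """Return one (mean total reads, SD of RC1 / (RC1 + RC2)) pair per bin.

    rc1, rc2: read counts for the same positions (or genes) in two replicates.
    Assumption: positions are sorted by total read count and then grouped
    into consecutive bins of `bin_size` positions.
    """
    # Keep only positions with at least one read, so the fraction is defined.
    pairs = [(a + b, a / (a + b)) for a, b in zip(rc1, rc2) if a + b > 0]
    pairs.sort(key=lambda p: p[0])

    points = []
    # Only full bins are used; a trailing partial bin is dropped.
    for start in range(0, len(pairs) - bin_size + 1, bin_size):
        chunk = pairs[start:start + bin_size]
        mean_total = statistics.mean(t for t, _ in chunk)
        sd_fraction = statistics.pstdev(f for _, f in chunk)
        points.append((mean_total, sd_fraction))
    return points
```

Under pure sampling noise the fraction RC1 / (RC1 + RC2) scatters around 0.5 with a spread that shrinks as the total count grows, so running the function once on nucleotide positions (`bin_size=1000`) and once on whole genes (`bin_size=100`) gives two curves whose gap reflects the extra nucleotide-level noise described in the caption.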

Mentions: Correlations between experimental replicates in the ribosome profiling literature are often reported to be very high [10, 23, 43], similar in level to RNA-seq measurements [10] (Fig. 1). We analyzed 15 ribosome profiling experiments containing multiple replicates from 6 organisms and confirmed that, indeed, the correlations between the Ribo-seq read count densities (RCD) of genes in different experimental replicates are high in most cases (r between 0.85 and 1.00). However, while representing every gene with a single value is informative enough for certain types of analyses, many of the questions that ribosome profiling was designed to answer require reproducibility at a much higher resolution, up to the nucleotide level. It should be noted that local RP measurements (e.g., nucleotide positions) are subject to additional biases and noise that are not as dominant at the global, gene level. For example, one source of such noise could be inefficient halting of elongation, which with some probability allows additional cycles of elongation to occur [39]. Thus, previous analyses of replicate consistency at the global level cannot predict reproducibility at the local level (Fig. 2). We therefore tested for the first time the reproducibility of ribosome occupancy profiles at the nucleotide level (Fig. 3). The coverage (percentage of nucleotides in the transcript to which at least one ribosomal footprint mapped) of most transcripts in the genome is low, leading to sparse profiles with many differences between repetitions. For example, a typical gene in terms of coverage in the Ingolia-2009 [10] dataset appears in Fig. 3a, with a coverage as low as 8 % (this is in fact the 3rd quartile, with a coverage higher than that of 75 % of the genes). The correlation between measured read counts at every nucleotide position in replicates for this transcript was 0.24 (p = 2 × 10^−16) (Fig. 3b), a significant but rather weak correlation (only 5.8 % of the variance of the read count profile of one replicate can be explained by the second one). We computed per-position correlations for the entire transcriptome between replicates in the 15 experiments (Fig. 3c). For example, the median correlation between two transcripts appearing in the Ingolia-2009 dataset [10] is 0.12 (p = 5.7 × 10^−8). Similarly, in most ribosome profiling experiments analyzed we found that the median correlation in the genome was below 0.4 (16 % of the variance of the read count profile of one replicate can be explained by the second one), indicating that the profiles are not reproducible at the nucleotide level. The top 20 % most highly expressed genes in each experiment showed higher correlations, but still typically below 0.6 (36 % of the variance of the read count profile of one replicate can be explained by the second one). Highly expressed genes have a higher RCD and tend to have profiles of higher coverage, leading to a higher number of reads per position and to a higher confidence in their count per position, which promotes reproducibility (Fig. 3c). It should be noted that we obtained similar results for datasets that were generated using various RP protocol variants, including those that avoided pre-treatment of the samples with cycloheximide before lysis [23, 26, 36, 44]. Similar conclusions regarding the local and global reproducibility of RP were obtained via different measures, demonstrating the robustness of these conclusions (Fig. 2).
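
The two per-transcript quantities used above, coverage and the per-position correlation between replicates, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the helper names and the toy read-count profiles are hypothetical, and the use of Pearson's r is an assumption, since the excerpt does not name the correlation coefficient.

```python
import math


def coverage(profile):
    """Fraction of positions to which at least one footprint mapped."""
    return sum(1 for count in profile if count > 0) / len(profile)


def pearson_r(x, y):
    """Pearson correlation of two equal-length read-count profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


# Toy per-nucleotide read counts for one transcript in two replicates
# (made-up numbers, for illustration only).
replicate_1 = [0, 0, 3, 0, 1, 0, 0, 2, 0, 0]
replicate_2 = [0, 1, 2, 0, 0, 0, 0, 3, 0, 1]

r = pearson_r(replicate_1, replicate_2)
print(f"coverage = {coverage(replicate_1):.0%}, r = {r:.2f}, r^2 = {r * r:.2f}")
```

The "variance explained" percentages quoted in the text are simply r squared: 0.24² ≈ 0.058 (5.8 %), 0.4² = 0.16 (16 %), and 0.6² = 0.36 (36 %).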

