Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling.

Łabaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP - Bioinformatics (2011)

Bottom Line: Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision.


Affiliation: Boku University Vienna, 1190 Muthgasse 18, Vienna, Austria.

ABSTRACT

Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means.

Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error of <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion we outline possible experimental and computational strategies for further improvements in quantification precision.

Contact: rnaseq10@boku.ac.at

Figure 2: Standard deviation versus expression level. The plot shows the variation across three technical replicate measurements (standard deviation, y-axis), with each discernible dot representing a transcript target. In shaded areas, the grey level represents density, with dark shading indicating higher densities. The standard deviation is in general larger for transcripts with lower mean expression level (x-axis). More strongly expressed transcripts could often be measured reliably, with a relative error of 20% or less. Interestingly, just 41% of all transcript targets could be measured that precisely (below the horizontal dashed line). Of the 41% most strongly expressed transcripts (to the right of the vertical dashed line), on the other hand, 84% could be measured reliably (below the horizontal dashed line). This is reflected by the high density of targets on the right (dark shading) falling largely below the horizontal line, which is not the case to the left of the vertical dashed line.
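The reliability criterion in the caption (relative error of 20% or less across three technical replicates) can be illustrated with a minimal sketch. It assumes that relative error means the sample standard deviation divided by the mean of the replicate estimates; the source does not spell out the exact estimator. The transcript names and expression values are hypothetical, made up for illustration only.

```python
from statistics import mean, stdev


def relative_error(replicates):
    """Sample standard deviation divided by the mean (an assumed definition)."""
    m = mean(replicates)
    if m == 0:
        return float("inf")  # no signal: treat as not reliably measurable
    return stdev(replicates) / m


def fraction_reliable(expression, threshold=0.20):
    """Return the fraction and sorted names of targets at or below the threshold."""
    reliable = [
        name for name, reps in expression.items()
        if relative_error(reps) <= threshold
    ]
    return len(reliable) / len(expression), sorted(reliable)


# Hypothetical expression estimates from three technical replicates.
expression = {
    "tx_high": [1000.0, 1040.0, 980.0],
    "tx_mid": [50.0, 58.0, 45.0],
    "tx_low": [2.0, 5.0, 1.0],
    "tx_zero": [0.0, 0.0, 0.0],
}

fraction, reliable = fraction_reliable(expression)
print(f"{fraction:.0%} of targets measured reliably: {reliable}")
```

In this toy data the two more strongly expressed targets pass the threshold and the two weakly expressed ones do not, mirroring the trend that the figure shows for the real measurements.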

Mentions: While a genome level alignment by TopHat detects additional transcripts de novo, the Bowtie alignment of reads to the given spliceform sequences is much more sensitive in the identification of known junctions (almost threefold better; see Table 1). These junctions, however, often play a key role in identifying the expression of a particular spliceform. We could thus identify 101 169 spliceforms (72% of all known transcripts), of which 56 980 could be measured reliably (57%). That means we could assess 41% of all known spliceforms with a relative error of ≤20%. These fall below the horizontal dashed line of Figure 2, which plots the measurement standard deviation versus transcript expression level. The scatter clearly decreases with higher transcript abundance. In view of the 41% of all spliceforms achieving good reproducibility, we can also consider the 41% of targets with the highest expression level. They are found to the right of the vertical mark. Of these, the vast majority (84%) could be measured reliably (below the horizontal line). This is a direct consequence of the sampling nature of RNA-Seq, as discussed below.
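One way to see why sampling alone links precision to expression level is an idealized Poisson model of read counts, sketched below. This model is an assumption made here for illustration, not necessarily the analysis used in the paper: if a transcript is expected to receive n reads, the standard deviation of the count is sqrt(n), so the relative error is 1/sqrt(n). The function names are hypothetical.

```python
import math


def poisson_relative_error(expected_reads):
    """Relative error (sd / mean) of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_reads)


def min_reads_for(target_relative_error):
    """Smallest expected read count whose Poisson relative error meets the target."""
    # Rounding guards against floating-point noise just above an integer.
    return math.ceil(round(1.0 / target_relative_error ** 2, 9))


for n in (4, 25, 100, 10000):
    print(f"{n:>6} expected reads -> relative error {poisson_relative_error(n):.1%}")

print("reads needed for 20% relative error:", min_reads_for(0.20))
```

Under this simplification, a transcript needs about 25 expected reads to reach a 20% relative error from sampling noise alone, and halving the error takes four times as many reads. Real replicate variation may be larger than this idealized floor.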

