Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling.

Łabaj PP, Leparc GG, Linggi BE, Markillie LM, Wiley HS, Kreil DP - Bioinformatics (2011)

Bottom Line: Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision.


Affiliation: Boku University Vienna, 1190 Muthgasse 18, Vienna, Austria.

ABSTRACT

Motivation: Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means.

Results: We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion we outline possible experimental and computational strategies for further improvements in quantification precision.

Contact: rnaseq10@boku.ac.at
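
The concentration of measurement power described in the Results (75% of reads falling on 7% of transcripts) can be illustrated with a minimal sketch. It assumes per-transcript read counts are already available; the function name and the toy counts are hypothetical and are not taken from the study.

```python
def share_of_transcripts_for_read_fraction(read_counts, read_fraction=0.75):
    """Smallest fraction of transcripts that together hold at least
    `read_fraction` of all reads (hypothetical helper for illustration)."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("read_counts must contain at least one read")
    running = 0
    # Walk transcripts from most to least strongly covered.
    for rank, count in enumerate(sorted(read_counts, reverse=True), start=1):
        running += count
        if running >= read_fraction * total:
            return rank / len(read_counts)
    return 1.0


# Toy counts invented for illustration; not data from the study.
toy_counts = [700, 50, 50, 40, 40, 30, 30, 20, 20, 20]
print(share_of_transcripts_for_read_fraction(toy_counts))
```

With a strongly skewed distribution like the toy one, a small share of transcripts absorbs most reads, which leaves few reads for weakly expressed transcripts.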



Figure 6: Comparison of measurement variation. The graph compares the rescaled cumulative distributions of the standard deviation for alternative technologies and data processing protocols.

Mentions: As an independent reference point, we also examined the measurement precision of Affymetrix GeneChips, a well-established microarray platform. Figure 6 compares the distributions of the measurement errors for RNA-Seq and chips for different data processing protocols (line styles and shades). The y-axis shows the number of transcripts for which the quantification error did not exceed a given value (x-axis). For the standard Bowtie protocol (black dotted lines), only 17% of all known transcripts could be assessed reliably with an error ≤20%. The ‘TopHat+Cufflinks+model’ protocol (dot-dashed) yielded 39 116 reliably measured spliceforms (28%). In contrast, the combined approach introduced in this article yielded 56 980 such spliceforms (41%), an increase of almost 50%.
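
The cumulative counts plotted in Figure 6 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that the relative error of a transcript is the sample standard deviation across technical replicates divided by the mean, and the function names and toy expression values are hypothetical. Only the two spliceform counts in the final lines come from the text above.

```python
from statistics import mean, stdev


def relative_error(replicates):
    """Sample standard deviation divided by the mean (an assumed
    definition of relative error, chosen for illustration)."""
    m = mean(replicates)
    if m == 0:
        return float("inf")  # unexpressed transcripts are never 'reliable'
    return stdev(replicates) / m


def count_within(errors, threshold):
    """Number of transcripts whose error does not exceed the threshold,
    i.e. one point on a cumulative curve like the one in Figure 6."""
    return sum(1 for e in errors if e <= threshold)


# Toy replicate measurements invented for illustration.
toy = {
    "tx_a": [100.0, 104.0, 96.0],
    "tx_b": [10.0, 16.0, 4.0],
    "tx_c": [50.0, 55.0, 45.0],
    "tx_d": [0.0, 0.0, 0.0],
}
errors = [relative_error(values) for values in toy.values()]
for threshold in (0.05, 0.20, 1.00):
    print(threshold, count_within(errors, threshold))

# Relative gain of the combined approach over 'TopHat+Cufflinks+model',
# using the spliceform counts quoted in the text.
gain = 56980 / 39116 - 1
print(f"{gain:.1%}")
```

The last line puts the gain at roughly 46%, consistent with the "almost 50%" stated in the text.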

