R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data.

Mittal VK, McDonald JF - Nucleic Acids Res. (2012)

Bottom Line: We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.


Affiliation: School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

ABSTRACT
The rapid expansion in the quantity and quality of RNA-Seq data requires sophisticated, high-performance bioinformatics tools that can rapidly transform these data into meaningful information that biologists can easily interpret. Currently available analysis tools are often not easily installed by the general biologist, and most of them lack inherent parallel processing capabilities, which are widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as the number of threads increases. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
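The abstract reports "high concordance" between R-SAP expression estimates and microarray measurements but does not name the statistic in this excerpt. Because RNA-Seq and microarray values sit on different scales, a rank correlation is one common way to quantify such agreement. The sketch below computes Spearman's rank correlation in plain Python; the choice of Spearman's rho, the function names, and the toy values are illustrative assumptions, not R-SAP's own procedure or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy, made-up expression values for five genes (not data from the paper).
    rnaseq = [0.5, 12.0, 3.2, 40.1, 7.7]
    array = [5.1, 9.8, 7.0, 12.3, 8.9]
    print(f"Spearman rho = {spearman(rnaseq, array):.3f}")
```

The toy example prints `Spearman rho = 1.000`: the two series differ greatly in scale but rank the five genes identically, which is the kind of cross-platform agreement a rank statistic is designed to capture.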


Figure 7. Benchmarking of R-SAP's running time against Cufflinks. The running times (Y-axis) of R-SAP (gray line) and Cufflinks (black line) were compared for the quantification of 20 million reads from the ENCODE Gm12878 RNA-Seq dataset. R-SAP shows near-linear scalability as the number of parallel threads (X-axis) increases. The inset shows the same plot magnified for the Cufflinks running time.

Mentions: We benchmarked R-SAP's runtime performance, and the effect of parallelization on it, against Cufflinks. For the test run, we selected reference genome alignments of 20 million reads from our ENCODE RNA-Seq test dataset, which had previously been aligned to the reference genome (hg18) using BLAT and TopHat. These 20 million reads were drawn from high-scoring reads previously classified by R-SAP. To make the comparison between R-SAP and Cufflinks fair, we ran Cufflinks only in its quantification mode and ran only R-SAP's characterization and transcript expression estimation modules. RefSeq transcripts (hg18) were used as the reference annotation set. Running times for R-SAP and Cufflinks with varying numbers of parallel threads are shown in Figure 7. Although R-SAP's performance scaled near-linearly with the number of threads, Cufflinks ran faster than R-SAP at every thread count tested.
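"Near-linear scalability" can be made precise with two standard quantities: speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p, where T(p) is the running time with p threads. Near-linear scaling means S(p) stays close to p, that is, E(p) stays close to 1. The Python sketch below computes both from a table of measured running times. The function name `scaling_table` and the runtimes in the example are hypothetical placeholders, not values read from Figure 7.

```python
def scaling_table(runtimes):
    """Compute speedup and parallel efficiency relative to a 1-thread run.

    runtimes: dict mapping thread count -> wall-clock running time (any unit).
    Returns a list of (threads, speedup, efficiency) tuples sorted by threads.
    """
    if 1 not in runtimes:
        raise ValueError("a single-thread baseline T(1) is required")
    if any(t <= 0 for t in runtimes.values()):
        raise ValueError("running times must be positive")
    base = runtimes[1]
    rows = []
    for p in sorted(runtimes):
        speedup = base / runtimes[p]          # S(p) = T(1) / T(p)
        rows.append((p, speedup, speedup / p))  # E(p) = S(p) / p
    return rows


if __name__ == "__main__":
    # Hypothetical runtimes in minutes, for illustration only (not data from Figure 7).
    example = {1: 120.0, 2: 62.0, 4: 32.0, 8: 17.0}
    for p, s, e in scaling_table(example):
        print(f"{p:>2} threads: speedup {s:5.2f}, efficiency {e:.2f}")
```

Note that scaling and absolute speed are separate properties: a tool can keep E(p) close to 1 and still have a larger T(p) than a competitor at every thread count, which is the pattern reported above for R-SAP relative to Cufflinks.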

