R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data.

Mittal VK, McDonald JF - Nucleic Acids Res. (2012)

Bottom Line: We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.

Affiliation: School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

ABSTRACT
The rapid expansion in the quantity and quality of RNA-Seq data requires sophisticated, high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that biologists can easily interpret. Currently available analysis tools are often not easily installed by the general biologist, and most lack built-in parallel processing, which is widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
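
The abstract names two mechanisms, a hierarchical decision-making procedure that assigns reads to transcript classes and multi-threaded processing, without detailing either. The following is a minimal illustrative sketch under stated assumptions, not R-SAP's implementation: the toy exon and gene intervals, the four class names and their priority order, and the helper names (`classify`, `classify_chunk`, `classify_parallel`) are all hypothetical. Each read is assigned to exactly one class by testing categories in a fixed order, and the reads are split into independent chunks that worker threads classify before their counts are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy annotation on a single chromosome, half-open [start, end)
# coordinates. This is NOT R-SAP's data model, only an illustration.
EXONS = [(100, 200), (300, 400)]
GENES = [(100, 400)]


def overlaps(start, end, intervals):
    """True if [start, end) overlaps any interval in the list."""
    return any(start < i_end and i_start < end for i_start, i_end in intervals)


def inside_any(start, end, intervals):
    """True if [start, end) lies entirely within one interval in the list."""
    return any(i_start <= start and end <= i_end for i_start, i_end in intervals)


def classify(read):
    """Assign a read to one hypothetical class, testing classes in priority order."""
    start, end = read
    if inside_any(start, end, EXONS):
        return "exonic"
    if overlaps(start, end, EXONS):
        return "exon-intron boundary"
    if overlaps(start, end, GENES):
        return "intronic"
    return "intergenic"


def classify_chunk(chunk):
    """Count class assignments for one chunk; chunks share no state."""
    return Counter(classify(read) for read in chunk)


def classify_parallel(reads, n_workers=4):
    """Split reads into n_workers interleaved chunks and merge per-chunk counts."""
    chunks = [reads[i::n_workers] for i in range(n_workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(classify_chunk, chunks):
            total.update(partial)
    return total
```

Because chunks are independent, adding workers divides the work evenly, which is the property that makes near-linear scaling possible. In CPython, however, the global interpreter lock prevents threads from executing Python code in parallel, so a CPU-bound version would swap `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface, to actually realize that speedup.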

Figure 6. Comparison of R-SAP-estimated RPKM (reads per kilobase of exon model per million mapped reads) values (Y-axis) with Affymetrix microarray and TaqMan qRT–PCR expression values (X-axis). Correlations of (A) 0.67 (Affymetrix microarray) and (B) 0.88 (TaqMan qRT–PCR) were obtained using the MAQC Human reference sample. (C) A higher correlation of 0.78 (Affymetrix microarray) was obtained using the GM12878 reference cell line from the ENCODE project.
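
RPKM, as defined in the caption, normalizes a transcript's read count by both the length of its exon model (in kilobases) and the sequencing depth (in millions of mapped reads): RPKM = C × 10^9 / (L × N), where C is the number of reads mapped to the transcript's exons, L is the total exon length in base pairs, and N is the total number of mapped reads. The sketch below computes this standard quantity; the function name `rpkm` is illustrative, and how R-SAP assigns reads that map to more than one transcript is not described in this excerpt.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    kilobases = exon_length_bp / 1_000               # exon-model length in kb
    millions = total_mapped_reads / 1_000_000        # library depth in millions
    return read_count / (kilobases * millions)
```

For example, 500 reads on a 2,000 bp exon model in a library of 10 million mapped reads gives 500 / (2 × 10) = 25 RPKM, so transcripts of different lengths and samples of different depths can be compared on one scale.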

Mentions: Comparing R-SAP's RPKM values for the MAQC Human Reference sample with gene-expression values determined on the Affymetrix U133 Plus 2.0 array yielded a significant correlation (Spearman correlation = 0.67, P < 0.0001; Figure 6A), in agreement with similar correlations reported previously (40,41). We further evaluated our expression estimates by comparing them with TaqMan qRT–PCR measurements, which are generally considered more accurate abundance estimates than microarrays. After initial filtering, we retained 962 expressed RefSeq transcripts from the TaqMan qRT–PCR data, of which 727 were also detected (RPKM > 0) by R-SAP. Our RPKM values correlated better with the TaqMan qRT–PCR estimates (Spearman correlation = 0.88, P < 0.001; Figure 6B) than with the microarray-estimated values.
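
The validation described above keeps only reference transcripts that R-SAP also detects (RPKM > 0) and then computes a Spearman rank correlation between the two sets of values. The following minimal sketch reproduces that filtering-and-correlation logic under stated assumptions: the helper names and the transcript IDs and values in any test data are hypothetical, not taken from the study, and it does not compute the P values reported in the text. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values assigned their average rank.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_to_reference(rpkm_by_id, reference_by_id):
    """Keep reference transcripts with RPKM > 0, then correlate the paired values.

    Returns (number of transcripts kept, Spearman correlation).
    """
    shared = [t for t in reference_by_id if rpkm_by_id.get(t, 0) > 0]
    x = [rpkm_by_id[t] for t in shared]
    y = [reference_by_id[t] for t in shared]
    return len(shared), spearman(x, y)
```

Because only ranks enter the calculation, the coefficient is unchanged by any monotonic transformation such as log-scaling, which suits comparisons between RPKM, microarray intensities, and qRT–PCR values that are measured on different scales.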

