R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data.

Mittal VK, McDonald JF - Nucleic Acids Res. (2012)

Affiliation: School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

ABSTRACT
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming these data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist, and most lack the inherent parallel processing capabilities that are widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as multi-threading is increased. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.

Figure 3. Schematic diagram of the detection and annotation of chimeric transcripts by R-SAP using fragmented genomic alignments. (A) The best possible alignment pairs are selected for reads displaying significant sequence similarity to the reference genome, and the alignment fragments are then individually compared with known transcript models. (B) The two alignments of the pair map to two different genes (inter-chromosomal or intra-chromosomal). (C) The two alignments map to the same gene but in opposite orientations on the reference genome. (D) Both alignments map within the same gene, but their order on the sequencing read is the opposite of their alignment order on the gene. (E, F) At least one alignment of the pair maps to a genomic region containing no known gene from the reference gene set.
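
The categories in panels B to F can be read as a small decision procedure over a pair of alignment fragments. The Python sketch below illustrates one way to express it; it is not R-SAP's actual code. The `Fragment` structure, its fields, the category labels and the order in which the checks are made are all hypothetical. It assumes `strand` is the strand each fragment aligns to on the reference, and it reports a same-gene, same-strand, collinear pair as `"collinear_same_gene"` because the figure does not define that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    """One aligned piece of a read (hypothetical structure, not R-SAP's)."""
    read_start: int       # start of the fragment on the read (0-based)
    chrom: str            # reference chromosome
    genome_start: int     # start of the alignment on the reference
    strand: str           # alignment strand on the reference: "+" or "-"
    gene: Optional[str]   # overlapping known gene, or None if there is none


def classify_pair(a: Fragment, b: Fragment) -> str:
    """Assign an alignment pair to one of the categories of Figure 3B-F."""
    # Order the two fragments by where they occur on the sequencing read.
    first, second = sorted((a, b), key=lambda f: f.read_start)

    # (E, F) At least one fragment lies in a region with no known gene.
    if first.gene is None or second.gene is None:
        return "unannotated_region"

    # (B) The fragments map to two different genes.
    if first.gene != second.gene:
        if first.chrom != second.chrom:
            return "inter_chromosomal"
        return "intra_chromosomal"

    # (C) Same gene, but opposite orientations on the reference genome.
    if first.strand != second.strand:
        return "opposite_orientation"

    # (D) Same gene and strand. On "+" the read order should follow increasing
    # genomic coordinates, on "-" decreasing ones; otherwise the order is swapped.
    collinear = (first.genome_start < second.genome_start) == (first.strand == "+")
    if not collinear:
        return "order_swapped"

    # Case not covered by Figure 3 (e.g. ordinary splicing); label is hypothetical.
    return "collinear_same_gene"
```

For example, a read whose first 50 bp align at chr1:5000 and whose next segment aligns upstream at chr1:1000, both on the "+" strand within the same gene, falls into category D (`"order_swapped"`).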

Chimeric transcripts may arise from genomic rearrangements such as translocations and inversions, or from transcriptional processes such as co-transcription, trans-splicing or aberrant intra-genic (within the same gene) splicing (14,15,28,29). Sequencing reads from chimeric transcripts are very likely to produce discrete alignments to distant or nearby genomic loci. To detect candidate chimeric reads, all reads whose top-scoring alignment has low query coverage (below the coverage cutoff, default 90%) and an alignment identity above the identity cutoff (default 95%) are selected. Such a read is considered a potential chimeric read only if the region not covered by its top-scoring alignment is at least 20 bp long (the default gap threshold). The 20 bp default was chosen because alignment algorithms will not produce a significant alignment for a shorter remaining portion of the read. Once these criteria are met, the alignments are parsed to obtain the alignment pair for the top-scoring alignment (Figure 3A).
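
The three selection criteria can be sketched as a single filter. This is a minimal Python illustration under stated assumptions, not R-SAP's implementation: the `TopAlignment` structure and the parameter names `coverage_cutoff`, `identity_cutoff` and `gap_threshold` are hypothetical, and the "region not covered" is taken to be the longer of the two unaligned flanks of the read, on the assumption that a second alignment needs one contiguous stretch of sequence.

```python
from dataclasses import dataclass


@dataclass
class TopAlignment:
    """Top-scoring alignment of one read (hypothetical structure)."""
    read_length: int   # total read length in bp
    query_start: int   # first aligned read position (0-based, inclusive)
    query_end: int     # end of the aligned read segment (0-based, exclusive)
    identity: float    # percent identity of the aligned segment


def is_candidate_chimeric(aln: TopAlignment,
                          coverage_cutoff: float = 90.0,
                          identity_cutoff: float = 95.0,
                          gap_threshold: int = 20) -> bool:
    """Apply the coverage, identity and gap filters described in the text."""
    coverage = 100.0 * (aln.query_end - aln.query_start) / aln.read_length
    # Keep reads with low query coverage (below the cutoff) but high identity
    # (above the cutoff); anything else is not a chimeric candidate.
    if coverage >= coverage_cutoff or aln.identity <= identity_cutoff:
        return False
    # Assumption: the uncovered region is the longer unaligned flank.
    left_flank = aln.query_start
    right_flank = aln.read_length - aln.query_end
    return max(left_flank, right_flank) >= gap_threshold
```

For a 100 bp read whose top-scoring alignment covers read positions 0 to 70 at 99% identity, coverage is 70% and the unaligned 3' flank is 30 bp, so the read passes all three filters. Under the flank assumption, a read aligned from position 10 to 85 would be rejected: its flanks are 10 and 15 bp, even though 25 bp in total are uncovered.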