Revealing stable processing products from ribosome-associated small RNAs by deep-sequencing data analysis.

Zywicki M, Bakowska-Zywicka K, Polacek N - Nucleic Acids Res. (2012)

Bottom Line: To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. To demonstrate the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae.

Affiliation: Innsbruck Biocenter, Medical University Innsbruck, Division of Genomics and RNomics, Fritz-Pregl-Strasse 3, 6020 Innsbruck, Austria. marek.zywicki@i-med.ac.at

ABSTRACT
The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling microRNA expression and detecting novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process that generates ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus, the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products, and annotation of known transcripts based on multiple sources of information. To demonstrate the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect, and confirm by independent experimental methods, multiple novel stable RNA molecules differentially processed from well-known ncRNAs, such as rRNAs, tRNAs or snoRNAs, in a stress-dependent manner.

Figure 2: The length dependence of multiple mapping events at the level of reads and contigs observed in the ribosome-associated cDNA library. (A) Distribution of the average genomic hit numbers for reads of different lengths. No significant increase in hit numbers is observed for shorter reads. (B) Distribution of genomic uniqueness values for contigs identified in the study. Although shorter contigs (<150 nt) show higher variability of uniqueness, there is no strict dependence on contig length.

Mentions: The second reason for multiple mapping of reads to the reference genome is the random similarity of short sequence blocks across the genome. The shorter a read is, the higher the probability of a random match outside the locus from which the transcript originates. Such spurious matching could influence the calculation of genomic uniqueness. To prevent such random alignments, the minimum length of reads used for analysis is set by default to 18 nt. The analysis of our yeast ribosome-derived library shows that at this read length cut-off there is no strict dependence between read length and the number of genomic matches (Figure 2A). Similarly, there is no correlation between genomic uniqueness values and contig length (Figure 2B). The higher variability observed in the lower length range seems to be caused by the larger number of reads/contigs of such length originating from various types of genes rather than by increased spurious matching (the shortest ones are not the most variable).
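
The reasoning behind the length cut-off can be illustrated with a minimal sketch. It is not the APART implementation: the function names, the input layout (pairs of read sequence and genomic hit count) and the uniform-random-genome model are assumptions made only for illustration, and the genome size of roughly 12 Mb for S. cerevisiae is an approximation. The first function estimates how many chance exact matches a read of a given length would have; the second computes the kind of per-length average hit number summarized in Figure 2A, after discarding reads below the default minimum of 18 nt.

```python
from collections import defaultdict

MIN_READ_LENGTH = 18  # default cut-off stated in the text


def expected_random_hits(read_length, genome_size):
    """Expected chance exact matches of one read on both strands.

    Assumes a genome of independent, equally frequent bases, which is a
    simplification and not a model taken from the paper.
    """
    return 2 * genome_size / 4 ** read_length


def mean_hits_by_length(reads, min_length=MIN_READ_LENGTH):
    """Average genomic hit count per read length.

    `reads` is an iterable of (sequence, genomic_hit_count) pairs
    (a hypothetical input layout). Reads shorter than `min_length`
    are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # length -> [sum of hits, read count]
    for sequence, hits in reads:
        if len(sequence) < min_length:
            continue
        entry = totals[len(sequence)]
        entry[0] += hits
        entry[1] += 1
    return {length: s / n for length, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_reads = [
        ("ACGTACGTACGTACG", 7),           # 15 nt, below the cut-off
        ("ACGTACGTACGTACGTAC", 1),        # 18 nt
        ("TTGCATTGCATTGCATTG", 3),        # 18 nt
        ("ACGTACGTACGTACGTACGTACGT", 2),  # 24 nt
    ]
    print(mean_hits_by_length(toy_reads))
    # Approximate yeast genome size; compare 18 nt with a shorter read.
    print(expected_random_hits(18, 12_000_000))
    print(expected_random_hits(12, 12_000_000))
```

Under this simplified model, an 18 nt read is expected to find far fewer than one chance match in a genome of this size, whereas a 12 nt read would find more than one, which is consistent with the text's observation that spurious matching does not dominate above the cut-off. Real genomes contain repeats and biased base composition, so the estimate is only a rough lower bound on multiple mapping.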

