Differential and coherent processing patterns from small RNAs.

Pundhir S, Gorodkin J - Sci Rep (2015)

Bottom Line: While the annotated loci consist predominantly of ~24 nt short RNAs, the unannotated loci consist predominantly of ~17 nt short RNAs. Furthermore, these ~17 nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study could be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.


Affiliation: Center for non-coding RNA in Technology and Health, IKVH, University of Copenhagen, Grønnegårdsvej 3, 1870, Frederiksberg C, Denmark.

ABSTRACT
Post-transcriptional processing events related to short RNAs are often reflected in their read profile patterns emerging from high-throughput sequencing data. MicroRNA arm switching across different tissues is a well-known example of what we define as differential processing. Here, short RNAs from the nine cell lines of the ENCODE project, irrespective of their annotation status, were analyzed for genomic loci representing differential or coherent processing. We observed differential processing predominantly in RNAs annotated as miRNA, snoRNA or tRNA. Four of the five known cases of differentially processed miRNAs present in the input dataset were recovered, and several novel cases were discovered. In contrast to differential processing, coherent processing is widespread in both annotated and unannotated regions. While the annotated loci consist predominantly of ~24 nt short RNAs, the unannotated loci consist predominantly of ~17 nt short RNAs. Furthermore, these ~17 nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study could be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.
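Arm switching gives a concrete picture of differential processing. The sketch below is a minimal illustration, not the pipeline used in the study: it assumes per-sample read counts for the 5p and 3p arms of one precursor are already available, and it calls an arm dominant when it holds at least 60% of that precursor's reads. The `ArmCounts` class, the function names, the threshold and the example counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArmCounts:
    """Hypothetical read counts for the two arms of one miRNA precursor in one sample."""
    sample: str
    five_p: int   # reads mapped to the 5p arm
    three_p: int  # reads mapped to the 3p arm


def dominant_arm(counts: ArmCounts, min_fraction: float = 0.6) -> Optional[str]:
    """Return '5p' or '3p' if that arm holds >= min_fraction (> 0.5) of reads, else None."""
    total = counts.five_p + counts.three_p
    if total == 0:
        return None  # no reads, so no call
    if counts.five_p / total >= min_fraction:
        return "5p"
    if counts.three_p / total >= min_fraction:
        return "3p"
    return None  # neither arm clearly dominates


def is_arm_switch(samples: List[ArmCounts], min_fraction: float = 0.6) -> bool:
    """True if at least two samples have different dominant arms."""
    calls = {dominant_arm(s, min_fraction) for s in samples}
    calls.discard(None)  # ignore samples without a clear dominant arm
    return len(calls) > 1


if __name__ == "__main__":
    # Hypothetical counts: the 5p arm dominates in one tissue, the 3p arm in another.
    tissues = [ArmCounts("tissue_A", 900, 100), ArmCounts("tissue_B", 150, 850)]
    print(is_arm_switch(tissues))  # True
```

The function returns False when every sample with a clear call shares the same dominant arm; samples where neither arm reaches the threshold are simply ignored rather than counted as a switch.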


Figure 4: Read length and genomic location of read profiles from 195 unannotated, coherently processed loci (CPL) in the ENCODE dataset. (A) The 195 CPL are consistently enriched (binomial test, p-value < 0.01; numbers in the boxes) for co-location with transcription start sites (TSS) and transcribed regions (x-axis) across all six human cell lines (y-axis). For each of the six cell lines, the whole human genome was divided into seven distinct genomic regions (enhancer, TSS, transcribed, CTCF, promoter flanking, weak enhancer and repressed) using ChIP-seq data from the ENCODE project [49,50]. (B) Two distinct read-length distributions were observed for the 195 unannotated and 158 annotated CPL: most reads from annotated CPL were ≥22 nt long, whereas unannotated CPL consisted mostly of reads <22 nt long. A CPL is termed annotated if its genomic coordinates overlap those of an ncRNA, and unannotated otherwise. (C) Of the 195 unannotated CPL, read profiles from 41 loci lie within 1000 nt up- or downstream of a TSS, and reads from these loci are mostly located upstream of or overlapping the TSS (−650 to 120 nt). The plot shows the mean percentage overlap of the 41 read profiles (y-axis) in each of the 20 nt bins into which the region 1000 nt up- and downstream of the TSS was divided (x-axis). Bars above and below the x-axis show coverage from read profiles in the sense and antisense orientation relative to transcription, respectively. The black vertical line marks the TSS, and the black arrow indicates the direction of transcription.
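Panels B and C rest on two simple computations: an overlap rule that labels a CPL as annotated or unannotated, and a strand-aware binning of read profiles around a TSS. The sketch below illustrates both under stated assumptions: intervals are 0-based, half-open and on the same chromosome, each read profile is summarized as one interval with a strand, and each profile contributes the percentage of every 20 nt bin it covers, averaged over all profiles. All function names are hypothetical, and this is not the code used to draw the figure.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def is_annotated(cpl: Interval, ncrnas: List[Interval]) -> bool:
    """Panel B rule: a CPL is annotated if it overlaps any ncRNA interval."""
    return any(cpl[0] < nc[1] and nc[0] < cpl[1] for nc in ncrnas)


def fraction_long_reads(read_lengths: List[int], threshold: int = 22) -> float:
    """Fraction of reads at least `threshold` nt long (assumes a non-empty list)."""
    return sum(1 for n in read_lengths if n >= threshold) / len(read_lengths)


def to_tss_relative(start: int, end: int, tss: int, tss_strand: str) -> Interval:
    """Map genomic [start, end) to TSS-relative coordinates (negative = upstream)."""
    if tss_strand == "+":
        return start - tss, end - tss
    # On the minus strand, upstream lies at higher genomic coordinates.
    return tss - end + 1, tss - start + 1


def mean_tss_profile(profiles, flank: int = 1000, bin_size: int = 20):
    """profiles: list of (start, end, strand, tss, tss_strand) tuples.

    Returns (sense, antisense): mean % coverage per bin over all profiles,
    for bins spanning [-flank, +flank) around the TSS.
    """
    n_bins = 2 * flank // bin_size
    sense = [0.0] * n_bins
    antisense = [0.0] * n_bins
    for start, end, strand, tss, tss_strand in profiles:
        rel_start, rel_end = to_tss_relative(start, end, tss, tss_strand)
        target = sense if strand == tss_strand else antisense
        for i in range(n_bins):
            bin_start = -flank + i * bin_size
            bin_end = bin_start + bin_size
            covered = max(0, min(rel_end, bin_end) - max(rel_start, bin_start))
            target[i] += 100.0 * covered / bin_size
    n = len(profiles) or 1  # avoid division by zero on empty input
    return [v / n for v in sense], [v / n for v in antisense]
```

With the defaults, bin 50 starts at the TSS, so a single sense profile covering the first 40 nt downstream of a plus-strand TSS yields 100.0 in bins 50 and 51. Plotting `sense` above the axis and the negated `antisense` below it would mimic the layout described for panel C.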
Mentions: Of the seven distinct chromatin states, we observe a significant enrichment of the CPL within two, TSS and transcribed region, in all six cell lines from which they were generated (p-value < 0.01, binomial test, Fig. 4A). In total, 107 of the 195 CPL overlapped a TSS (23) or a transcribed region (84) in at least one of the six cell lines. Recently, a novel class of non-canonical miRNAs derived from the transcription start sites of protein-coding genes (TSS-miRNAs) was discovered in mouse [52]. However, we did not observe an enrichment of the 195 CPL for overlap with the orthologous TSS-miRNAs in the human genome (p-value = 0.28, binomial test).
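The enrichment tests above are binomial tests. A minimal one-sided sketch using only the standard library follows, assuming `n` loci are tested, `k` of them overlap the region class of interest, and `p` is the background probability of overlap (for example, the fraction of the genome in that chromatin state). How the study chose the background is not stated here, so `p` is an assumption, the function name is hypothetical, and the example numbers are made up.

```python
from math import comb


def binomial_enrichment_pvalue(k: int, n: int, p: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p).

    k: loci overlapping the region class (e.g. TSS)
    n: loci tested (e.g. 195 CPL)
    p: background probability that a locus overlaps that class (assumed)
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    # Sum the upper tail of the binomial probability mass function.
    tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return min(1.0, tail)  # guard against rounding slightly above 1


if __name__ == "__main__":
    # Hypothetical: 30 of 195 loci overlap a state covering 5% of the genome.
    print(binomial_enrichment_pvalue(30, 195, 0.05))
```

If SciPy is available, `scipy.stats.binomtest(k, n, p, alternative="greater").pvalue` computes the same one-sided tail.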