Differential and coherent processing patterns from small RNAs.

Pundhir S, Gorodkin J - Sci Rep (2015)

Bottom Line: While the annotated loci predominantly consist of ~24 nt short RNAs, the unannotated loci by comparison consist of ~17 nt short RNAs. Furthermore, these ~17 nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study has the potential to be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.


Affiliation: Center for non-coding RNA in Technology and Health, IKVH, University of Copenhagen, Grønnegårdsvej 3, 1870, Frederiksberg C, Denmark.

ABSTRACT
Post-transcriptional processing events related to short RNAs are often reflected in their read profile patterns emerging from high-throughput sequencing data. MicroRNA arm switching across different tissues is a well-known example of what we define as differential processing. Here, short RNAs from the nine cell lines of the ENCODE project, irrespective of their annotation status, were analyzed for genomic loci representing differential or coherent processing. We observed differential processing predominantly in RNAs annotated as miRNA, snoRNA or tRNA. Four out of five known cases of differentially processed miRNAs that were in the input dataset were recovered, and several novel cases were discovered. In contrast to differential processing, coherent processing is widespread in both annotated and unannotated regions. While the annotated loci predominantly consist of ~24 nt short RNAs, the unannotated loci by comparison consist of ~17 nt short RNAs. Furthermore, these ~17 nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study has the potential to be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.


Figure 3: Analysis of read profiles from 701 genomic loci (L), their mean alignment scores () and their relationship to non-coding RNA processing and annotation. (A) General outline of the analysis steps performed to identify differentially and coherently processed loci. (B) The mean alignment score is computed by all-versus-all alignment of the read profiles at each of the 701 loci using deepBlockAlign and taking the mean of the resulting scores. A bimodal distribution is observed in which all 97 differentially processed loci (DPL) and 251 non-DPL have a mean alignment score of <0.8, while the remaining 353 loci have a mean score of ≥0.8. Here, a high mean alignment score (≥0.8) indicates coherent read profiles across all cell lines at a locus (coherently processed loci; CPL). Non-DPL refers to those loci where the read profiles are neither differentially nor coherently processed. (C) Density distribution of the non-coding RNA annotations belonging to each of the three categories of genomic loci (DPL, non-DPL and CPL). Coherent processing (CPL) is enriched in unannotated loci (p-value = 5.8e-07, Fisher’s exact test). In contrast, most (83 out of 97) DPL are annotated (see methods) as one of the three ncRNA classes (miRNA, snoRNA and tRNA).
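The scoring step in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a pairwise profile similarity and is not the deepBlockAlign scoring function, and "all versus all" is assumed here to mean every unordered pair of distinct read profiles at a locus. Only the 0.8 cutoff is taken from the text.

```python
from itertools import combinations
from statistics import mean

CPL_THRESHOLD = 0.8  # cutoff for coherently processed loci, as stated in the caption


def mean_alignment_score(profiles, score):
    """Mean of score(a, b) over all unordered pairs of read profiles at one locus.

    Assumption: self-comparisons are excluded and each pair is counted once.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two read profiles")
    return mean(score(a, b) for a, b in pairs)


def is_coherent(profiles, score):
    """True if the locus would be called a CPL under the >= 0.8 rule."""
    return mean_alignment_score(profiles, score) >= CPL_THRESHOLD


def toy_score(a, b):
    """Hypothetical similarity in [0, 1]: overlap of two normalized read-count profiles.

    Not the deepBlockAlign score; used only to make the sketch runnable.
    """
    total_a, total_b = sum(a), sum(b)
    return sum(min(x / total_a, y / total_b) for x, y in zip(a, b))


# Three profiles with the same shape (different depth) versus two disjoint ones.
same_shape = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
disjoint = [[10, 0, 0], [0, 0, 10]]
print(is_coherent(same_shape, toy_score), is_coherent(disjoint, toy_score))
```

In the study each locus has 18 read profiles (two replicates for each of nine cell lines), so the mean would run over 153 unordered pairs under this assumption.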

Mentions: For each of the 701 loci where block groups are observed in both replicates of the nine cell lines from the ENCODE dataset, we aligned all block groups against each other using deepBlockAlign [18] and computed the mean alignment score (Fig. 3A). We observed a bimodal distribution in which around half of the loci (353) have a mean alignment score of ≥0.8 and the remaining 348 loci have a mean alignment score of <0.8 (Fig. 3B). Since the alignment score is a measure of similarity between two read profiles, a high mean alignment score (≥0.8) for the 353 loci indicates that the corresponding read profiles are coherent in terms of their arrangement of reads across different cell lines. We have therefore termed the 353 loci (alignment score ≥0.8) Coherently Processed Loci (CPL). To further characterize the distinct features of read profiles or block groups from CPL, we compared the 6,354 block groups from the 353 CPL with the 6,264 block groups from the remaining 348 loci (18 block groups per locus, from two replicates for each of the nine cell lines) in terms of the number of blocks, entropy and length of the block groups. Here, entropy measures the randomness in the arrangement of reads within a block group [18]. The lower the entropy, the more precisely the reads are arranged with respect to their start positions within the block group. We observed a significant enrichment (p-value = 5.8e-243, Fisher’s exact test) of block groups that are short and composed of precisely arranged constituent reads: 5,371 (85%) at the 353 CPL compared with 1,875 (30%) at the remaining 348 loci (Supplementary Table S7). We define a block group as short and precise if it is composed of only one read block, has an entropy of ≤2 and is ≤40 nt in length.
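The "short and precise" definition and the reported counts can be written out as a small Python sketch. The `BlockGroup` record and its field names are hypothetical and introduced only for illustration; the entropy is taken as a precomputed input, since its exact formula is defined in the cited deepBlockAlign work and not in this text. The thresholds and counts are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockGroup:
    """Hypothetical summary of one block group (field names are assumptions)."""
    n_blocks: int     # number of read blocks in the group
    entropy: float    # precomputed entropy of the read arrangement
    length_nt: int    # length of the block group in nucleotides


def is_short_and_precise(bg):
    """One read block, entropy <= 2 and length <= 40 nt, as defined in the text."""
    return bg.n_blocks == 1 and bg.entropy <= 2 and bg.length_nt <= 40


def fraction_short_and_precise(groups):
    """Share of block groups that meet the short-and-precise definition."""
    groups = list(groups)
    return sum(is_short_and_precise(bg) for bg in groups) / len(groups)


# Consistency check on the numbers quoted in the text.
BLOCK_GROUPS_PER_LOCUS = 2 * 9            # two replicates x nine cell lines
cpl_total = 353 * BLOCK_GROUPS_PER_LOCUS  # block groups at the 353 CPL
rest_total = 348 * BLOCK_GROUPS_PER_LOCUS # block groups at the other 348 loci
cpl_share = 100 * 5371 / cpl_total        # reported as 85%
rest_share = 100 * 1875 / rest_total      # reported as 30%
print(cpl_total, rest_total, round(cpl_share), round(rest_share))
```

For example, `is_short_and_precise(BlockGroup(1, 1.2, 22))` is true, while a two-block group or one longer than 40 nt is not. The sketch checks the percentages only and does not attempt to reproduce the reported Fisher's exact test p-value.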

