miRspring: a compact standalone research tool for analyzing miRNA-seq data.

Humphreys DT, Suter CM - Nucleic Acids Res. (2013)

Bottom Line: Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology.


Affiliation: Molecular Genetics Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, 2010, Australia and St Vincent's Clinical School, University of New South Wales, 2052, Australia.

ABSTRACT
High-throughput sequencing for microRNA (miRNA) profiling has revealed a vast complexity of miRNA processing variants, but these are difficult to discern for those without bioinformatics expertise and large computing capability. In this article, we present miRNA Sequence Profiling (miRspring) (http://mirspring.victorchang.edu.au), a software solution that creates a small portable research document that visualizes, calculates and reports on the complexities of miRNA processing. We designed an index-compression algorithm that allows the miRspring document to reproduce a complete miRNA sequence data set while retaining a small file size (typically <3 MB). Through analysis of 73 public data sets, we demonstrate miRspring's features in assessing quality parameters, miRNA cluster expression levels and miRNA processing. Additionally, we report on a new class of miRNA variants, which we term seed-isomiRs, identified through the novel visualization tools of the miRspring document. Further investigation identified that ∼30% of human miRBase entries are likely to have a seed-isomiR. We believe that miRspring will be a highly useful research tool that will enhance the analysis of miRNA data sets and thus increase our understanding of miRNA biology.


Figure 2: Quality control visualization parameters provided by the miRspring document. (A) Cumulative distribution of miRNAs for a typical data set. (B) Examination of numerous data sets showed that the most abundant miRNA typically represented <35% of all reads. Data sets in which a single miRNA accounts for the majority of sequence tags, as in (C), must be treated with caution, because low-abundance miRNAs are poorly represented. (D) In individual data sets, the less abundant miRNAs show a wide distribution in length, whereas abundant miRNAs have a more uniform length. (E) Low-abundance miRNAs are also often not processed as defined in miRBase and are therefore considered non-canonically processed. (F) Furthermore, more of the low-abundance miRNAs tend to have 1 nt mismatches. Average (G) length, (H) non-canonical processing and (I) mismatches for each rank centile were calculated from the analyzed data sets.

Mentions: A number of reports have described systematic transcript selection biases introduced during library preparation (16–18), underscoring the importance of using the same preparative method when comparing different data sets. However, to our knowledge, there are no quality control measures that reflect the efficiency of individual library preparations. In examining numerous miRspring documents generated from different library preparations and sequenced on different platforms, we identified a number of parameters that reflect the efficiency, variation and quality of those preparations. From the miRspring cumulative distribution XY-scatter plot, we noticed that in most data sets a small number (<50) of miRNAs contribute a large portion of miRBase-mapped tags (Figure 2A). In the majority of data sets analyzed, the most abundant miRNA represented <35% of all miRBase-mapped tags (Figure 2B). This was an informative parameter because, in a small number of data sets, the most abundant miRNA accounted for >50% of tags, which suggests that low-abundance miRNAs may have been poorly sampled (Figure 2C).
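
The two abundance-based parameters described above (the cumulative distribution of mapped tags and the share taken by the most abundant miRNA) can be illustrated with a minimal Python sketch. This is not miRspring's code: the function names, the input format (a dictionary of per-miRNA tag counts) and the category labels are assumptions made for illustration, and only the 35% and 50% figures come from the text.

```python
from typing import Dict, List, Tuple


def top_mirna_fraction(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most abundant miRNA and its share of all mapped tags."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped tags in this data set")
    name = max(counts, key=counts.get)
    return name, counts[name] / total


def cumulative_distribution(counts: Dict[str, int]) -> List[float]:
    """Cumulative fraction of mapped tags, miRNAs ranked by abundance."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped tags in this data set")
    running = 0
    fractions = []
    for count in sorted(counts.values(), reverse=True):
        running += count
        fractions.append(running / total)
    return fractions


def dominance_flag(fraction: float,
                   typical: float = 0.35,
                   caution: float = 0.50) -> str:
    """Label a data set by the share of its most abundant miRNA.

    The 0.35 and 0.50 values are the figures quoted in the text; the
    labels themselves are hypothetical and used only in this sketch.
    """
    if fraction > caution:
        return "caution"
    if fraction < typical:
        return "typical"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical tag counts, not taken from any analyzed data set.
    example = {"miR-a": 60, "miR-b": 30, "miR-c": 10}
    name, share = top_mirna_fraction(example)
    print(name, share, dominance_flag(share))
    print(cumulative_distribution(example))
```

In this sketch, a data set labelled "caution" is one where a single miRNA takes more than half of the mapped tags, the situation in which low-abundance miRNAs may be poorly sampled.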

