Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes.

Richardson CR, Luo QJ, Gontcharova V, Jiang YW, Samanta M, Youn E, Rock CD - PLoS ONE (2010)

Bottom Line: We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes.


Affiliation: Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA.

ABSTRACT

Background: MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20-22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery.

Principal findings: We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis 'orphan' hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. When trained on targets, the SVM could predict the "ancient" (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and the "new" rapidly evolving MIRNA genes with an accuracy of 76%.

Conclusions: Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other kingdoms, which can provide insight into antisense transcription, miRNA evolution, and post-transcriptional gene regulation.

Figure 2. Normalized average percentage expression levels for 93 “ancient” (22 families) (A) and 68 recently evolved “new” (64 families) MIRNA genes (B), with miRNA* position as “0”. Sense strand is colored red and antisense blue. Note the abundant antisense signals mapping at or upstream of miRNA* sites (small arrow), and downstream sense signals for ancient MIRNA genes (large arrowhead) similar to miRNA target genes previously described [14]. See Datafile S2 for details.

Mentions: We further mapped Arabidopsis whole genome tiling array sense and antisense transcript signals to 93 “ancient” MIRNA genes (those with at least one homolog in other distant plant species; 27 families) and compared the average normalized signal topology with that of 68 recently evolved “new” MIRNA genes (64 families) [77]–[79] by adding signals at each position of the data (Figure 2; Datafile S2). Ancient MIRNA genes had more abundant transcript signals on both sense and antisense strands, especially in the regions 200 nt upstream and downstream (relative to the sense strand) of the miRNA* position (normalized expression ≥ 2.0, Figure 2A, arrows), whereas “new” MIRNA transcripts were not clearly evident above noise except for a peak signal precisely at the miRNA* position (normalized expression ≈ 1.2, Figure 2B, arrow). It is apparent that the ancient MIRNA genes have a ‘ping-pong-like’ expression topology (downstream sense, upstream antisense; Figure 2A, arrows) similar to that previously described for miRNA target mRNAs [14]. To extend the analysis to rice tiling array data, we analyzed whole genome tiling array signals for Arabidopsis and rice from probes that had perfect matches to mature miRNAs, miRNAs*, and siRNAs (17 nt reads from MPSS data [86]), and from probes mapping to other regions of the cognate hairpin. The results are shown in Table 1. Relative to the previously established signal cutoff of log2 > 0.73, based on background signals from probes for both strands of promoters of ∼4,600 verified Arabidopsis genes [82], it is apparent that Arabidopsis MIRNA hairpin expression was low for most probes. Consistent with Figure 2A (upstream of miRNA* site), there were significantly more sense and antisense signals associated with miRNAs than elsewhere in the hairpins (Table 1; Datafile S3).
For the rice whole genome tiling array data, there was higher signal associated with the sense strand of miRNAs and the antisense strand of miRNA* (Table 1), consistent with the Arabidopsis data (Figure 2A), but the differences compared to other regions of the hairpin were not statistically significant and the rice tiling array data were not considered further. Supplemental Figures S2, S3, Supplemental Table S1, and Datafiles S4 and S5 document the quality of tiling array data by analyzing signal-to-noise ratios of ribosomal genes.
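
The position-wise aggregation described above (signals added at each position relative to the miRNA* site, then averaged across genes) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `aggregate_profile`, the input layout, the 200 nt window, and the normalization (sum divided by the number of genes) are all assumptions made for illustration, and probe positions are assumed to be already oriented relative to the sense strand.

```python
from collections import defaultdict


def aggregate_profile(genes, window=200):
    """Average probe signal per strand at each offset from the miRNA* position.

    genes: list of (star_pos, probes), where probes is a list of
           (position, strand, signal) and strand is "sense" or "antisense".
    Hypothetical layout; the normalization (sum / number of genes) is assumed.
    """
    totals = {"sense": defaultdict(float), "antisense": defaultdict(float)}
    for star_pos, probes in genes:
        for pos, strand, signal in probes:
            offset = pos - star_pos  # miRNA* position becomes offset 0
            if abs(offset) <= window:  # keep only probes inside the window
                totals[strand][offset] += signal
    n_genes = len(genes)
    return {
        strand: {off: total / n_genes for off, total in sorted(by_offset.items())}
        for strand, by_offset in totals.items()
    }


# Toy data: two made-up genes with a few probes each.
toy_genes = [
    (1000, [(900, "antisense", 2.0), (1000, "sense", 1.0), (1100, "sense", 3.0)]),
    (500, [(400, "antisense", 4.0), (500, "sense", 2.0), (800, "sense", 9.0)]),
]
profile = aggregate_profile(toy_genes)
print(profile["antisense"])  # upstream antisense signal at offset -100
print(profile["sense"])      # sense signal at offsets 0 and +100
```

With real data, each strand's profile could then be plotted against offset to compare the upstream antisense and downstream sense signals of “ancient” and “new” MIRNA genes.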

