Biocomputational prediction of non-coding RNAs in model cyanobacteria.

Voss B, Georg J, Schön V, Ude S, Hess WR - BMC Genomics (2009)

Bottom Line: After transposon-associated repeats are also removed, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Yfr2a promoter-luxAB fusions confirmed very strong activity of this promoter and indicated that expression is stimulated when cultures are exposed to elevated light intensities. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs.

Affiliation: University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Freiburg, Germany. bjoern.voss@biologie.uni-freiburg.de

ABSTRACT

Background: In bacteria, non-coding RNAs (ncRNAs) are crucial regulators of gene expression, controlling various stress responses, virulence, and motility. Previous work revealed a relatively high number of ncRNAs in some marine cyanobacteria. However, for efficient genetic and biochemical analysis it would be desirable to identify a set of ncRNA candidate genes in model cyanobacteria that are easy to manipulate and for which extensive mutant, transcriptomic and proteomic data sets are available.

Results: Here we have used comparative genome analysis for the biocomputational prediction of ncRNA genes and other sequence/structure-conserved elements in intergenic regions of the three unicellular model cyanobacteria Synechocystis PCC6803, Synechococcus elongatus PCC6301 and Thermosynechococcus elongatus BP1, plus the toxic Microcystis aeruginosa NIES843. The unfiltered numbers of predicted elements in these strains are 383, 168, 168 and 809, respectively, combined into 443 sequence clusters, whereas the numbers of individual elements with high support are 94, 56, 64 and 406, respectively. After transposon-associated repeats are also removed, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Experimental analysis of selected ncRNA candidates in Synechocystis PCC6803 validated new ncRNAs originating from the fabF-hoxH and apcC-prmA intergenic spacers, as well as three highly expressed ncRNAs belonging to the Yfr2 family. Yfr2a promoter-luxAB fusions confirmed very strong activity of this promoter and indicated that expression is stimulated when cultures are exposed to elevated light intensities.

Conclusion: Comparison to entries in Rfam and experimental testing of selected ncRNA candidates in Synechocystis PCC6803 indicate a high reliability of the current prediction, despite some contamination by the high number of repetitive sequences in some of these species. In particular, we identified altogether 8 new ncRNA homologs of the Yfr2 family in the four species. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs. Since our analysis was restricted to ncRNA candidates with a reasonably high degree of conservation among these four cyanobacteria, many more may exist, and direct experimental approaches will be required to identify them.
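The per-strain counts in the Results can be tabulated to see how much each filtering stage removes. The following is a minimal Python sketch. It assumes, from the wording "also removed", that the final set is derived from the high-support set by additionally dropping transposon-associated repeats. The variable and function names are illustrative only. The cluster totals (443 and 109) cannot be derived from per-strain counts, so they are not modelled here.

```python
# Per-strain element counts as reported in the abstract.
# Assumption: the three stages form a sequential filter
# (unfiltered -> high support -> high support minus transposon repeats).
STRAINS = [
    "Synechocystis PCC6803",
    "Synechococcus elongatus PCC6301",
    "Thermosynechococcus elongatus BP1",
    "Microcystis aeruginosa NIES843",
]
UNFILTERED = [383, 168, 168, 809]
HIGH_SUPPORT = [94, 56, 64, 406]
FINAL = [78, 53, 42, 168]  # after also removing transposon-associated repeats


def summarize():
    """Return per-strain retention/repeat-removal figures and stage totals."""
    rows = []
    for name, raw, high, final in zip(STRAINS, UNFILTERED, HIGH_SUPPORT, FINAL):
        rows.append({
            "strain": name,
            # fraction of the unfiltered predictions that survive all filters
            "retained_fraction": round(final / raw, 3),
            # elements dropped at the repeat-filtering step (assumed sequential)
            "removed_as_repeats": high - final,
        })
    totals = {
        "unfiltered": sum(UNFILTERED),
        "high_support": sum(HIGH_SUPPORT),
        "final": sum(FINAL),
    }
    return rows, totals


if __name__ == "__main__":
    rows, totals = summarize()
    for row in rows:
        print(f"{row['strain']:<36} kept {row['retained_fraction']:.3f}, "
              f"repeats removed {row['removed_as_repeats']}")
    print(totals)
```

When run as a script, it shows that Microcystis accounts for 238 of the 279 elements removed at the repeat-filtering step (620 high-support versus 341 final elements). This is consistent with the remark that repetitive sequences contaminate the predictions in some species.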

Figure 4: Experimental verification of two differently scoring ncRNAs by Northern hybridization. A. The element predicted in cluster 159 is transcribed from the forward strand (green), in the same direction as the preceding fabF (slr1332) gene. Six 5' RACE sequences place the SyR1 transcript start 55 nt 3' of the fabF reading frame, at position 1671919 (grey arrow). Two blots are shown: one in which RNA was separated on a high-resolution polyacrylamide gel and one from an agarose gel. B. The element predicted with CLID 294 was named SyR2. This ncRNA is longer than the IGR in which it is encoded (~140 nt versus a 94 nt spacer). A transcript start was mapped by 5' RACE within apcC, 49 nt before the end of the apcC reading frame (grey arrow). The schemes are drawn to scale. All protein-coding genes are displayed in grey, all ncRNA genes in green. M, molecular mass marker; R, lane in the RNA gel before blotting; H, hybridization.
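The lengths in this caption and in the accompanying text (a ~130 nt syr1 gene in a 206 nt spacer) can be cross-checked with simple coordinate arithmetic. The following is a minimal Python sketch. It assumes forward-strand, 1-based inclusive coordinates. It reads "55 nt 3' of fabF" as a 55 nt gap before the SyR1 start, and it treats the approximate lengths (~130 nt, ~140 nt) as exact integers. The function names are illustrative.

```python
# Assumptions: forward strand, 1-based inclusive coordinates, approximate
# lengths used as exact integers, 55 nt read as the gap between fabF and SyR1.


def end_coordinate(start: int, length: int) -> int:
    """Last nucleotide of a forward-strand transcript (1-based, inclusive)."""
    return start + length - 1


def fits_in_spacer(offset_into_spacer: int, length: int, spacer_len: int) -> bool:
    """True if a transcript starting `offset_into_spacer` nt into an intergenic
    spacer ends before the downstream gene begins."""
    return offset_into_spacer + length <= spacer_len


# SyR1: TSS at 1671919, ~130 nt long, inside the 206 nt fabF-hoxH spacer.
syr1_end = end_coordinate(1671919, 130)
syr1_fits = fits_in_spacer(55, 130, 206)
syr1_share = 130 / 206  # "about two thirds" of the spacer

# SyR2: ~140 nt, starting 49 nt before the end of apcC; apcC-prmA spacer is 94 nt.
syr2_in_spacer = 140 - 49  # nt of SyR2 lying downstream of apcC
syr2_fits = syr2_in_spacer <= 94

print(f"SyR1 ends near {syr1_end}, fits: {syr1_fits}, share: {syr1_share:.2f}")
print(f"SyR2 has {syr2_in_spacer} nt in the 94 nt IGR, fits: {syr2_fits}")
```

Under these assumptions, about 91 nt of SyR2 lie in the 94 nt apcC-prmA spacer. This explains how an ~140 nt RNA can be longer than the IGR in which it is encoded while still ending before prmA.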

Mentions: For exemplary experimental verification of predicted ncRNA genes, we chose two very different examples: a well-supported candidate with three members from cluster 159 (probability 0.933 and Z-score -2.00; Table 1) and one from cluster 294 (probability 1.0 and Z-score -2.64; Table 1). Northern hybridization of total RNA from Synechocystis using strand-specific RNA probes confirmed the existence of both ncRNAs (Fig. 4). Having verified both experimentally, we named them SyR1 and SyR2, for Synechocystis ncRNA 1 and 2. SyR1 is a strongly accumulating ncRNA transcribed from a gene in the fabF-hoxH IGR, in the same forward direction as the preceding gene fabF (Fig. 4A). With a length of ~130 nt, the syr1 gene covers about two thirds of the 206 nt fabF-hoxH intergenic spacer. Northern hybridization gave no evidence of cotranscription with fabF. The element predicted with CLID 294 is likewise located 3' of a protein-coding gene and is transcribed from the forward strand in Synechocystis 6803. SyR2 is an ~140 nt ncRNA transcribed from a gene in the apcC (ssr3383)-prmA (sll1909) IGR, in the same forward direction as the preceding gene apcC. SyR2 also accumulates to rather high levels, although these appeared lower than those of SyR1 (Fig. 4B). The preceding apcC gene (ssr3383) encodes a short phycobilisome LC linker polypeptide and is the last gene of a three-gene operon for phycobiliproteins. Cotranscription between this operon and SyR2 cannot be excluded unambiguously. However, a SyR2 transcript start was mapped within apcC, 49 nt before the end of the reading frame. This is less exotic than it seems. At the expected spacing, six nt upstream of the transcript start, lies a regular TATA element (CAAAAT). Moreover, several examples show ncRNA promoters located within the protein-coding part of a gene. For instance, transcription of the ssrS gene for 6S RNA in E. coli is initiated at two promoters, of which the distal promoter P2 responds to σ70 and σS RNA polymerase holoenzymes and is located within the ygfE reading frame [35,36]. An example from Synechocystis is IsrR, the antisense RNA that is initiated from within the isiA gene, although from the reverse complementary strand [18].
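The promoter argument above rests on a positional check: a hexamer at a fixed spacing upstream of the mapped transcript start. The following is a minimal Python sketch of that check. It assumes the element is the hexamer at positions -12 to -7 (six nt between its end and the +1 nucleotide) and compares it with the σ70-type -10 consensus TATAAT. The sequence is a made-up toy containing CAAAAT, not the real apcC sequence, and all names are hypothetical.

```python
# Hypothetical sketch: find the hexamer at a fixed spacing upstream of a TSS.
# Assumptions: element spans -12..-7 relative to +1; consensus TATAAT;
# the toy sequence below is invented for illustration.

CONSENSUS = "TATAAT"


def element_upstream(seq: str, tss: int, spacer: int = 6, width: int = 6) -> str:
    """Return the `width`-nt window ending `spacer` nt before 0-based index `tss`."""
    end = tss - spacer
    start = end - width
    if start < 0:
        raise ValueError("TSS too close to the start of the sequence")
    return seq[start:end]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy sequence: filler, CAAAAT, a 6 nt spacer, then the +1 nucleotide 'A'.
toy = "GGGGGG" + "CAAAAT" + "GCGCGC" + "A" + "GGG"
tss_index = 18  # 0-based position of the +1 nucleotide in `toy`

hexamer = element_upstream(toy, tss_index)
print(hexamer, mismatches(hexamer, CONSENSUS))
```

On the toy input, the sketch recovers CAAAAT, which differs from TATAAT at two positions. With a real genome sequence, `tss` would be the 5' RACE-mapped start converted to a 0-based index.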

