Biocomputational prediction of non-coding RNAs in model cyanobacteria.

Voss B, Georg J, Schön V, Ude S, Hess WR - BMC Genomics (2009)

Bottom Line: After also removing transposon-associated repeats, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Yfr2a promoter-luxAB fusions confirmed a very strong activity of this promoter and indicated that expression is stimulated when cultures are exposed to elevated light intensities. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs.


Affiliation: University of Freiburg, Faculty of Biology, Genetics and Experimental Bioinformatics, Freiburg, Germany. bjoern.voss@biologie.uni-freiburg.de

ABSTRACT

Background: In bacteria, non-coding RNAs (ncRNAs) are crucial regulators of gene expression, controlling various stress responses, virulence, and motility. Previous work revealed a relatively high number of ncRNAs in some marine cyanobacteria. However, for efficient genetic and biochemical analysis it would be desirable to identify a set of ncRNA candidate genes in model cyanobacteria that are easy to manipulate and for which extensive mutant, transcriptomic and proteomic data sets are available.

Results: Here we have used comparative genome analysis for the biocomputational prediction of ncRNA genes and other sequence/structure-conserved elements in intergenic regions of the three unicellular model cyanobacteria Synechocystis PCC6803, Synechococcus elongatus PCC6301 and Thermosynechococcus elongatus BP1 plus the toxic Microcystis aeruginosa NIES843. The unfiltered numbers of predicted elements in these strains are 383, 168, 168, and 809, respectively, combined into 443 sequence clusters, whereas the numbers of individual elements with high support are 94, 56, 64, and 406, respectively. After also removing transposon-associated repeats, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Experimental analysis of selected ncRNA candidates in Synechocystis PCC6803 validated new ncRNAs originating from the fabF-hoxH and apcC-prmA intergenic spacers and three highly expressed ncRNAs belonging to the Yfr2 family of ncRNAs. Yfr2a promoter-luxAB fusions confirmed a very strong activity of this promoter and indicated that expression is stimulated when cultures are exposed to elevated light intensities.
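The three filtering stages reported in the Results can be tabulated per strain to see how much each step removes. The following is a minimal sketch that only restates the counts given above; the variable and function names are illustrative and not part of the published pipeline.

```python
# Element counts per strain, copied from the Results paragraph.
STRAINS = [
    "Synechocystis PCC6803",
    "Synechococcus elongatus PCC6301",
    "Thermosynechococcus elongatus BP1",
    "Microcystis aeruginosa NIES843",
]
UNFILTERED = [383, 168, 168, 809]       # all predicted elements
HIGH_SUPPORT = [94, 56, 64, 406]        # elements with high support
REPEAT_FILTERED = [78, 53, 42, 168]     # after removing transposon-associated repeats


def removed_by_repeat_filter():
    """Number of high-support elements dropped by the repeat filter, per strain."""
    return [h - f for h, f in zip(HIGH_SUPPORT, REPEAT_FILTERED)]


def retained_fraction(before, after):
    """Fraction of elements that survive from one stage to a later one, per strain."""
    return [a / b for a, b in zip(after, before)]


def summarize():
    """One row per strain: (name, unfiltered, high support, final, final/unfiltered)."""
    overall = retained_fraction(UNFILTERED, REPEAT_FILTERED)
    return [
        (name, u, h, f, round(frac, 3))
        for name, u, h, f, frac in zip(
            STRAINS, UNFILTERED, HIGH_SUPPORT, REPEAT_FILTERED, overall
        )
    ]


if __name__ == "__main__":
    for row in summarize():
        print(row)
    print("removed by repeat filter:", removed_by_repeat_filter())
```

On these numbers the repeat filter removes 238 of the 406 high-support elements in Microcystis but only 3 of 56 in Synechococcus, which fits the remark in the Conclusion about contamination by repetitive sequences in some of the species.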

Conclusion: Comparison to entries in Rfam and experimental testing of selected ncRNA candidates in Synechocystis PCC6803 indicate a high reliability of the current prediction, despite some contamination by the high number of repetitive sequences in some of these species. In particular, we identified a total of 8 new ncRNA homologs belonging to the Yfr2 family of ncRNAs in the four species. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs. Since our analysis was restricted to finding ncRNA candidates with a reasonably high degree of conservation among these four cyanobacteria, there might be many more, requiring direct experimental approaches for their identification.


Figure 3: Types of predicted elements. A. The genomic location of the predicted RNA element in cluster 80 and the synteny around this element are shown. This element is slightly more likely to be transcribed from the forward strand, as indicated by the direction of the arrow within the IGR. The length of the intergenic spacer is given in nt and homologous genes are colour-coded. In Synechococcus 6301 an ftrC gene has been inserted into this region relative to the other species. The predicted consensus structure of the RNA element (bottom) consists of two stem-loops separated by a 17 nt single-stranded region. The degree of sequence conservation is colour-coded. B. Four of the five sequences in cluster 62 are located upstream of the groES operon. This region is known to contain the palindromic CIRCE element and, indeed, this element constitutes a critical part of the conserved sequence and structure. The initiation site of transcription of the groES mRNA was mapped by 5' RACE to the first G within the nine nt loop that is part of the CIRCE element (bold arrow). The fifth sequence has no CIRCE element but has been grouped into cluster 62 based on other sequence features. At the bottom right, the perfect conservation of the CIRCE element in the four compared cyanobacteria is shown.
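The palindromic layout described for the CIRCE element in panel B (two arms flanking a nine nt loop) can be checked in a sequence with a simple scan for perfect inverted repeats. The sketch below is an illustration only, not the method used in the paper. It assumes 9 nt arms, taken from the commonly cited CIRCE consensus TTAGCACTC-N9-GAGTGCTAA, which is not stated in this text, and the loop in the example is a hypothetical placeholder rather than the groES sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=9, loop=9):
    """Return (start, left_arm, loop_seq, right_arm) for each perfect inverted repeat.

    A hit is a window in which the right arm is the reverse complement of the
    left arm, with exactly `loop` nucleotides between them. Mismatches are not
    tolerated in this sketch.
    """
    seq = seq.upper()
    span = 2 * arm + loop
    hits = []
    for start in range(len(seq) - span + 1):
        left = seq[start:start + arm]
        loop_seq = seq[start + arm:start + arm + loop]
        right = seq[start + arm + loop:start + span]
        if right == reverse_complement(left):
            hits.append((start, left, loop_seq, right))
    return hits


# Hypothetical test sequence: consensus arms around a made-up 9 nt loop.
EXAMPLE = "ACGT" + "TTAGCACTC" + "GATCAAATT" + "GAGTGCTAA" + "CC"

if __name__ == "__main__":
    for start, left, loop_seq, right in find_inverted_repeats(EXAMPLE):
        print(start, left, loop_seq, right)
```

A real search would usually allow a few mismatches in the arms; exact matching is used here only to keep the example short.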

Mentions: The genomic location of a predicted ncRNA gene or RNA element in the same sequence neighbourhood in some or all of the studied cyanobacteria can also be a powerful tool for finding related ncRNAs. Among the 25 high-scoring sequence clusters in Table 1, 9 (36%) showed at least partial synteny. The high-scoring element in cluster 80 illustrates this. The primary annotation gives no hint about the possible relatedness of the flanking genes. The flanking gene sufR annotated in Microcystis encodes an iron-sulfur cluster biosynthesis transcriptional regulator, and similarity searches revealed that sll0088, syc2358d and sufR are in fact orthologs of each other (Fig. 3A). On the other side of the intergenic region containing the predicted RNA element, the genes ycf24 and sufB are clearly homologs of each other, whereas ftrC in Synechococcus is not. Yet ftrC has been inserted into this genomic region, because the next gene, syc2356_c, codes for the homolog of sufB and ycf24. Thus, the synteny among neighbouring genes clearly supports the element predicted in cluster 80 as an orthologous RNA element in the three species. Other cases of partial synteny in flanking genes are observed in cluster 139, where trxA is present in 3 out of 4 cases, and in cluster 216, with the orthologs speB (Synechocystis) and agmatinase (Microcystis), whereas all other genes are different. Special cases of synteny are found in clusters 207 (rpl10 leader), 149 (thiamine riboswitch upstream of thiC), 394 (rps2 leader) and 62 (upstream of groES). These four examples represent structurally conserved sequence elements upstream of a protein-coding gene to which they are functionally connected; among them are one riboswitch and two ribosomal leaders, so their position must be conserved. The fourth example, the element upstream of groES, contains the palindromic CIRCE element (Fig. 3B) thought to bind the heat-shock repressor protein HrcA [33]. Here, we mapped the groES transcriptional start site to the first nt of the nine nt loop predicted by secondary structure analysis (Fig. 3B), confirming the previously determined start site [34]. These examples illustrate the variety of elements that are identified by our approach.
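The synteny argument for cluster 80 can be sketched as a comparison of ortholog groups among the genes flanking each element. This is a simplified illustration under stated assumptions, not the authors' procedure: the function name, the three-level classification and the group labels are hypothetical, the pairing of ycf24 with sll0088 and of sufB with sufR in the same genome is assumed rather than stated in the text, and "left" and "right" are arbitrary labels for the two sides.

```python
def synteny_level(flanks, ortholog_group):
    """Classify flanking-gene conservation across the elements of one cluster.

    flanks: list of (left_gene, right_gene), one pair per element.
    ortholog_group: dict mapping gene name to a group label; genes that are
        not listed are treated as their own singleton group.
    Returns "full", "partial" or "none". Strand and gene order are ignored.
    """
    if not flanks:
        raise ValueError("cluster has no elements")

    def largest_shared(genes):
        # Largest number of elements whose flanking gene falls in one group.
        groups = [ortholog_group.get(g, g) for g in genes]
        return max(groups.count(g) for g in set(groups))

    n = len(flanks)
    left = largest_shared([l for l, _ in flanks])
    right = largest_shared([r for _, r in flanks])
    if left == n and right == n:
        return "full"
    if left >= 2 or right >= 2:
        return "partial"
    return "none"


# Ortholog relationships as described in the text (group labels are illustrative).
GROUPS = {
    "sll0088": "sufR", "syc2358d": "sufR", "sufR": "sufR",
    "ycf24": "sufB", "sufB": "sufB", "syc2356_c": "sufB",
}

# Cluster 80 with the immediately adjacent genes (pairing per genome is assumed).
CLUSTER_80 = [("sll0088", "ycf24"), ("syc2358d", "ftrC"), ("sufR", "sufB")]

# Same cluster, looking one gene past the inserted ftrC in Synechococcus.
CLUSTER_80_SKIPPING_FTRC = [
    ("sll0088", "ycf24"), ("syc2358d", "syc2356_c"), ("sufR", "sufB"),
]

if __name__ == "__main__":
    print(synteny_level(CLUSTER_80, GROUPS))
    print(synteny_level(CLUSTER_80_SKIPPING_FTRC, GROUPS))
    print(f"{100 * 9 / 25:.0f}% of 25 clusters with at least partial synteny")
```

With the immediately adjacent genes the cluster is classified as partially syntenic because ftrC has no ortholog among the other flanks; once the next gene, syc2356_c, is used instead, both sides agree in all three genomes, which mirrors the reasoning in the paragraph above.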

