Stem-loop structures in prokaryotic genomes.

Petrillo M, Silvestro G, Di Nocera PP, Boccia A, Paolella G - BMC Genomics (2006)

Bottom Line: Relatively large stem-loops contribute significantly to the formation of more complex secondary structures, which are often important for the activity of sequence elements controlling gene expression. SLSs found in natural genomes are consistently more numerous and more stable than those expected to form at random in sequences of comparable size and composition. Some intergenic SLS regions are members of novel repeated sequence families.


Affiliation: CEINGE Biotecnologie Avanzate scarl Via Comunale Margherita 482, 80145 Napoli, Italy. petrillo@ceinge.unina.it

ABSTRACT

Background: Predicting secondary structures in the expressed sequences of bacterial genomes makes it possible to investigate the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops contribute significantly to the formation of more complex secondary structures, which are often important for the activity of sequence elements controlling gene expression.

Results: A systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly sequenced bacterial genomes is presented. SLSs were defined as stems of at least 12 bp bordering loops 5 to 100 nt in length, with G-U pairing allowed in the stems. SLSs found in natural genomes are consistently more numerous and more stable than those expected to form at random in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions, but enrichment of specific, non-random SLS sub-populations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5'-end an additional A-rich stretch complementary to the 3' uridines. In all species, a clearly biased SLS distribution was observed within the intergenic space, with most SLSs concentrating at the 3'-end side of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families.

Conclusion: In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed, and might be involved in the control of transcription termination, or might serve as RNA elements that enhance either the stability or the turnover of cotranscribed mRNAs. Three previously undescribed families of repeated sequences were found in Yersiniae, Bordetellae and Enterococci.
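The search criteria given in the Results (stems of at least 12 bp, loops of 5 to 100 nt, G-U pairing allowed) can be illustrated with a minimal sketch. This is not the authors' search program: the function name `find_stem_loops` and its return format are hypothetical, only the given strand is scanned, and the stem is assumed to be perfectly paired, since the abstract does not say whether mismatches or bulges were tolerated.

```python
# Base pairs accepted in a stem: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(seq, min_stem=12, min_loop=5, max_loop=100):
    """Return (start, stem_len, loop_len) tuples for perfectly paired stem-loops.

    Thresholds default to the values quoted in the abstract.
    Assumption: no mismatches or bulges are allowed inside the stem.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA input, compare as RNA
    n = len(rna)
    hits = []
    for i in range(n):  # i: last base of the 5' arm (loop-closing pair)
        for loop in range(min_loop, max_loop + 1):
            j = i + loop + 1  # j: first base of the 3' arm
            if j >= n:
                break
            # If the two bases just inside the loop also pair and the loop
            # would still be long enough, the same stem is reported with the
            # smaller loop, so skip this candidate to avoid duplicates.
            if loop >= min_loop + 2 and (rna[i + 1], rna[j - 1]) in PAIRS:
                continue
            # Extend the stem outward from the loop-closing pair.
            k = 0
            while i - k >= 0 and j + k < n and (rna[i - k], rna[j + k]) in PAIRS:
                k += 1
            if k >= min_stem:
                hits.append((i - k + 1, k, loop))
    return hits
```

Called on a sequence made of twelve G, five A and twelve C, the sketch finds a single structure spanning all 29 bases, which is the shortest SLS these thresholds can detect. Ranking hits by stability (dG) would require a thermodynamic folding model, which is outside the scope of this sketch.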



Figure 6: Classes of intergenic SLSs. Based on the orientation of flanking CDSs, higher stability intergenic SLSs (dG < -10 kcal/mol) have been sorted into three categories, as indicated at the bottom (see Results). The width of each segment of a stacked bar denotes the fraction of SLSs belonging to the corresponding category. The thickness of the bars is proportional to the cumulative size of the IGRs (lengths below 25000 bp are not to scale, but are represented by a minimal bar thickness). Lines above the bars represent the intergenic space, split by vertical dashes into three segments corresponding, left to right, to the cumulative lengths of IGRs flanked by unidirectionally, divergently and convergently transcribed CDSs. With the parameters adopted, no conv-IGS was found in the genome of M. genitalium (see row 13). Only IGRs ranging from 29 to 500 bp were taken into account, since smaller regions cannot contain the shortest detectable SLS (two 12-nt stem arms plus a 5-nt loop), and larger ones might derive from inaccurate genome annotation. Bacterial genomes are numbered 1 to 40 as in Table 1.

Mentions: The relative positions of higher stability SLSs within the IGRs were analyzed in all the species listed in Table 1. Based on the orientation of the flanking CDSs, IGRs were combined (Fig. 6) into three intergenic spaces (IGS): a) uni-IGS, between CDSs transcribed unidirectionally, i.e. in the same orientation; b) conv-IGS, between convergently transcribed CDSs; c) div-IGS, between divergently transcribed CDSs. SLSs falling within each intergenic space are accordingly referred to as uni-, conv- and div-SLSs. In all species uni-SLSs form the largest SLS fraction (around 60%), but they are not enriched, as their number reflects the length of the uni-IGS. In contrast, conv-SLSs, which represent 20 to 30% of total intergenic SLSs, are concentrated in a much smaller space, since the corresponding conv-IGS covers only 8 to 12% of the overall intergenic space in practically all tested species. Conversely, the div-IGS, which covers 25–35% of the intergenic space, hosts only about 10% of SLSs. Since both flanking CDSs end at a conv-IGS and both start at a div-IGS, a corollary of this distribution is that SLSs preferentially locate near the 3'-end rather than the 5'-end of flanking CDSs. To test whether this also holds for the uni-SLSs, a representative set of IGRs from the uni-IGS was subdivided into three subregions: two 50-base spans, named left and right, lying next to the 3'- and 5'-ends of the flanking CDSs respectively, and the remaining intermediate subregion of variable length, named center. Short IGRs, which could not be split into appropriate subregions, were not included in the analysis. Similarly, a small number of extremely long regions, which might derive from inaccurate genome annotation, were not used. The number of SLSs found in these subregions (Fig. 7) shows that the uni-SLSs also clearly favour the 3'-end location: in the vast majority of species, SLSs found within left subregions outnumber by 2- to 4-fold those found in the equally long right subregions.
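The classification and enrichment reasoning above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: the function names, the tuple layout of the input, and the rule that IGRs shorter than two 50-base spans are skipped are all hypothetical (the text does not give the exact cutoff for "short" IGRs). The toy numbers are invented solely to mimic the proportions quoted in the text.

```python
from collections import Counter

MIN_IGR, MAX_IGR = 29, 500  # IGR length window stated in the Figure 6 caption


def classify_igr(left_strand, right_strand):
    """Classify an IGR by the strands ('+' or '-') of its flanking CDSs.

    left_strand / right_strand belong to the CDSs on the low- and
    high-coordinate side of the IGR, respectively.
    """
    if left_strand == right_strand:
        return "uni"   # both CDSs transcribed in the same orientation
    if left_strand == "+" and right_strand == "-":
        return "conv"  # both CDSs end (3') at the IGR
    return "div"       # both CDSs start (5') at the IGR


def enrichment(igrs):
    """Compare each class's share of SLSs with its share of intergenic space.

    igrs: iterable of (length_bp, left_strand, right_strand, n_sls).
    Returns {class: (sls_fraction, space_fraction, ratio)}; a ratio above 1
    means the class holds more SLSs than its length alone would predict.
    """
    space, sls = Counter(), Counter()
    for length, left, right, n_sls in igrs:
        if not MIN_IGR <= length <= MAX_IGR:
            continue  # outside the window used for Figure 6
        cls = classify_igr(left, right)
        space[cls] += length
        sls[cls] += n_sls
    total_space, total_sls = sum(space.values()), sum(sls.values())
    result = {}
    for cls in space:
        space_frac = space[cls] / total_space
        sls_frac = sls[cls] / total_sls if total_sls else 0.0
        result[cls] = (sls_frac, space_frac, sls_frac / space_frac)
    return result


def subregion(offset, igr_len, flank=50):
    """Name the subregion (left, center, right) of a uni-IGR position.

    offset: 0-based distance from the 3'-end of the upstream CDS, measured
    along the direction of transcription.
    Assumption: IGRs shorter than 2 * flank cannot be split and return None.
    """
    if igr_len < 2 * flank or not 0 <= offset < igr_len:
        return None
    if offset < flank:
        return "left"
    if offset >= igr_len - flank:
        return "right"
    return "center"


if __name__ == "__main__":
    # Invented toy data: (length_bp, left_strand, right_strand, n_sls).
    toy = [(300, "+", "+", 3), (300, "-", "-", 3),
           (100, "+", "-", 3), (300, "-", "+", 1)]
    for cls, (sls_frac, space_frac, ratio) in sorted(enrichment(toy).items()):
        print(f"{cls}: SLS {sls_frac:.0%}, space {space_frac:.0%}, ratio {ratio:.2f}")
```

With the toy data, the uni class holds a share of SLSs equal to its share of space, the conv class is enriched about threefold, and the div class is depleted about threefold, mirroring the pattern described in the text. Normalizing by cumulative IGR length is what separates genuine enrichment from the trivial effect of a class simply covering more sequence.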

