Investigations of oligonucleotide usage variance within and between prokaryotes.

Bohlin J, Skjerve E, Ussery DW - PLoS Comput. Biol. (2008)

Bottom Line: Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.


Affiliation: Norwegian School of Veterinary Science, Oslo, Norway.

ABSTRACT
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.


Figure 3. Tetranucleotide usage variance measures of 402 archaeal and bacterial chromosomes. Prokaryotic chromosomes are sorted by increasing GC content from left to right (vertical axis), with red and blue regression lines representing OUV values (horizontal axis) for chromosomes and coding regions, respectively. Larger values imply more bias, or stronger selectional pressure, in genomic tetranucleotide usage. The surrounding dotted lines indicate 99% prediction intervals.

Mentions: We measured how tetranucleotide usage varied in genomes compared with expected tetranucleotide usage. This expected tetranucleotide usage was calculated from mean genomic GC content, and implicitly assumes that each nucleotide in every tetranucleotide, and therefore also in the whole chromosome, is independent of its neighbors. In other words, the more similar the observed and expected tetranucleotide frequencies are, the more random (i.e. less biased) the observed tetranucleotide frequencies are, and thus the genomic DNA composition. Figure 3 shows how OUV (oligonucleotide usage variance) varied between genomes compared to genomic GC content. Significant correlation was found between GC content and OUV values using a regression equation in which Y_OUV designates genomic OUV values (response) while the predictor, X_GC, represents GC content. Our results showed that GC-rich archaea and bacteria tended to have a less random DNA composition than AT-rich ones. The reason for this is not known, but it has been argued [3] that thermodynamic properties of tetranucleotides, i.e. base stacking energy and curvature, may be important. Tetranucleotide usage variance in coding regions was found to be even more strongly correlated with global GC content; in that regression, Y_COUV designates OUV values in coding regions (response), while X_GC is global GC content (predictor).
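The comparison of observed and expected tetranucleotide usage described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the excerpt does not give the exact OUV formula, so the sum of squared differences in `usage_deviation` is an assumed stand-in, and all function names are hypothetical. The expected frequencies follow the stated independence assumption, with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the DNA alphabet.
WORDS = ["".join(w) for w in product("ACGT", repeat=4)]


def gc_fraction(seq):
    """GC content computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / (gc + at)


def expected_tetra_freqs(gc):
    """Expected word frequencies if every position is independent
    of its neighbors and base probabilities depend only on GC content."""
    p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "G": gc / 2, "C": gc / 2}
    return {w: p[w[0]] * p[w[1]] * p[w[2]] * p[w[3]] for w in WORDS}


def observed_tetra_freqs(seq):
    """Observed frequencies from overlapping windows of length 4,
    skipping windows that contain ambiguous bases such as N."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[w] for w in WORDS)
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotides")
    return {w: counts[w] / total for w in WORDS}


def usage_deviation(seq):
    """Assumed stand-in for a bias score: sum of squared differences
    between observed and expected frequencies (NOT the paper's OUV)."""
    obs = observed_tetra_freqs(seq)
    exp = expected_tetra_freqs(gc_fraction(seq))
    return sum((obs[w] - exp[w]) ** 2 for w in WORDS)
```

Under this sketch, a score near zero means the observed tetranucleotide frequencies are close to what GC content alone predicts, which corresponds to the "more random, less biased" case in the text; larger scores indicate stronger departure from the independence model.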

