Investigations of oligonucleotide usage variance within and between prokaryotes.

Bohlin J, Skjerve E, Ussery DW - PLoS Comput. Biol. (2008)

Bottom Line: Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.


Affiliation: Norwegian School of Veterinary Science, Oslo, Norway.

ABSTRACT
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.
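The notion of oligonucleotide ("word") frequencies used throughout the abstract can be illustrated with a minimal Python sketch. It counts overlapping words of size k and, as a simple baseline, estimates the frequency a word would have if bases were independent. The function names (`kmer_frequencies`, `expected_frequency`) and the choice to skip words containing ambiguous bases are assumptions made for illustration; this is not the authors' pipeline.

```python
from collections import Counter
from math import prod


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequencies of overlapping words of length k.

    Words containing anything other than A, C, G, T are skipped
    (an illustrative choice, not taken from the paper).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def expected_frequency(word: str, base_freqs: dict[str, float]) -> float:
    """Frequency of `word` expected from single-base frequencies alone,
    i.e. the product of the frequencies of its bases."""
    return prod(base_freqs.get(b, 0.0) for b in word)


if __name__ == "__main__":
    toy = "ACGTACGT"  # synthetic sequence, for demonstration only
    tetra = kmer_frequencies(toy, k=4)
    bases = kmer_frequencies(toy, k=1)
    for word, observed in sorted(tetra.items()):
        print(word, observed, expected_frequency(word, bases))
```

Comparing observed word frequencies with such a base-composition expectation is one common way to express bias in oligonucleotide usage; the exact statistic used in the paper is not reproduced here.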

pcbi-1000057-g006: Variation of GC content within genomes. The vertical axis shows the variance of nucleotide frequencies within chromosomes (red line) and coding regions (blue line), plotted against the corresponding mean genomic GC content on the horizontal axis. Lower average nucleotide variance scores (vertical axis) mean more similar distributions of GC content within chromosomes (and vice versa).

Mentions: Predicted tetranucleotide usage based on genomic nucleotide frequencies was used to estimate variance in GC content within genomes (Figure 6). Because the intrinsic tetranucleotide usage variance predictions were based only on nucleotide frequencies, these values were directly associated with fluctuations in local GC content, obtained by comparing 40 kbp sliding windows with global (mean) GC content. We therefore wanted to investigate whether fluctuations of intrinsic GC content showed any relation to global GC content, and whether there was a difference between coding and non-coding regions. Using regression analysis, a significant correlation was found between global GC content and expected tetranucleotide usage variance, described by a regression equation in which Y_E_OUV (response) represents expected oligonucleotide usage variance (OUV) and X_GC (predictor) represents GC content.

