Investigations of oligonucleotide usage variance within and between prokaryotes.

Bohlin J, Skjerve E, Ussery DW - PLoS Comput. Biol. (2008)

Bottom Line: Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.


Affiliation: Norwegian School of Veterinary Science, Oslo, Norway.

ABSTRACT
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA 'word-sizes' and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.


pcbi-1000057-g001: Statistical information potential in differently sized oligonucleotides. Cumulative information potential is measured in di- to octanucleotide frequencies in prokaryotic genomes with GC content between 47% and 53%. These genomes were selected because of the increased sensitivity of the Pearson correlation measure for chromosomes with similar AT/GC content. The archaeal and bacterial chromosomes are represented along the horizontal axis, sorted by increasing GC content from left to right, with corresponding correlation scores between observed n-mer words and approximated n-mer words on the vertical axis. The n-mer words were approximated by observed (n–1)-mer words and genomic nucleotide frequencies. High correlation scores indicate increased similarity between observed and approximated oligonucleotide usage.

Mentions: We measured the statistical information carried by the differently sized oligonucleotides from di- to octanucleotides in prokaryotes with GC contents between 47% and 53%. From Figure 1, it can be observed that the largest increase in information was obtained by going from nucleotide frequency approximation of dinucleotides to trinucleotide usage approximations based on dinucleotide frequencies and GC content (details can be found in Materials and Methods). A more careful investigation of Figure 1 revealed that progressively less information was gained from usage approximations of tetranucleotides up to heptanucleotides, and practically no additional information appeared to be present in approximated octanucleotide frequencies. Thus, oligonucleotide sizes larger than hexanucleotides possess little additional information potential, if any, in prokaryotic DNA.
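The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`kmer_freqs`, `approximate_freqs`, `pearson`, `information_score`) are hypothetical, and the approximation used here, the observed frequency of the first n–1 letters multiplied by the genomic frequency of the last nucleotide, is one plausible reading of "approximated by observed (n–1)-mer words and genomic nucleotide frequencies". The exact formula is given in the paper's Materials and Methods and may differ.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def approximate_freqs(seq, n):
    """Approximate n-mer frequencies from (n-1)-mers and single nucleotides.

    ASSUMPTION: approx(w) = F(w[:-1]) * f(w[-1]); the paper's exact
    formula may differ.
    """
    shorter = kmer_freqs(seq, n - 1)
    single = kmer_freqs(seq, 1)
    approx = {}
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        approx[word] = shorter.get(word[:-1], 0.0) * single.get(word[-1], 0.0)
    return approx


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def information_score(seq, n):
    """Correlation between observed and approximated n-mer frequencies."""
    observed = kmer_freqs(seq, n)
    approx = approximate_freqs(seq, n)
    words = sorted(approx)  # all 4**n words over A, C, G, T
    return pearson([observed.get(w, 0.0) for w in words],
                   [approx[w] for w in words])
```

Under these assumptions, a curve like the one in Figure 1 would come from calling `information_score(chromosome, n)` for n = 2 to 8 on each chromosome. A score that stops rising beyond some word size would indicate that longer words add little beyond what the shorter ones already capture.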

