Applying Shannon's information theory to bacterial and phage genomes and metagenomes.

Akhter S, Bailey BA, Salamon P, Aziz RK, Edwards RA - Sci Rep (2013)

Bottom Line: The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database.


Affiliation: Computational Science Research Center, San Diego State University, San Diego, CA, USA. sakhter@sciences.sdsu.edu

ABSTRACT
All sequence data contain inherent information that can be measured by Shannon's uncertainty theory. Such measurement is valuable in evaluating large data sets, such as metagenomic libraries, to prioritize their analysis and annotation, thus saving computational resources. Here, Shannon's index of complete phage and bacterial genomes was examined. The information content of a genome was found to be highly dependent on the genome length, GC content, and sequence word size. In metagenomic sequences, the amount of information correlated with the number of matches found by comparison to sequence databases. A sequence with more information (higher uncertainty) has a higher probability of being significantly similar to other sequences in the database. Measuring uncertainty may be used for rapidly screening for sequences with matches in available databases, prioritizing computational resources, and indicating which sequences with no known similarities are likely to be important for more detailed analysis.


Figure 1: Shannon's indices of 600 complete phage genomes and 94 complete bacterial genomes. Blue crosses represent phage genomes and red circles represent bacterial genomes. As the word length increases, Shannon's index is more discriminatory between phage and bacterial genomes.

Shannon's index was calculated for 600 complete phage genomes and 94 complete bacterial genomes (listed in Supplementary Table S1) using word lengths ranging from 1 to 12 nucleotides (nt). Shannon's indices of phage and bacterial genomes were similar up to word length 7 nt (Figure 1), implying an even distribution of all possible sequence words in phage and bacterial genomes. From word length 8 to 12 nt, the rate of increase of Shannon's index was higher in bacterial genomes than in phage genomes. Moreover, for word lengths greater than 10 nt, Shannon's index can differentiate bacteria and phages (Figure 1).
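The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that words are counted as overlapping windows on a single strand, that the logarithm is base 2 (so the index is in bits), and that no filtering of ambiguous bases is applied; none of these details are stated in this excerpt. The function names `shannon_index` and `max_shannon_index` are hypothetical.

```python
import math
from collections import Counter


def shannon_index(sequence: str, word_length: int) -> float:
    """Shannon's index (bits) over overlapping words of a fixed length.

    Assumptions (not taken from the paper): overlapping windows,
    single strand, log base 2, no handling of ambiguous bases.
    """
    seq = sequence.upper()
    n_words = len(seq) - word_length + 1
    if word_length < 1 or n_words < 1:
        raise ValueError("word_length must be between 1 and len(sequence)")
    # Count every overlapping word of the requested length.
    counts = Counter(seq[i:i + word_length] for i in range(n_words))
    # H = sum over words of p * log2(1 / p), with p = count / n_words.
    return sum((c / n_words) * math.log2(n_words / c) for c in counts.values())


def max_shannon_index(sequence_length: int, word_length: int) -> float:
    """Upper bound on the index for a DNA sequence of the given length.

    The index cannot exceed log2 of the number of possible words (4**k)
    or log2 of the number of words actually present in the sequence.
    """
    n_words = sequence_length - word_length + 1
    return math.log2(min(4 ** word_length, n_words))


if __name__ == "__main__":
    toy = "ATGCGTACGTTAGCCGATAGCTAGGCTA" * 20  # toy sequence, not real data
    for k in range(1, 13):
        h = shannon_index(toy, k)
        bound = max_shannon_index(len(toy), k)
        print(f"k={k:2d}  H={h:.3f} bits  upper bound={bound:.3f} bits")
```

The upper bound illustrates one way sequence length can limit the index: once 4^k exceeds the number of words in the sequence, the index is capped by log2 of the word count, so a shorter sequence reaches its ceiling at a smaller word length than a longer one.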

