Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.

Nayfach S, Pollard KS - Genome Biol. (2015)

Bottom Line: We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism.

ABSTRACT
Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.

Figure 3: The effect of sequencing error on estimation accuracy. Unsigned error is defined as |AĜS - AGS| / AGS, where AĜS is the estimated and AGS the true average genome size. (A) MicrobeCensus was used to estimate the AGS of 20 metagenomes that were simulated with up to a 5% sequencing error rate. Metagenomes were composed of 100-bp reads from prokaryotes. (B) MicrobeCensus was used to estimate the AGS of 10 metagenomes that were composed of real Illumina reads pooled from 10 randomly chosen isolate sequencing projects.
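The unsigned error used in the caption is a relative absolute error. As a minimal sketch (the function name and the example genome sizes are illustrative assumptions, not taken from MicrobeCensus), it can be computed as follows:

```python
def unsigned_error(estimated_ags: float, true_ags: float) -> float:
    """Relative absolute error |estimated - true| / true.

    Illustrative helper only; the name and signature are assumptions,
    not part of the MicrobeCensus code base.
    """
    if true_ags <= 0:
        raise ValueError("true_ags must be positive")
    return abs(estimated_ags - true_ags) / true_ags


# Hypothetical values: an estimate of 3.3 Mbp against a true AGS of 3.0 Mbp.
error = unsigned_error(3.3e6, 3.0e6)
print(f"unsigned error = {error:.1%}")
```

Because the error is divided by the true AGS, over- and underestimates of the same magnitude give the same value, and communities with different genome sizes can be compared on one scale.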

Mentions: While we were able to obtain accurate estimates of average genome size for error-free libraries, it was not clear whether we could accurately estimate AGS from libraries that contained sequencing error, artificially duplicated reads, and a non-uniform distribution of coverage. To evaluate the effect of sequencing error, we simulated 100-bp metagenomes with up to 5% sequencing error rates (Materials and methods). We found that we could accurately estimate AGS from libraries that contained up to 2% sequencing error; beyond this, estimation error quickly increased (Figure 3A). Luckily, most current sequencing platforms have raw error rates below 2%, including Illumina MiSeq (0.80%), Ion Torrent PGM (1.71%), Illumina GAIIx (0.76%), Illumina HiSeq 2000 (0.26%), and 454 GS-FLX Titanium (1.07%) [30,31].
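To put the 2% threshold in context, the sketch below compares the raw error rates quoted above against it and converts each rate into an expected number of erroneous bases per 100-bp read. The rates and the threshold come from the text; the function and constant names are illustrative assumptions, and 100 bp is the simulated read length, not each platform's actual read length.

```python
# Raw error rates (percent) as quoted in the text for each platform.
RAW_ERROR_RATE_PCT = {
    "Illumina MiSeq": 0.80,
    "Ion Torrent PGM": 1.71,
    "Illumina GAIIx": 0.76,
    "Illumina HiSeq 2000": 0.26,
    "454 GS-FLX Titanium": 1.07,
}

TOLERATED_ERROR_PCT = 2.0  # estimation error rose quickly beyond this rate (Figure 3A)
READ_LENGTH_BP = 100       # read length used in the simulations


def expected_errors_per_read(error_rate_pct, read_length_bp=READ_LENGTH_BP):
    """Mean number of erroneous bases per read at a uniform per-base error rate."""
    return error_rate_pct / 100.0 * read_length_bp


def within_tolerance(error_rate_pct, limit_pct=TOLERATED_ERROR_PCT):
    """True if the error rate does not exceed the tolerated limit."""
    return error_rate_pct <= limit_pct


# Print platforms from lowest to highest raw error rate.
for platform, rate in sorted(RAW_ERROR_RATE_PCT.items(), key=lambda kv: kv[1]):
    print(
        f"{platform:22s} {rate:5.2f}%  "
        f"~{expected_errors_per_read(rate):.2f} errors per read  "
        f"within tolerance: {within_tolerance(rate)}"
    )
```

At the 2% limit a 100-bp read carries about two erroneous bases on average, whereas the highest simulated rate of 5% corresponds to about five.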

