Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.

Nayfach S, Pollard KS - Genome Biol. (2015)

Bottom Line: We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism.


ABSTRACT
Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.


Figure 4. Estimation accuracy in the presence of microbial eukaryotes and viruses. (A) MicrobeCensus was used to estimate the AGS of 20 simulated 100-bp metagenomes in which Fungi made up as much as 50% of genomes, representing up to 94% of total reads. Note that axes are plotted on a log scale. (B) MicrobeCensus was used to estimate the AGS of 20 simulated 100-bp metagenomes that contained up to 50% of reads from viruses. Signed error is defined as (AĜS - AGS)/AGS, where AĜS is the estimated AGS and AGS is the true value. (C) AGS estimates from (B) were used to estimate the total coverage of microbial genomes present in the simulated metagenomes. Estimated coverage of microbial genomes was obtained by dividing the total number of base pairs in a metagenome by the estimated AGS for that metagenome.
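The two quantities defined in the caption, signed error and estimated coverage, are simple ratios. The following is a minimal sketch of both in Python, not code from the MicrobeCensus package; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
from statistics import median


def signed_error(estimated_ags, true_ags):
    """Signed relative error as defined in the caption: (estimate - truth) / truth."""
    return (estimated_ags - true_ags) / true_ags


def median_unsigned_error(estimated, true):
    """Median of the absolute relative errors over a set of communities."""
    return median(abs(signed_error(e, t)) for e, t in zip(estimated, true))


def genome_coverage(total_bp, estimated_ags):
    """Estimated coverage: total sequenced base pairs divided by the estimated AGS."""
    return total_bp / estimated_ags


# Hypothetical values, in base pairs (not taken from the paper).
true_ags = 4_000_000
estimated_ags = 4_200_000
total_bp = 2_100_000_000

error = signed_error(estimated_ags, true_ags)        # positive means overestimate
coverage = genome_coverage(total_bp, estimated_ags)  # number of genome equivalents
```

A positive signed error means AGS was overestimated, which in turn makes the estimated coverage too low, since the estimate sits in the denominator.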


Next, we evaluated whether accurate estimates of AGS could be made in the presence of microbial eukaryotes. Fungi are typically minority members of human microbiome communities but can occasionally constitute a significant proportion of metagenomic libraries [1,20,33]. To address this, we simulated 20 communities in which 0 to 50% of genomes were Fungi and used MicrobeCensus to estimate AGS for these communities (Additional file 6; Materials and methods). Fungal genome sizes ranged from 2.5 to 66.3 Mb with an average of 20.4 Mb. We used Fungi as a proxy for microbial eukaryotes due to the availability of complete genome sequences and the presence of these taxa in the human microbiome [34,35]. Because microbial eukaryotes were not included in our database and were not used to train MicrobeCensus, we wondered if the presence of these taxa would lead to inaccurate estimates of AGS. Surprisingly, AGS estimates for most of these communities were quite accurate, although not as accurate as for the bacterial and archaeal communities (Figure 4A). Even when Fungi were at 50% relative abundance, representing, on average, 79% of total reads, the median unsigned error was only 5% for the 20 communities. Nonetheless, in the future, when more complete genome sequences of microbial eukaryotes are available, particularly for Protists, it should be possible to retrain MicrobeCensus to achieve optimal performance for these types of communities. We have included training code in our software package for such extensions.
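The gap between 50% relative abundance and roughly 79% of reads follows from genome size: under shotgun sequencing, a taxon contributes reads roughly in proportion to its abundance times its genome length. The sketch below illustrates that arithmetic under this simplifying assumption. The 20.4 Mb fungal average comes from the text; the 5 Mb average for the bacterial and archaeal genomes is a hypothetical value picked for illustration, and the function names are not from the MicrobeCensus package.

```python
def read_fraction(genome_fraction, target_size_mb, other_size_mb):
    """Expected fraction of reads from a target group, assuming reads are drawn
    in proportion to (fraction of genomes) x (genome size)."""
    target = genome_fraction * target_size_mb
    other = (1.0 - genome_fraction) * other_size_mb
    return target / (target + other)


def mixture_ags(genome_fraction, target_size_mb, other_size_mb):
    """True average genome size of the two-group mixture, in Mb."""
    return genome_fraction * target_size_mb + (1.0 - genome_fraction) * other_size_mb


FUNGAL_MEAN_MB = 20.4      # average fungal genome size reported in the text
PROKARYOTE_MEAN_MB = 5.0   # hypothetical average, for illustration only

# Fungi at 50% of genomes account for about 80% of reads under these assumptions.
fungal_reads = read_fraction(0.5, FUNGAL_MEAN_MB, PROKARYOTE_MEAN_MB)
community_ags = mixture_ags(0.5, FUNGAL_MEAN_MB, PROKARYOTE_MEAN_MB)
```

With these inputs the fungal read fraction comes out near 0.80, in line with the 79% average reported above; the exact figure in any one simulated community depends on which genomes were sampled.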

