Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.
Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.
Mentions: Next, we evaluated whether accurate estimates of AGS could be made in the presence of microbial eukaryotes. Fungi are typically minority members of human microbiome communities but can occasionally constitute a significant proportion of metagenomic libraries [1,20,33]. To address this, we simulated 20 communities in which 0 to 50% of genomes were Fungi and used MicrobeCensus to estimate AGS for these communities (Additional file 6; Materials and methods). Fungal genome sizes ranged from 2.5 to 66.3 Mb with an average of 20.4 Mb. We used Fungi as a proxy for microbial eukaryotes due to the availability of complete genome sequences and the presence of these taxa in the human microbiome [34,35]. Because microbial eukaryotes were not included in our database and were not used to train MicrobeCensus, we wondered if the presence of these taxa would lead to inaccurate estimates of AGS. Surprisingly, AGS estimates for most of these communities were quite accurate, although not as accurate as for the bacterial and archaeal communities (Figure 4A). Even when Fungi were at 50% relative abundance, representing, on average, 79% of total reads, the median unsigned error was only 5% for the 20 communities. Nonetheless, in the future, when more complete genome sequences of microbial eukaryotes are available, particularly for protists, it should be possible to retrain MicrobeCensus to achieve optimal performance for these types of communities. We have included training code in our software package for such extensions.
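
The gap between genome-level abundance (50% Fungi) and read-level abundance (79% of reads, on average) follows from larger genomes contributing proportionally more shotgun reads. The Python sketch below illustrates that arithmetic, along with the true AGS of a mixed community and a median unsigned error. It is not the MicrobeCensus implementation: the function names are hypothetical, the 5.0 Mb prokaryotic genome size is an assumed value chosen only for illustration, and reads are assumed to be sampled in proportion to genome size times relative abundance.

```python
from statistics import median


def average_genome_size(abundances, genome_sizes):
    """Abundance-weighted mean genome size of a community.

    abundances: relative abundance of each genome (cell/genome counts).
    genome_sizes: size of each genome, in any consistent unit (here Mb).
    """
    total = sum(abundances)
    return sum(a * g for a, g in zip(abundances, genome_sizes)) / total


def read_fraction(abundances, genome_sizes, index):
    """Expected share of reads from one genome, assuming reads are
    sampled in proportion to abundance times genome size."""
    weighted = [a * g for a, g in zip(abundances, genome_sizes)]
    return weighted[index] / sum(weighted)


def median_unsigned_error(estimates, truths):
    """Median of |estimate - truth| / truth across communities."""
    return median(abs(e - t) / t for e, t in zip(estimates, truths))


# Two-member toy community: one fungal genome at the reported mean size
# (20.4 Mb) and one prokaryotic genome at an ASSUMED size of 5.0 Mb,
# each at 50% relative abundance.
abundances = [0.5, 0.5]
genome_sizes_mb = [20.4, 5.0]

true_ags = average_genome_size(abundances, genome_sizes_mb)
fungal_reads = read_fraction(abundances, genome_sizes_mb, 0)

print(f"True AGS: {true_ags:.1f} Mb")
print(f"Fungal share of reads: {fungal_reads:.1%}")
```

Under these assumptions the fungal genome accounts for roughly four fifths of the reads despite making up half of the genomes, which is in line with the 79% average reported for the simulated communities.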