Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome.

Nayfach S, Pollard KS - Genome Biol. (2015)

Bottom Line: We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism.


ABSTRACT
Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.


Figure 2: Comparison of MicrobeCensus to existing methods. (A,B) Performance of MicrobeCensus was compared with that of existing methods using 20 simulated metagenomes. Unsigned error is defined as |AĜS - AGS| / AGS. (A) MicrobeCensus versus GAAS at different levels of taxonomic exclusion. To simulate the presence of novel taxa, we held back reference sequences belonging to organisms from the same taxonomic group as organisms in the metagenome; the excluded group is indicated on the x-axis. 'None' indicates that no reference sequences were held back. Metagenomes were composed of 100-bp reads. (B) Estimation error for MicrobeCensus versus the method described by Raes et al. [9] for metagenomes of various read lengths. 'NA' indicates that AGS could not be estimated. (C) Speed (reads/second) of MicrobeCensus compared with existing methods on a simulated 150-bp library.

Mentions: We quantified AGS estimation accuracy using the median unsigned error, which summarizes absolute errors to account for both over- and under-estimation. When we did not exclude any reference sequences, both GAAS and MicrobeCensus performed well for the 20 datasets (labeled 'none' in Figure 2A), indicating that both methods can accurately estimate AGS for metagenomes composed of taxa that are represented in the reference database. However, when species from the metagenome were excluded from the reference database, the median unsigned error for GAAS increased to 13.5%, while the error for MicrobeCensus remained at 2% and did not rise above 3% even at higher levels of taxonomic exclusion, confirming our hypothesis that MicrobeCensus would be robust to the presence of novel taxa. Using MicrobeCensus, we were able to obtain reasonable estimates of AGS even for metagenomes composed of taxa with no representatives in the reference database at the phylum level (8.6% median unsigned error), while the error for GAAS was over 20%.
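
As an illustration of the accuracy metric described above, the following minimal Python sketch computes the unsigned error for each metagenome and then takes the median across metagenomes. The function names and the genome sizes are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from the paper or from the MicrobeCensus code.

```python
from statistics import median


def unsigned_error(estimated_ags, true_ags):
    """Relative unsigned error |estimate - truth| / truth for one metagenome."""
    if true_ags <= 0:
        raise ValueError("true AGS must be positive")
    return abs(estimated_ags - true_ags) / true_ags


def median_unsigned_error(estimates, truths):
    """Median of the per-metagenome unsigned errors."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return median(unsigned_error(e, t) for e, t in zip(estimates, truths))


# Hypothetical average genome sizes in base pairs (illustrative only).
true_ags = [3.0e6, 4.0e6, 5.0e6]
estimated_ags = [3.3e6, 3.8e6, 5.0e6]

# Per-metagenome errors are 0.10, 0.05 and 0.00, so the median is 0.05 (5%).
print(f"{median_unsigned_error(estimated_ags, true_ags):.1%}")
```

Because the absolute value is taken before the median, an overestimate and an underestimate of the same relative size contribute equally and cannot cancel each other out.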

