A statistical toolbox for metagenomics: assessing functional diversity in microbial communities.

Schloss PD, Handelsman J - BMC Bioinformatics (2008)

Affiliation: Department of Microbiology, University of Massachusetts - Amherst, Amherst, MA 01003, USA. pschloss@microbio.umass.edu

ABSTRACT

Background: The estimated 99% of bacteria in the environment that resist culturing have spurred the development of metagenomics, a culture-independent approach to sampling and characterizing microbial genomes. Massive datasets of metagenomic sequences have accumulated, but their analysis has focused primarily on descriptive comparisons of the relative abundance of proteins in specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data.

Results: Applying these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although the three environments shared no 16S rRNA gene sequence types, they did share a core collection of operational protein families.

Conclusion: The results of comparisons between the three habitats were surprising given the relatively low overlap in membership and the distinctly different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.
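The membership comparison described in the Results (asking whether any operational protein families, or OPFs, occur in all three habitats even though no 16S rRNA gene sequence types are shared) can be illustrated with a minimal Python sketch. It assumes each peptide fragment has already been assigned to an OPF label, and it uses the Jaccard index as one possible presence/absence similarity; the function names (`jaccard`, `core_and_pairwise`), the input format, and the toy data are hypothetical and do not come from the paper, whose own membership indices may differ.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared fraction of the union of two membership sets (abundance is ignored)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def core_and_pairwise(communities: dict) -> tuple:
    """Return the labels present in every community and all pairwise Jaccard indices.

    `communities` maps a habitat name to the set of OPF labels observed there
    (a hypothetical input format chosen for this sketch).
    """
    core = set.intersection(*communities.values())
    pairwise = {
        (x, y): jaccard(communities[x], communities[y])
        for x, y in combinations(sorted(communities), 2)
    }
    return core, pairwise


# Toy, made-up OPF labels; these are NOT the AMD, soil, or whale fall data.
communities = {
    "AMD": {"opf1", "opf2", "opf3"},
    "soil": {"opf1", "opf2", "opf4", "opf5"},
    "whale_fall": {"opf1", "opf2", "opf6"},
}
core, pairwise = core_and_pairwise(communities)
print(sorted(core))  # ['opf1', 'opf2']
print(pairwise)
```

A non-empty `core` is the kind of result the Results report for OPFs. Because the Jaccard index uses presence/absence only, comparing community structure (relative abundances) would call for an abundance-weighted index instead.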

Figure 2: Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an AMD biofilm community.

Tyson et al. [15] used metagenomic sequencing to analyze a biofilm growing in acid mine drainage (AMD) with a pH below 1.0. They obtained 322 archaeal and bacterial 16S rRNA gene sequences and 103,462 random paired sequence reads, which represented 76.2 Mbp of DNA. We used DOTUR to assign the 16S rRNA gene sequences to nine OTUs and predicted an additional three OTUs (95% confidence interval [95% CI] = 0 to 8) that were not observed (Fig. 2A). The most abundant OTU (n = 247 sequences) was similar to Leptospirillum ferriphilum 16S rRNA gene sequences.
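The collector's curves in Fig. 2 track how observed and estimated richness change as sequences are added. The minimal Python sketch below assumes the bias-corrected Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and exactly twice. The excerpt does not state which estimator produced the prediction of three unobserved OTUs, the confidence interval is not computed here, OTU assignment is assumed to have been done already, and `chao1` and `collectors_curve` are hypothetical helpers, not DOTUR's code.

```python
import random
from collections import Counter


def chao1(counts: Counter) -> float:
    """Bias-corrected Chao1 richness estimate from OTU abundance counts."""
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def collectors_curve(labels, seed=0):
    """(n, observed OTUs, Chao1) after each of n sequences, in one random order."""
    order = list(labels)
    random.Random(seed).shuffle(order)  # sampling order is randomized
    counts = Counter()
    curve = []
    for n, label in enumerate(order, start=1):
        counts[label] += 1
        curve.append((n, len(counts), chao1(counts)))
    return curve


# Made-up OTU labels for nine sequences (not the AMD data).
labels = ["A"] * 5 + ["B"] * 2 + ["C", "D"]
print(chao1(Counter(labels)))  # 4.5
for point in collectors_curve(labels):
    print(point)
```

The final point is the same for any shuffle (here 9 sequences, 4 observed OTUs, Chao1 = 4.5), but intermediate points depend on the order, so one common remedy is to average the curve over many random orders. An observed curve that keeps rising while the estimate levels off suggests that sampling has not yet captured the full richness.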
