MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm.

Wu YW, Tang YH, Tringe SG, Simmons BA, Singer SW - Microbiome (2014)

Bottom Line: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments.


Affiliation: Joint BioEnergy Institute, Emeryville, CA 94608, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

ABSTRACT

Background: Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions.

Results: We have developed a binning algorithm, MaxBin, which automates the binning of assembled metagenomic scaffolds using an expectation-maximization algorithm after the assembly of metagenomic sequencing reads. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that had a much smaller genome (5 MB versus 13 to 14 MB) but a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that MaxBin performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity.

Conclusions: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.
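The abstract names expectation-maximization as the core of MaxBin but does not describe the model. As a rough illustration only, the sketch below runs EM on a toy problem: assigning scaffolds to bins from read coverage alone. It is not MaxBin's implementation. The Poisson coverage model, the function names (poisson_logpmf, em_poisson_bins), the starting means and the coverage values are all assumptions made up for this example.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing coverage k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_bins(coverages, init_means, n_iter=50):
    """Toy EM for a Poisson mixture: one component per hypothetical bin."""
    means = list(init_means)
    weights = [1.0 / len(means)] * len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each scaffold belongs to each bin.
        resp = []
        for c in coverages:
            logp = [math.log(w) + poisson_logpmf(c, m)
                    for w, m in zip(weights, means)]
            top = max(logp)  # subtract the max for numerical stability
            probs = [math.exp(lp - top) for lp in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each bin's weight and mean coverage.
        for j in range(len(means)):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(coverages)
            means[j] = sum(r[j] * c for r, c in zip(resp, coverages)) / nj
    # Hard assignment: each scaffold goes to its most probable bin.
    labels = [max(range(len(means)), key=lambda j: r[j]) for r in resp]
    return means, weights, labels


# Made-up coverages for eight scaffolds from two populations.
coverages = [9, 10, 11, 12, 48, 50, 52, 55]
means, weights, labels = em_poisson_bins(coverages, init_means=[5.0, 40.0])
print(labels)
print([round(m, 2) for m in means])
```

With well-separated coverages like these, the low-coverage and high-coverage scaffolds end up in different bins. Real scaffolds would need more than one feature to separate populations with similar abundance, and the toy has no guard against a component that loses all of its members.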



Figure 5: Phylogenetic trees built for the species Sorangium sp. found in 37A and 37B. Arrowheads indicate the position of Sorangium sp. in the trees. (A) 16S ribosomal RNA gene tree. (B) Concatenated gene tree for 35 protein-coding marker genes.

Mentions: A complete 16S rRNA gene (90% identical to that of S. cellulosum) was recovered from the bin containing the Sorangium sp. genome, and a phylogenetic tree was constructed to classify the bin (Figure 5(A)). The tree showed that the novel myxobacterial population was a Deltaproteobacterium in the order Myxococcales, affiliated with the suborder Sorangiineae but distinct from the family Polyangiaceae, which contains the validated species Sorangium cellulosum, Byssovorax cruenta and Chondromyces apiculatus [46]. This new family in the Sorangiineae has no cultivated members and consists of 16S rRNA clones representing uncultivated species. The two 16S rRNA clones in this family that are most similar (99% identity) to that of Sorangium sp. were recovered separately from earthworm guts and large-discharge carbonate springs [GenBank: HM459718 and KC358117]. The classification of this bin was confirmed by a concatenated gene tree built from 35 single-copy marker genes in the genomic bin, which showed that the population was distantly related to Sorangium cellulosum (Figure 5(B)). Surprisingly, the MaxBin binning results, supported by complementary ESOM and differential coverage binning, indicated that the Sorangium sp. genome was approximately 5 MB, while the two sequenced strains of Sorangium cellulosum have genomes of 13.0 MB (strain So ce56) and 14.7 MB (strain So0157-2). Eleven myxobacterial genomes were compared, and 193 genes were identified as universally shared. Of those 193 genes, 158 were present in Sorangium sp., suggesting that despite its significantly smaller size, this genome still contains most of the common genes found in myxobacteria.
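The comparison above rests on simple arithmetic, which the short sketch below makes explicit. All figures are taken from the paragraph; the variable names are chosen for this example, and the 5 MB bin size is the approximate value reported in the text.

```python
# Shared-gene recovery: genes common to all 11 compared myxobacterial genomes.
shared_core_genes = 193
found_in_bin = 158
core_fraction = found_in_bin / shared_core_genes

# Genome sizes in MB; the Sorangium sp. bin size is approximate.
bin_genome_mb = 5.0
reference_genomes_mb = {"So ce56": 13.0, "So0157-2": 14.7}
size_ratios = {strain: bin_genome_mb / mb
               for strain, mb in reference_genomes_mb.items()}

print(f"Shared genes present in the bin: {core_fraction:.1%}")
for strain, ratio in size_ratios.items():
    print(f"Bin size relative to strain {strain}: {ratio:.0%}")
```

The bin therefore carries roughly 82% of the universally shared genes while being a little over a third the size of either sequenced S. cellulosum genome.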

