Limits...
MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.

Kang DD, Froula J, Egan R, Wang Z - PeerJ (2015)

Bottom Line: In addition, most of the tools are not scalable to large datasets.Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning.It automatically forms hundreds of high quality genome bins on a very large assembly consisting millions of contigs in a matter of hours on a single node.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Energy Joint Genome Institute , Walnut Creek, CA , USA ; Genomics Division, Lawrence Berkeley National Laboratory , Berkeley, CA , USA.

ABSTRACT
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high quality genome bins on a very large assembly consisting millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.

No MeSH data available.


Overview of the MetaBAT pipeline.There are three preprocessing steps before MetaBAT is applied: (1) A typical metagenome experiment may contain many spatial or time-series samples, each consisting of many different genomes (different color circles). (2) Each sample is sequenced by next-generation sequencing technology to form a sequencing library with many short reads. (3) The libraries may be combined before de novo assembly. After assembly, the reads from each sample must be aligned in separate BAM files. MetaBAT then automatically performs the remaining steps: (4) For each contig pair, a tetranucleotide frequency distance probability (TDP) is calculated from a distribution modelled from 1,414 reference genomes. (5) For each contig pair, an abundance distance probability (ADP) across all the samples is calculated. (6) The TDP and ADP of each contig pair are then combined, and the resulting distance for all pairs form a distance matrix. (7) Each bin will be formed iteratively and exhaustively from the distance matrix.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4556158&req=5

fig-1: Overview of the MetaBAT pipeline.There are three preprocessing steps before MetaBAT is applied: (1) A typical metagenome experiment may contain many spatial or time-series samples, each consisting of many different genomes (different color circles). (2) Each sample is sequenced by next-generation sequencing technology to form a sequencing library with many short reads. (3) The libraries may be combined before de novo assembly. After assembly, the reads from each sample must be aligned in separate BAM files. MetaBAT then automatically performs the remaining steps: (4) For each contig pair, a tetranucleotide frequency distance probability (TDP) is calculated from a distribution modelled from 1,414 reference genomes. (5) For each contig pair, an abundance distance probability (ADP) across all the samples is calculated. (6) The TDP and ADP of each contig pair are then combined, and the resulting distance for all pairs form a distance matrix. (7) Each bin will be formed iteratively and exhaustively from the distance matrix.

Mentions: As a pre-requisite for binning, the user must create BAM files by aligning the reads of each sample separately to the assembled metagenome (Fig. 1 steps from 1 to 3). MetaBAT takes an assembly file (fasta format, required) and sorted bam files (one per sample, optional) as inputs. For each pair of contigs in a metagenome assembly, MetaBAT calculates their probabilistic distances based on tetranucleotide frequency (TNF) and abundance (i.e., mean base coverage), then the two distances are integrated into one composite distance. All the pairwise distances form a matrix, which then is supplied to a modified k-medoid clustering algorithm to bin contigs iteratively and exhaustively into genome bins (Fig. 1).


MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.

Kang DD, Froula J, Egan R, Wang Z - PeerJ (2015)

Overview of the MetaBAT pipeline.There are three preprocessing steps before MetaBAT is applied: (1) A typical metagenome experiment may contain many spatial or time-series samples, each consisting of many different genomes (different color circles). (2) Each sample is sequenced by next-generation sequencing technology to form a sequencing library with many short reads. (3) The libraries may be combined before de novo assembly. After assembly, the reads from each sample must be aligned in separate BAM files. MetaBAT then automatically performs the remaining steps: (4) For each contig pair, a tetranucleotide frequency distance probability (TDP) is calculated from a distribution modelled from 1,414 reference genomes. (5) For each contig pair, an abundance distance probability (ADP) across all the samples is calculated. (6) The TDP and ADP of each contig pair are then combined, and the resulting distance for all pairs form a distance matrix. (7) Each bin will be formed iteratively and exhaustively from the distance matrix.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4556158&req=5

fig-1: Overview of the MetaBAT pipeline.There are three preprocessing steps before MetaBAT is applied: (1) A typical metagenome experiment may contain many spatial or time-series samples, each consisting of many different genomes (different color circles). (2) Each sample is sequenced by next-generation sequencing technology to form a sequencing library with many short reads. (3) The libraries may be combined before de novo assembly. After assembly, the reads from each sample must be aligned in separate BAM files. MetaBAT then automatically performs the remaining steps: (4) For each contig pair, a tetranucleotide frequency distance probability (TDP) is calculated from a distribution modelled from 1,414 reference genomes. (5) For each contig pair, an abundance distance probability (ADP) across all the samples is calculated. (6) The TDP and ADP of each contig pair are then combined, and the resulting distance for all pairs form a distance matrix. (7) Each bin will be formed iteratively and exhaustively from the distance matrix.
Mentions: As a pre-requisite for binning, the user must create BAM files by aligning the reads of each sample separately to the assembled metagenome (Fig. 1 steps from 1 to 3). MetaBAT takes an assembly file (fasta format, required) and sorted bam files (one per sample, optional) as inputs. For each pair of contigs in a metagenome assembly, MetaBAT calculates their probabilistic distances based on tetranucleotide frequency (TNF) and abundance (i.e., mean base coverage), then the two distances are integrated into one composite distance. All the pairwise distances form a matrix, which then is supplied to a modified k-medoid clustering algorithm to bin contigs iteratively and exhaustively into genome bins (Fig. 1).

Bottom Line: In addition, most of the tools are not scalable to large datasets.Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning.It automatically forms hundreds of high quality genome bins on a very large assembly consisting millions of contigs in a matter of hours on a single node.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Energy Joint Genome Institute , Walnut Creek, CA , USA ; Genomics Division, Lawrence Berkeley National Laboratory , Berkeley, CA , USA.

ABSTRACT
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high quality genome bins on a very large assembly consisting millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.

No MeSH data available.