Xander: employing a novel method for efficient gene-targeted metagenomic assembly.

Wang Q, Fish JA, Gilman M, Sun Y, Brown CT, Tiedje JM, Cole JR - Microbiome (2015)

Bottom Line: However, assembling metagenomic datasets has proven to be computationally challenging. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.


Affiliation: Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA.

ABSTRACT

Background: Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes.

Results: We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences.

Conclusion: Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.
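The de Bruijn graph mentioned in the Results can be illustrated with a minimal Python sketch. This is not Xander's implementation: the function names `build_de_bruijn` and `extend_unambiguous`, the toy reads, and the choice of k are assumptions made for illustration only. The HMM-based weighting that Xander adds to the graph is not shown, and reverse-complement handling is ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk forward from `start` while each node has exactly one successor.

    Stops at a dead end, at a branch, or when a node would be revisited.
    """
    path = [start]
    seen = {start}
    node = start
    # graph.get avoids creating empty entries in the defaultdict
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Spell the contig: first node plus the last base of each later node
    return path[0] + "".join(n[-1] for n in path[1:])


# Hypothetical overlapping reads, for illustration only
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_unambiguous(graph, "ATG")
```

A plain walk like this stops at every branch point. According to the abstract, Xander instead uses a protein profile HMM for the gene of interest to guide assembly through a combined weighted graph.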




Fig. 3: Kmer abundance of nitrite reductase gene (nirK) representative contigs assembled by Xander from the pooled rhizosphere samples. The representative contigs were chosen from clusters at 99 % aa identity. X-axis indicates the number of times (abundance) a kmer in the contigs occurred in the reads. Y-axis represents the fraction of total unique kmers with this abundance.

Mentions: We examined the kmer abundance and mean kmer coverage for each of the representative nirK and rplB contigs. More than half the kmers in the three samples occurred only once or twice. The corn sample had more high-coverage kmers than Miscanthus or switchgrass (Fig. 3, Additional file 1: Fig. S1). The corn sample also had more contigs with higher mean coverage than Miscanthus and switchgrass (Additional file 1: Fig. S2). Using the pooled samples, we estimated that about 10 % of the organisms had nirK genes in these soil samples and only about 1 in 200 to 300 had nifH genes. These estimates were very similar among the three crops, and close to those obtained from one sample alone, but lower than those estimated by the bulk assembly (Table 3).
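The two quantities discussed above, the kmer abundance distribution plotted in Fig. 3 and the mean kmer coverage of a contig, can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the helper names (`count_kmers`, `abundance_spectrum`, `mean_kmer_coverage`), the toy sequences, and k = 3 are assumptions, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every kmer occurrence across a collection of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def abundance_spectrum(contigs, read_counts, k):
    """Fraction of unique contig kmers at each read abundance (Fig. 3 axes)."""
    contig_kmers = {c[i:i + k] for c in contigs for i in range(len(c) - k + 1)}
    if not contig_kmers:
        return {}
    hist = Counter(read_counts[km] for km in contig_kmers)
    total = len(contig_kmers)
    return {abundance: n / total for abundance, n in sorted(hist.items())}


def mean_kmer_coverage(contig, read_counts, k):
    """Average read abundance over all kmers of one contig."""
    kmers = [contig[i:i + k] for i in range(len(contig) - k + 1)]
    if not kmers:
        return 0.0
    return sum(read_counts[km] for km in kmers) / len(kmers)


# Hypothetical toy data, for illustration only
reads = ["ATGGC", "ATGGC", "TGGCA"]
read_counts = count_kmers(reads, k=3)
spectrum = abundance_spectrum(["ATGGCA"], read_counts, k=3)
mean_cov = mean_kmer_coverage("ATGGCA", read_counts, k=3)
```

In this toy case the contig has four unique kmers with read abundances 2, 3, 3, and 1, so `spectrum` assigns 0.25 to abundance 1, 0.25 to abundance 2, and 0.5 to abundance 3, and `mean_cov` is 2.25.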

