Xander: employing a novel method for efficient gene-targeted metagenomic assembly.

Wang Q, Fish JA, Gilman M, Sun Y, Brown CT, Tiedje JM, Cole JR - Microbiome (2015)

Bottom Line: Assembling metagenomic datasets has proven to be computationally challenging. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.


Affiliation: Center for Microbial Ecology, Michigan State University, East Lansing, MI USA.

ABSTRACT

Background: Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes.

Results: We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences.

Conclusion: Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.



Fig. 1: The Xander combined weighted assembly graph structure. M, I, and D represent HMM match, insert, and delete states, respectively. Numbers represent state position on the HMM. For simplicity, a kmer length of 6 is used and weights of the edges are not shown. The vertices shown in bold on the de Bruijn graph and profile hidden Markov model are combined to form the bold vertex in the combined graph. The green solid arrows represent all possible outgoing edges from these vertices. Boxes with ellipses indicate additional omitted graph structure. The delete HMM state is combined with the de Bruijn graph vertex from the last match; this carries forward the state information necessary to correctly form subsequent vertices in the combined graph. During path search, if this combined vertex becomes the best scoring vertex in the open set, it is removed from the open set and the adjacent combined vertices are instantiated and added to the open set.
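
The open-set behaviour described in the caption can be illustrated with a small best-first search. This is only a sketch under stated assumptions, not Xander's implementation: it uses a plain uniform-cost search, treats edge weights as costs where lower is better, and the function name `best_first_path`, the toy graph, and all cost values are hypothetical. The excerpt does not say how Xander scores vertices or whether it uses a heuristic.

```python
import heapq
from itertools import count


def best_first_path(start, is_goal, neighbors):
    """Uniform-cost search; neighbors(v) returns (next_vertex, edge_cost) pairs.

    Vertices are only created when their predecessor is expanded, mirroring
    the lazy instantiation described in the caption.
    """
    tie = count()  # tie-breaker so the heap never has to compare paths
    open_set = [(0.0, next(tie), start, [start])]
    closed = set()
    while open_set:
        # Remove the best-scoring (lowest-cost) vertex from the open set.
        cost, _, vertex, path = heapq.heappop(open_set)
        if is_goal(vertex):
            return cost, path
        if vertex in closed:
            continue
        closed.add(vertex)
        # Instantiate adjacent combined vertices and add them to the open set.
        for nxt, edge_cost in neighbors(vertex):
            if nxt not in closed:
                heapq.heappush(
                    open_set, (cost + edge_cost, next(tie), nxt, path + [nxt])
                )
    return None


# Hypothetical toy graph: vertices are (kmer, HMM state, HMM position).
toy = {
    ("AAA", "M", 1): [(("AAC", "M", 2), 0.2), (("AAC", "I", 1), 1.5),
                      (("AAA", "D", 2), 2.0)],
    ("AAC", "M", 2): [(("ACG", "M", 3), 0.3)],
    ("AAC", "I", 1): [(("ACG", "M", 2), 0.4)],
    ("AAA", "D", 2): [(("AAC", "M", 3), 0.1)],
}

cost, path = best_first_path(
    ("AAA", "M", 1),
    is_goal=lambda v: v[1] == "M" and v[2] == 3,
    neighbors=lambda v: toy.get(v, []),
)
```

In the toy graph the delete vertex `("AAA", "D", 2)` keeps the k-mer of the preceding match vertex, as the caption describes for delete states.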

Mentions: Xander requires two sets of input sequences: a set of reference sequences of the targeted genes to build a protein profile HMM and one or more metagenomic read files to build a de Bruijn graph (DG). An HMM can be considered as a directed probabilistic graph with transition and emission probabilities between states. A novel graph structure was created to merge the DG and HMM into a single combined weighted assembly graph (CAG) (Fig. 1).
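
To make the combination of the DG and the HMM concrete, the following is a minimal sketch, assuming a dictionary-based de Bruijn graph and a combined vertex that pairs a k-mer with an HMM state. The names `build_de_bruijn`, `CombinedVertex`, and `combined_neighbors` are hypothetical, the reads are made up, and edge weights are omitted. As a simplification, one DG step is taken per HMM state and every state-to-state transition is allowed; a real protein profile HMM emits amino acids and restricts some transitions, so this is not how Xander itself steps through the graph.

```python
from collections import defaultdict
from typing import NamedTuple


def build_de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


class CombinedVertex(NamedTuple):
    kmer: str   # de Bruijn graph vertex
    state: str  # HMM state type: "M", "I", or "D"
    pos: int    # HMM state position


def combined_neighbors(dg, v, hmm_length):
    """List the combined vertices reachable from v (weights omitted)."""
    out = []
    if v.pos < hmm_length:
        # Delete: advance the HMM but keep the k-mer from the last match.
        out.append(CombinedVertex(v.kmer, "D", v.pos + 1))
    for nxt in sorted(dg.get(v.kmer, ())):
        if v.pos < hmm_length:
            # Match: advance both the DG and the HMM position.
            out.append(CombinedVertex(nxt, "M", v.pos + 1))
        # Insert: advance the DG but stay at the same HMM position.
        out.append(CombinedVertex(nxt, "I", v.pos))
    return out


reads = ["ATGGCGTGCA", "GGCGTGCAAT"]  # made-up reads
dg = build_de_bruijn(reads, k=6)      # k = 6, as in Fig. 1
nbrs = combined_neighbors(dg, CombinedVertex("GGCGTG", "M", 2), hmm_length=5)
```

Because neighbours are generated on demand from the current vertex, the combined graph never has to be built in full; only the vertices a search actually visits are created.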

