Limits...
Exploiting sparseness in de novo genome assembly.

Ye C, Ma ZS, Cannon CH, Pop M, Yu DW - BMC Bioinformatics (2012)

Bottom Line: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph.We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

View Article: PubMed Central - HTML - PubMed

Affiliation: Ecology & Evolution of Plant-Animal Interaction Group, Xishuangbanna Tropical Botanic Garden, Chinese Academy of Sciences, Menglun, Yunnan 666303 China. cxy@umd.edu

ABSTRACT

Background: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.

Methods: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer.

Results: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

Show MeSH
Breadth-first search bubble removal in the sparse k-mer graph. Removing unwanted structures in the sparse de Bruijn graph. (a) Before removal. (b) After removal.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3369186&req=5

Figure 3: Breadth-first search bubble removal in the sparse k-mer graph. Removing unwanted structures in the sparse de Bruijn graph. (a) Before removal. (b) After removal.

Mentions: Sequencing errors and polymorphisms can result in tips or bubbles [8] in the assembly graphs, irrespective of the underlying paradigm (de Bruijn, sparse k-mer, or overlap). To remove these unwanted structures, we first remove the low-coverage nodes and edges. After that, like in Velvet [8], we developed a Dijkstra-like breadth-first search algorithm to detect bubbles and tips, using the distance in bp to traverse the branches from near to far. The search backtracks to the last branching node upon reaching a visited node or a tip end. Upon a bubble we choose the higher-coverage branch and remove the weaker branch. Tips are directly removed. After this step, spurious paths and redundant structures like tiny loops and bubbles are removed (Figure 3).


Exploiting sparseness in de novo genome assembly.

Ye C, Ma ZS, Cannon CH, Pop M, Yu DW - BMC Bioinformatics (2012)

Breadth-first search bubble removal in the sparse k-mer graph. Removing unwanted structures in the sparse de Bruijn graph. (a) Before removal. (b) After removal.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3369186&req=5

Figure 3: Breadth-first search bubble removal in the sparse k-mer graph. Removing unwanted structures in the sparse de Bruijn graph. (a) Before removal. (b) After removal.
Mentions: Sequencing errors and polymorphisms can result in tips or bubbles [8] in the assembly graphs, irrespective of the underlying paradigm (de Bruijn, sparse k-mer, or overlap). To remove these unwanted structures, we first remove the low-coverage nodes and edges. After that, like in Velvet [8], we developed a Dijkstra-like breadth-first search algorithm to detect bubbles and tips, using the distance in bp to traverse the branches from near to far. The search backtracks to the last branching node upon reaching a visited node or a tip end. Upon a bubble we choose the higher-coverage branch and remove the weaker branch. Tips are directly removed. After this step, spurious paths and redundant structures like tiny loops and bubbles are removed (Figure 3).

Bottom Line: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph.We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

View Article: PubMed Central - HTML - PubMed

Affiliation: Ecology & Evolution of Plant-Animal Interaction Group, Xishuangbanna Tropical Botanic Garden, Chinese Academy of Sciences, Menglun, Yunnan 666303 China. cxy@umd.edu

ABSTRACT

Background: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments.

Methods: In this paper, we demonstrate that constructing a sparse assembly graph which stores only a small fraction of the observed k-mers as nodes and the links between these nodes allows the de novo assembly of even moderately-sized genomes (~500 M) on a typical laptop computer.

Results: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test our SparseAssembler with both simulated and real data, achieving ~90% memory savings and retaining high assembly accuracy, without sacrificing speed in comparison to existing de novo assemblers.

Show MeSH