RAMBO-K: Rapid and Sensitive Removal of Background Sequences from Next Generation Sequencing Data.

Tausch SH, Renard BY, Nitsche A, Dabrowski PW - PLoS ONE (2015)

Bottom Line: These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies. RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets.


Affiliation: Centre for Biological Threats and Special Pathogens, Robert Koch Institute, 13353, Berlin, Germany.

ABSTRACT

Background: The assembly of viral or endosymbiont genomes from Next Generation Sequencing (NGS) data is often hampered by the predominant abundance of reads originating from the host organism. These reads increase the memory and CPU time usage of the assembler and can lead to misassemblies.

Results: We developed RAMBO-K (Read Assignment Method Based On K-mers), a tool which allows rapid and sensitive removal of unwanted host sequences from NGS datasets. Reaching a speed of 10 Megabases/s on 4 CPU cores and a standard hard drive, RAMBO-K is faster than any tool we tested, while showing a consistently high sensitivity and specificity across different datasets.

Conclusions: RAMBO-K rapidly and reliably separates reads from different species without data preprocessing. It is suitable as a straightforward standard solution for workflows dealing with mixed datasets. Binaries and source code (Java and Python) are available from http://sourceforge.net/projects/rambok/.



pone.0137896.g001: Graphical representation of RAMBO-K’s workflow. Reads are simulated from the reference genomes and used to train a foreground and background Markov chain. The simulated sequences and a subset of the real reads are assigned based on these matrices and a preview of the results is presented to the user. If this preview proves satisfactory, the same parameters are used to assign all reads.

Mentions: In order to separate reads, RAMBO-K uses a reference-driven approach. The user must provide FASTA files containing sequences related to both the foreground (usually the virus or endosymbiont of interest) and the background (usually the host organism). The reference sequences do not have to represent finished genomes; collections of contigs from a draft genome or lists of sequences from different related organisms can be provided if no exact reference is known. Based on these inputs, RAMBO-K performs the sorting of reads in three steps: (i) simulation of reads from reference sequences; (ii) calculation of two Markov chains, one for the foreground and one for the background, from the simulated reads; and (iii) classification of real reads based on their conformance with the Markov chains. This workflow is visualized in Fig 1.
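
The three steps above can be illustrated with a minimal Python sketch. It is not RAMBO-K's actual code: the function names, the error-free read simulation, the add-one smoothing, the log-likelihood difference used as the "conformance" score, and the cutoff of 0 are all assumptions made for illustration, and the references are passed as plain strings rather than parsed from FASTA files.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"


def simulate_reads(reference, read_len, n_reads, rng):
    """Step (i): draw error-free reads at uniform random positions (assumed scheme)."""
    last_start = len(reference) - read_len
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [reference[s:s + read_len] for s in starts]


def train_markov_chain(reads, k):
    """Step (ii): estimate log P(next base | preceding k-mer) with add-one smoothing."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    for read in reads:
        for i in range(len(read) - k):
            context, nxt = read[i:i + k], read[i + k]
            if nxt in ALPHABET:  # skip ambiguous bases such as N
                counts[context][nxt] += 1
    model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        model[context] = {b: math.log(c / total) for b, c in nexts.items()}
    return model


def log_likelihood(read, model, k):
    """Sum the transition log-probabilities along a read."""
    unseen = math.log(1 / len(ALPHABET))  # uniform fallback for unseen contexts
    return sum(
        model.get(read[i:i + k], {}).get(read[i + k], unseen)
        for i in range(len(read) - k)
    )


def score_read(read, fg_model, bg_model, k):
    """Positive scores favour the foreground chain, negative the background."""
    return log_likelihood(read, fg_model, k) - log_likelihood(read, bg_model, k)


def classify(read, fg_model, bg_model, k, cutoff=0.0):
    """Step (iii): assign a read by comparing its score to a cutoff."""
    if score_read(read, fg_model, bg_model, k) > cutoff:
        return "foreground"
    return "background"


if __name__ == "__main__":
    rng = random.Random(0)
    fg_ref = "ATATTAATTA" * 50  # toy AT-rich "foreground" reference
    bg_ref = "GCGGCCGCGC" * 50  # toy GC-rich "background" reference
    k = 2
    fg_model = train_markov_chain(simulate_reads(fg_ref, 30, 200, rng), k)
    bg_model = train_markov_chain(simulate_reads(bg_ref, 30, 200, rng), k)
    for read in (fg_ref[3:33], bg_ref[7:37]):
        print(read, classify(read, fg_model, bg_model, k))
```

In this sketch the cutoff and k are fixed by hand. In the workflow described above, the user would instead inspect a preview of how simulated reads and a subset of real reads are assigned before applying the same parameters to all reads.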

