An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data.
Bottom Line: Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, achieved significantly better contigs than current approaches.
Affiliation: Blood Systems Research Institute, San Francisco, CA 94118, USA; Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA. firstname.lastname@example.org
Mentions: Most DBG assemblers require a k-mer size as a configurable parameter. Because the optimal k-mer value for metagenome assembly is not clear, we tested S and A on the ‘in silico-virus spiked’ datasets at increasing k-mer values of 31, 41, 51 and 61 (V does not support k-mer values >31) (Figure 2A). K-mer values from 31 to 61 have previously been shown to be useful for DBG assemblers, whereas k-mer values below 31 seem to generate shorter contigs (35). Using the ‘in silico-virus spiked’ dataset, A performed better than S or V. For the S and A algorithms, varying the k-mer value from 31 to 61 produced no significant differences (P > 0.05, Kruskal–Wallis test). Since k-mer values must be smaller than the read length, we chose k = 31, which provides the greatest flexibility for analyzing very short reads and keeps the parameter constant for comparative benchmarking of the S, A and V algorithms. Note that the optimal k-mer value depends on the data being analyzed: we use k = 31 throughout this study, but it may not be optimal for other datasets.
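The read-length constraint on k can be illustrated with a minimal Python sketch. It is not taken from the S, A or V assemblers: the helper names `kmers` and `usable_reads` and the read lengths are hypothetical, and the sketch assumes a read of exactly k bases still yields one k-mer.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping k-mers of a read.

    A read of length L yields L - k + 1 k-mers; a read shorter
    than k yields none and cannot contribute to the graph.
    """
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def usable_reads(read_lengths: list[int], k: int) -> int:
    """Count reads long enough to contribute at least one k-mer."""
    return sum(1 for n in read_lengths if n >= k)


# Hypothetical read lengths (illustrative only, not from the study).
read_lengths = [28, 36, 45, 50, 75, 100]

# The k-mer values compared in the text.
usable_by_k = {k: usable_reads(read_lengths, k) for k in (31, 41, 51, 61)}

for k, n in usable_by_k.items():
    print(f"k = {k}: {n} of {len(read_lengths)} reads usable")
```

With these assumed lengths, larger k values exclude more of the short reads, which is the flexibility argument made above for k = 31.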