An ensemble strategy that significantly improves de novo assembly of microbial genomes from metagenomic next-generation sequencing data.
Bottom Line: Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs.We also proposed new quality metrics that are suitable for evaluating metagenome de novo assembly.We demonstrate that this new ensemble strategy tested using in silico spike-in, clinical and environmental NGS datasets achieved significantly better contigs than current approaches.
Affiliation: Blood Systems Research Institute, San Francisco, CA 94118, USA Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA firstname.lastname@example.org.Show MeSH
Related in: MedlinePlus
Mentions: As a proof of concept that longer viral contigs can be better detected, sequences of various lengths (200, 500, 1000 and 2000 bp) were extracted from Virus RefSeq (Release 61) and mutated at various probabilities (0, 0.1, 0.2, 0.3, 0.4, 0.5) for each base. We then applied blastx and blastn with E-value 0.01 as cutoff on the simulated contigs against Virus RefSeq protein or nucleotide database. Figure 1A shows that longer contigs clearly have a better chance to be detected by blastx, which is especially true for highly divergent contigs. Figure 1B shows the same pattern using blastn. However, amino acid-based search shows better detection rate for highly divergent organisms than nucleotide-based homology search.
Affiliation: Blood Systems Research Institute, San Francisco, CA 94118, USA Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA 94107, USA email@example.com.