MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning.
Bottom Line:
We tackled the problem of classifying chimeric nodes with supervised machine learning to significantly improve the performance of MetaVelvet, and developed a new tool called MetaVelvet-SL. A Support Vector Machine is used to learn the classification model from 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers (IDBA-UD, Ray Meta and Omega), reconstructing longer, accurate assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
Affiliation: Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan.
When the total scaffold lengths of two assemblies differ substantially, as they do for the human gut microbial data sets, the naive use of the N50 score is inadequate, because a longer total length decreases the N50 score. The generalized score N-len(x) is more appropriate for comparing scaffold integrity than the raw N50 score [2]. N-len(x) is defined by

\[
\text{N-len}(x) = |S_i| \quad \text{such that} \quad \sum_{j=1}^{i} |S_j| \ge x \quad \text{and} \quad \sum_{j=1}^{i-1} |S_j| < x, \tag{1}
\]

where S_1, S_2, …, S_n denote the scaffolds output by an assembler, listed in descending order of length. The N50 score corresponds to the N-len(x) score for x = L/2 (x is 50% of L), where L denotes the total scaffold length.

The N-len(x) plots for the MH0006 data set produced by MetaVelvet-SL, MetaVelvet, IDBA-UD, SOAPdenovo2, Ray Meta and Omega are shown in Fig. 4. MetaVelvet-SL substantially increased the scaffold integrity. For example, at x = 5,000,000 the N-len(x) scores were 306,496 for MetaVelvet-SL, 24,554 for MetaVelvet, 178,659 for IDBA-UD, 90,861 for SOAPdenovo2, 101,726 for Ray Meta and 117,010 for Omega. (The N-len(x) plots for the MH0012, MH0047, SRS017227 and SRS018661 data sets are shown in Supplementary Figs S1–S4.) As in the MetaVelvet paper, we calculated the area under the curve (AUC) of N-len(x) for 0 < x ≤ L in units of 1,000,000 bp; that is, the cumulative sum of N-len(x) scores (0 < x ≤ L), where L denotes the total scaffold length.
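The definition in Eq. (1) can be made concrete with a short Python sketch. It is an illustration only, not code from MetaVelvet-SL: the function names `n_len`, `n50` and `n_len_auc` are hypothetical, and the AUC function encodes just one possible reading of "cumulative sum in units of 1,000,000 bp", namely sampling N-len(x) at every multiple of a step size and summing the samples.

```python
from bisect import bisect_left
from itertools import accumulate


def n_len(scaffold_lengths, x):
    """N-len(x) as in Eq. (1): the length |S_i| of the scaffold at which the
    cumulative length, taken over scaffolds sorted longest first, reaches x."""
    lengths = sorted(scaffold_lengths, reverse=True)
    cumulative = list(accumulate(lengths))
    if not lengths or x <= 0 or x > cumulative[-1]:
        raise ValueError("x must satisfy 0 < x <= total scaffold length")
    # Smallest index i with cumulative[i] >= x, hence cumulative[i-1] < x.
    i = bisect_left(cumulative, x)
    return lengths[i]


def n50(scaffold_lengths):
    """N50 is the special case x = L/2, where L is the total scaffold length."""
    return n_len(scaffold_lengths, sum(scaffold_lengths) / 2)


def n_len_auc(scaffold_lengths, step=1_000_000):
    """Assumed reading of the AUC: sum of N-len(x) sampled at
    x = step, 2*step, ... up to the total scaffold length L."""
    total = sum(scaffold_lengths)
    return sum(n_len(scaffold_lengths, x) for x in range(step, total + 1, step))


# Toy scaffold lengths (not data from the paper).
toy = [2, 3, 5, 10]
print(n50(toy))        # 10
print(n_len(toy, 11))  # 5
```

Because N-len(x) is evaluated at a fixed x rather than at a fraction of each assembly's own total length, two assemblers can be compared at the same x even when their total scaffold lengths differ, which is the property the text relies on.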