
Figure 3: Contig entropy at different taxonomic levels. The entropy of contigs versus the contig length (in log scale) for the datasets (A) simLC, (B) simMC and (C) simHC.

Mentions: Figure 3 plots contig entropy against contig length for datasets of different complexity. The entropy metric is computed at four levels derived from the NCBI taxonomy tree: (i) sequence, (ii) species, (iii) genus and (iv) phylum. The simHC dataset produces a large number of small inhomogeneous contigs because coverage of the source sequences is insufficient. The proportion of inhomogeneous contigs is comparatively lower in the simMC dataset and significantly lower in the simLC dataset. The contigs were more homogeneous at higher phylogenetic levels: because phylogenetically close genomes share significant sequence similarity, reads belonging to related sequences are more likely to be assembled into the same contig.
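
The record does not give the formula for the entropy metric, so the following is only a minimal sketch of how a per-contig entropy could be computed at the four levels. It assumes Shannon entropy (base 2) over the proportion of a contig's reads that come from each taxon; the paper's exact definition may differ. The function names, the `lineage` mapping and the example labels are all hypothetical.

```python
import math
from collections import Counter


def contig_entropy(read_labels):
    """Shannon entropy (bits) of the taxon labels of one contig's reads.

    Assumption: entropy is taken over the fraction of reads per label.
    A contig built from a single source has entropy 0 (homogeneous).
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("contig has no reads")
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def entropy_by_level(read_sources, lineage,
                     levels=("sequence", "species", "genus", "phylum")):
    """Compute the contig entropy once per taxonomic level.

    read_sources: source-sequence id of each read in the contig.
    lineage: hypothetical mapping from source id to its label at each level
             (in practice this would be derived from the NCBI taxonomy tree).
    """
    return {
        level: contig_entropy([lineage[src][level] for src in read_sources])
        for level in levels
    }


# Hypothetical lineage: seqA and seqB are strains of one species,
# seqC is a different species in the same genus.
lineage = {
    "seqA": {"sequence": "seqA", "species": "sp1", "genus": "g1", "phylum": "p1"},
    "seqB": {"sequence": "seqB", "species": "sp1", "genus": "g1", "phylum": "p1"},
    "seqC": {"sequence": "seqC", "species": "sp2", "genus": "g1", "phylum": "p1"},
}

# A chimeric contig assembled from reads of three related sequences.
result = entropy_by_level(["seqA", "seqA", "seqB", "seqC"], lineage)
```

In this toy example the entropy is highest at the sequence level and drops to zero at the genus and phylum levels, which mirrors the observation that contigs look more homogeneous at higher levels when the mixed reads come from closely related genomes.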

Evaluation of short read metagenomic assembly

Charuvaka A, Rangwala H - BMC Genomics (2011)

Bottom Line: We have also studied the effect of the k-mer size used in the de Bruijn graph on metagenomic assembly and developed a clustering solution to pool the contigs obtained from different assembly runs, which allowed us to obtain longer contigs. We have also assessed the degree of chimericity of the assembled contigs using an entropy/impurity metric and compared the metagenomic assemblies to assemblies of isolated individual source genomes. Our results show that the accuracy of the assembled contigs was better than expected for metagenomic samples with a few dominant organisms and was especially poor in samples containing many closely related strains. Clustering contigs from different k-mer parameters of the de Bruijn graph allowed us to obtain longer contigs; however, the clustering accumulated erroneous contigs and thus increased the error rate of the clustered contigs.

Affiliation: Computer Science Department, George Mason University, Fairfax, Virginia, USA.

Abstract: Metagenomic assembly is a challenging problem due to the presence of genetic material from multiple organisms. The problem becomes even more difficult when short reads produced by next generation sequencing technologies are used. Although whole genome assemblers are not designed to assemble metagenomic samples, they are being used for metagenomics due to the lack of assemblers capable of dealing with metagenomic samples. We present an evaluation of the assembly of simulated short-read metagenomic samples using a state-of-the-art de Bruijn graph based assembler. We assembled simulated metagenomic reads from datasets of various complexities using a state-of-the-art de Bruijn graph based parallel assembler. We have also studied the effect of the k-mer size used in the de Bruijn graph on metagenomic assembly and developed a clustering solution to pool the contigs obtained from different assembly runs, which allowed us to obtain longer contigs. We have also assessed the degree of chimericity of the assembled contigs using an entropy/impurity metric and compared the metagenomic assemblies to assemblies of isolated individual source genomes. Our results show that the accuracy of the assembled contigs was better than expected for metagenomic samples with a few dominant organisms and was especially poor in samples containing many closely related strains. Clustering contigs from different k-mer parameters of the de Bruijn graph allowed us to obtain longer contigs; however, the clustering accumulated erroneous contigs and thus increased the error rate of the clustered contigs.
