Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data.

Daly GM, Leggett RM, Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez RH, Caccamo M, Bernal W, Heeney JL - PLoS ONE (2015)

Bottom Line: Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.


Affiliation: Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom.

ABSTRACT
The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.
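As a rough illustration of what a k-mer frequency based host filter does, the sketch below discards reads whose k-mers are mostly found in a host reference. This is a minimal sketch under stated assumptions, not the tool or parameters used in the study: the function names, the value of `k` and the `max_host_fraction` threshold are all hypothetical, and reverse-complement k-mers are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_seqs, k):
    """Collect every k-mer seen in the host sequences into one set."""
    index = set()
    for seq in host_seqs:
        index |= kmers(seq.upper(), k)
    return index


def host_fraction(read, host_index, k):
    """Fraction of the read's distinct k-mers that also occur in the host."""
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:
        return 0.0  # read shorter than k: nothing to compare
    return len(read_kmers & host_index) / len(read_kmers)


def filter_reads(reads, host_index, k, max_host_fraction=0.5):
    """Keep reads that look non-host (hypothetical threshold)."""
    return [r for r in reads
            if host_fraction(r, host_index, k) <= max_host_fraction]


# Toy data: one read copied from the "host", one unrelated read.
host_index = build_host_index(["ACGTACGTTGCAAGCT"], k=4)
kept = filter_reads(["ACGTACGT", "GGGGCCCCGGGG"], host_index, k=4)
print(kept)
```

In this toy case only the read that shares no k-mers with the host sequence is kept. A real filter would use a much larger k, index both strands, and need a memory-efficient k-mer store.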




pone.0129059.g007: Effect of k-mer filtering (K-mer)/ mapper subtraction (Map) on post-assembly contig number using multiple optimized assemblers with the HCV 9x mean coverage Illumina read dataset.

Mentions: Our primary objective was not only to determine and validate the optimal assembler algorithm, assembly parameters and effects of read filtering methods, but also to assess whether bulk read subtraction would reduce the number of contigs assembled, leaving fewer contigs to analyse as putative viral sequences. Assembled contigs no longer than the longest trimmed Illumina read were discarded. For all assemblers tested, k-mer filtering reduced the number of contigs assembled by 98.7–99.6% and host-mapping subtraction reduced it by 98.5–99.6% [Fig 7]. Combining k-mer filtering with host-mapping subtraction had an additive effect, reducing the number of assembled contigs by 99.3–99.8%.
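The contig length cut-off and the percentage reductions quoted above can be sketched as follows. This is an illustrative sketch only: the function names and the contig counts in the usage lines are hypothetical and are not taken from the study.

```python
def discard_short_contigs(contigs, longest_read_len):
    """Keep only contigs strictly longer than the longest trimmed read.

    Contigs whose length is less than or equal to longest_read_len
    are dropped, matching the cut-off described in the text.
    """
    return [c for c in contigs if len(c) > longest_read_len]


def percent_reduction(before, after):
    """Percentage drop in contig count from `before` to `after`."""
    if before <= 0:
        raise ValueError("contig count before filtering must be positive")
    return 100.0 * (before - after) / before


# Hypothetical example: three contigs, longest trimmed read of 100 bases.
contigs = ["ACGT" * 10, "ACGT" * 30, "A" * 100]
kept = discard_short_contigs(contigs, longest_read_len=100)
print(len(kept))  # only the 120-base contig survives

# Hypothetical counts: 10,000 contigs unfiltered versus 40 after filtering.
print(round(percent_reduction(10000, 40), 1))
```

With these made-up counts the reduction comes to 99.6%, which sits at the upper end of the range reported for k-mer filtering alone.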

