Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data.

Daly GM, Leggett RM, Rowe W, Stubbs S, Wilkinson M, Ramirez-Gonzalez RH, Caccamo M, Bernal W, Heeney JL - PLoS ONE (2015)

Bottom Line: Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require downstream analysis as putative viral nucleic acids.

Affiliation: Lab of Viral Zoonotics, Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge, CB3 0ES, United Kingdom.

ABSTRACT
The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), which make subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require downstream analysis as putative viral nucleic acids.
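
The two non-mapping filters named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tooling. It assumes that a read is discarded when too many of its k-mers also occur in the host reference, and that low complexity is measured as the Shannon entropy of a read's dinucleotides. The function names, the value of k and both thresholds are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def host_kmer_fraction(read, host_kmers, k):
    """Fraction of the read's k-mers that are also present in the host k-mer set."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return sum(1 for km in read_kmers if km in host_kmers) / len(read_kmers)


def shannon_entropy(read, k=2):
    """Shannon entropy (bits) of the read's k-mer distribution; low values mean low complexity."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    total = len(read_kmers)
    counts = Counter(read_kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_reads(reads, host_reference, k=5, max_host_fraction=0.5, min_entropy=1.5):
    """Keep reads that look neither host-derived nor low-complexity.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    host_kmers = set(kmers(host_reference, k))
    kept = []
    for read in reads:
        if host_kmer_fraction(read, host_kmers, k) > max_host_fraction:
            continue  # assumed host-derived: shares too many k-mers with the host
        if shannon_entropy(read) < min_entropy:
            continue  # assumed low complexity: e.g. homopolymer runs
        kept.append(read)
    return kept
```

For brevity the sketch ignores reverse complements and quality scores, and it holds the host k-mer set in memory, which would not scale to a full eukaryotic genome without a more compact data structure.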

pone.0129059.g013: SURPI assembled contigs comparison. a) Range and mean of contig coverage of the viral references (artificial metagenomics viral dataset). SURPI SD = 28, Mapper SD = 15.5, MAP+k-mer SD = 25.9. b) HCV-infected liver tissue NGS datasets at 9x and 0.7x coverage, showing the largest assembled viral contig (blue) and the total viral reference coverage of all contigs (red).

Mentions: The SURPI pipeline uses ABySS + Minimo to assemble reads negatively selected by SNAP to pathogens together with viral SNAP-aligned reads. We aligned the resulting contigs to the reference sequences to determine the largest contig and the total reference coverage of all assembled contigs, and compared these with the contigs generated by our own processes, as described [Fig 13].
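
The two quantities compared here, largest contig and total reference coverage, can be computed from contig-to-reference alignments roughly as follows. This is a sketch under stated assumptions rather than the authors' procedure: alignments are assumed to be available as 0-based, half-open `(start, end)` intervals on the reference, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def reference_coverage(alignments, reference_length):
    """Percentage of reference bases covered by at least one aligned contig."""
    covered = sum(end - start for start, end in merge_intervals(alignments))
    return 100.0 * covered / reference_length


def largest_contig(contig_lengths):
    """Length of the longest contig, or 0 if no contigs were assembled."""
    return max(contig_lengths, default=0)
```

Merging the intervals first means that bases covered by several overlapping contigs are counted once, so the coverage figure cannot exceed 100%.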