KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes.

Steiner A, Stucki D, Coscolla M, Borrell S, Gagneux S - BMC Genomics (2014)

Bottom Line: Instead, KvarQ loads "testsuites" that define specific SNPs or short regions of interest in a reference genome, and directly synthesizes the relevant results based on the occurrence of these markers in the fastq files. In this article, we demonstrate how KvarQ can be used to successfully detect all main drug resistance mutations and phylogenetic markers in 880 bacterial whole genome sequences. This enables researchers and laboratory technicians with limited bioinformatics expertise to scan and analyze raw sequencing data in a matter of minutes.


Affiliation: Swiss Tropical and Public Health Institute, Socinstrasse 57, Basel 4051, Switzerland. sebastien.gagneux@unibas.ch.

ABSTRACT

Background: High-throughput DNA sequencing produces vast amounts of data, with millions of short reads that usually have to be mapped to a reference genome or newly assembled. Both reference-based mapping and de novo assembly are computationally intensive, generating large intermediary data files, and thus require bioinformatics skills that are often lacking in the laboratories producing the data. Moreover, many research and practical applications in microbiology require only a small fraction of the whole genome data.

Results: We developed KvarQ, a new tool that directly scans fastq files of bacterial genome sequences for known variants, such as single nucleotide polymorphisms (SNPs), bypassing the need for mapping all sequencing reads to a reference genome or for de novo assembly. Instead, KvarQ loads "testsuites" that define specific SNPs or short regions of interest in a reference genome, and directly synthesizes the relevant results based on the occurrence of these markers in the fastq files. KvarQ has a versatile command line interface and a graphical user interface. KvarQ currently ships with two "testsuites" for Mycobacterium tuberculosis, but new "testsuites" for other organisms can easily be created and distributed. In this article, we demonstrate how KvarQ can be used to successfully detect all main drug resistance mutations and phylogenetic markers in 880 bacterial whole genome sequences. The average scanning time per genome sequence was two minutes. The variant calls of a subset of these genomes were validated with a standard bioinformatics pipeline and showed >99% congruency.

Conclusion: KvarQ is a user-friendly tool that directly extracts relevant information from fastq files. This enables researchers and laboratory technicians with limited bioinformatics expertise to scan and analyze raw sequencing data in a matter of minutes. KvarQ is open-source, and pre-compiled packages with a graphical user interface are available at http://www.swisstph.ch/kvarq.
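The idea of scanning fastq reads directly for known markers, as described in the Results above, can be illustrated with a minimal Python sketch. This is not KvarQ's actual algorithm or API: the function names, the exact-match search, and the flanking sequences are all assumptions made for illustration, and quality scores are ignored entirely. It counts which base appears at a SNP position by looking for the reference sequence on either side of it, on both strands.

```python
# Illustrative sketch only: NOT KvarQ's implementation.
# All names and the exact-match strategy are hypothetical.
import io
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fastq(handle):
    """Yield the sequence line of each 4-line fastq record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string (ignored in this sketch)
        yield seq


def count_alleles(handle, left_flank, right_flank):
    """Count the bases observed between two flanking sequences."""
    counts = Counter()
    for read in read_fastq(handle):
        # Check the read as given, then its reverse complement.
        for seq in (read, reverse_complement(read)):
            start = seq.find(left_flank)
            if start == -1:
                continue
            pos = start + len(left_flank)  # index of the SNP base
            after = seq[pos + 1:pos + 1 + len(right_flank)]
            if pos < len(seq) and after == right_flank:
                counts[seq[pos]] += 1
                break  # count each read at most once
    return counts


# Made-up demonstration data: four reads, one from the opposite strand.
reads = [
    "TTAACCGGCTTGACAGG",
    "AACCGGTTTGACA",
    reverse_complement("GAACCGGCTTGACAT"),
    "GGGGGGGGGG",
]
fastq_text = "".join(f"@r{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(reads))
demo_counts = count_alleles(io.StringIO(fastq_text), "AACCGG", "TTGACA")
print(dict(sorted(demo_counts.items())))
```

In this toy input, two reads carry a C and one carries a T at the marker position, and the fourth read does not cover the marker at all. A real tool would also need to handle sequencing errors, base qualities, and partial overlaps, none of which this sketch attempts.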


Figure 3: Interactive inspection of json file. The upper pane of each window in these screenshots shows the main categories of data contained in the json file. In the left window, the drug resistance section is selected and the lower pane shows details about all target sequences in this testsuite (the “+” in front of the “Isoniazide resistance” indicates a non-synonymous mutation). In the right window, the phylogenetic section is selected, showing that all SNPs for “lineage 2” and “beijing sublineage” were found.

Mentions: To facilitate the extraction of relevant results, KvarQ provides data analysis tools to inspect the data interactively from the command line or with a hierarchical menu-driven graphical user interface. The “json explorer” shows the summarized test results for every testsuite (KvarQ’s main goal is to be user-friendly) as well as detailed information about the coverage of every target sequence that was used by the different testsuites (Figures 2 and 3). The interactive exploration of this wealth of information is intended for the advanced operator to get, for example, a better impression of the usefulness of newly designed target sequences or the fastq quality. Additionally, the “json explorer” displays the overall number of reads in a fastq file and the length of the reads.
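The kind of hierarchical inspection described above (main categories first, then details of a selected category) can be sketched generically in Python. This is not KvarQ's “json explorer”, and the sample document below is entirely hypothetical: its keys ("info", "analyses", "coverages") and values are invented for illustration and should not be read as KvarQ's actual json layout.

```python
# Illustrative sketch only: a generic JSON tree browser,
# NOT KvarQ's "json explorer". The sample data is hypothetical.
import json


def describe(node):
    """Return a one-line summary of a JSON value."""
    if isinstance(node, dict):
        return f"object with {len(node)} keys"
    if isinstance(node, list):
        return f"list with {len(node)} items"
    return repr(node)


def list_children(node):
    """Return (label, summary) pairs for the direct children of a node."""
    if isinstance(node, dict):
        return [(str(key), describe(value)) for key, value in node.items()]
    if isinstance(node, list):
        return [(str(i), describe(value)) for i, value in enumerate(node)]
    return []  # scalars have no children


def select(node, path):
    """Follow a sequence of keys or indices down into the tree."""
    for step in path:
        node = node[step]
    return node


# Hypothetical result file; every key and value here is made up.
SAMPLE = json.loads("""
{
  "info": {"reads": 1000, "read_length": 100},
  "analyses": {"resistance": ["example finding"], "phylo": "example lineage"},
  "coverages": {"target_1": [12, 15, 14]}
}
""")

# "Upper pane": the main categories of the file.
for label, summary in list_children(SAMPLE):
    print(f"{label}: {summary}")

# "Lower pane": details of one selected category.
for label, summary in list_children(select(SAMPLE, ["coverages"])):
    print(f"  {label}: {summary}")
```

Because the helper functions only rely on the generic structure of JSON (objects, lists, scalars), the same sketch works on any json file, at the cost of knowing nothing about what the individual fields mean.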

