Fast and sensitive mapping of nanopore sequencing reads with GraphMap.

Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N - Nat Commun (2016)

Bottom Line: Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads.


Affiliation: Computational & Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.

ABSTRACT
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.


Figure 3: Sensitivity and mapping accuracy on nanopore sequencing data. (a) Visualization of GraphMap and LAST alignments for a lambda phage MinION sequencing data set (ref. 12), using the Integrative Genomics Viewer (IGV; ref. 36). Grey columns represent confident consensus calls while coloured columns indicate lower quality calls. (b) Mapped coverage of the lambda phage (ref. 12) and the E. coli K-12 genome (ref. 31; R7.3 data) using MinION sequencing data and different mappers. (c) Consensus calling errors and uncalled bases using a MinION lambda phage data set (ref. 12) and different mappers. (d) Consensus calling errors and uncalled bases using a MinION E. coli K-12 data set (R7.3) and different mappers.

Mentions: GraphMap was further benchmarked on several published ONT data sets against mappers and aligners that have previously been used for this task (LAST, BWA-MEM and BLASR; Methods section), as well as a highly sensitive overlapper for which we tuned settings (DALIGNER; Methods section). In the absence of ground truth for these data sets, mappers were compared on the total number of reads mapped (sensitivity), and on their ability to provide consensus sequences that are accurate (a measure of mapping and alignment precision) as well as complete (a measure of recall). Overall, as seen in the simulated data sets, LAST came closest to GraphMap in mapping sensitivity, though GraphMap showed notable improvements. The differences between GraphMap and LAST were apparent even when comparing their results visually, with LAST alignments having low consensus quality even in a high-coverage setting (Fig. 3a). Across data sets, GraphMap mapped the most reads and aligned the most bases, improving sensitivity by 10–80% over LAST and even more compared with other tools (Fig. 3b; Supplementary Fig. 2; Supplementary Note 2). This led to fewer uncalled bases compared with LAST, BWA-MEM, BLASR, DALIGNER and marginAlign even in an otherwise high-coverage data set (Fig. 3c,d). In addition, GraphMap analysis resulted in a >10-fold reduction in errors on the lambda phage and E. coli genome (Fig. 3c) and reported <40 errors on the E. coli genome compared with more than 1,000 errors for LAST and BWA-MEM (Fig. 3d). With ∼80× coverage of the E. coli genome, GraphMap mapped ∼90% of the reads and called consensus bases for the whole genome with <1 error in 100,000 bases (Q50 quality). The next best aligner, LAST, did not have sufficient coverage (20×) on >7,000 bases and reported a consensus with a quality of ∼Q36. BWA-MEM aligned <60% of the reads and resulted in the calling of >200 deletion errors in the consensus genome.
Similar results were replicated in other genomes and data sets as well (Supplementary Fig. 2).
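
The Q50 and ∼Q36 figures quoted above are Phred-scaled quality values, where Q = -10 · log10(p) and p is the per-base error probability. The short Python sketch below shows how such values relate to error rates. The function names are hypothetical and chosen only for illustration; this is not code from GraphMap or from the paper's evaluation pipeline.

```python
import math


def phred_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred quality score.

    Uses the standard definition Q = -10 * log10(p).
    (Hypothetical helper; not part of GraphMap.)
    """
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10.0 * math.log10(error_rate)


def error_rate_from_phred(quality: float) -> float:
    """Invert the Phred definition: p = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


# One error in 100,000 consensus bases corresponds to Q50.
q_one_in_100k = phred_from_error_rate(1 / 100_000)

# A consensus quality of about Q36 corresponds to roughly
# one error in every 4,000 bases.
bases_per_error_q36 = 1.0 / error_rate_from_phred(36)

print(f"1 error per 100,000 bases -> Q{q_one_in_100k:.0f}")
print(f"Q36 -> about 1 error per {bases_per_error_q36:,.0f} bases")
```

Under this definition, the gap between Q50 and ∼Q36 amounts to roughly a 25-fold difference in consensus error rate.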

