Fast and sensitive mapping of nanopore sequencing reads with GraphMap.

Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N - Nat Commun (2016)

Bottom Line: Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads.

Affiliation: Computational & Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.

ABSTRACT
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.


Figure 4: Variant calling and species identification using nanopore sequencing data and GraphMap. (a) An IGV view of GraphMap alignments that enabled the direct detection of a 200-bp deletion (delineated by red lines). (b) GraphMap alignments spanning a ∼4-kbp deletion (delineated by red lines). Number of reads mapping to various genomes in a database (sorted by GraphMap counts and showing top 10 genomes) using different mappers (GraphMap, BWA-MEM, LAST, DALIGNER and BLASR) and three MinION sequencing data sets for (c) E. coli K-12 (R7.3), (d) S. enterica Typhi and (e) E. coli UTI89. Note that GraphMap typically maps the most reads to the right reference genome (at the strain level) and that the S. enterica Typhi data set is a mixture of sequencing data for two different strains for which we do not have reference genomes in the database. Results for marginAlign were nearly identical to those of LAST (within 1%) and have therefore been omitted.


Mentions: Long reads from the MinION sequencer are, in principle, ideal for the identification of large SVs in the genome [22], but existing mappers have not been systematically evaluated for this application [1]. Read alignments produced by mappers are a critical input for SV callers. To compare the utility of various mappers, their ability to produce spanning alignments or split alignments indicative of a structural variation (insertions or deletions) was evaluated using real E. coli data mapped to a mutated reference (Methods section). As shown in Table 2, mappers showed variable performance in their ability to detect SVs through spanning alignments. In comparison, GraphMap's spanning alignments readily detected insertions and deletions over a range of event sizes (100 bp–4 kbp), providing perfect precision and a 35% improvement in recall over the next best mapper (BLASR; Table 2). LAST alignments were unable to detect any events under a range of parameter settings, but post-processing with marginAlign improved recall slightly (5%; Table 2). BWA-MEM alignments natively provided 10% recall at 67% precision. Post-processing BWA-MEM alignments with LUMPY improved recall to 45%, using information from split reads to predict events. GraphMap produced spanning alignments natively that accurately demarcated the alignment event and did this without reporting any false positives (Fig. 4a,b and Table 2).
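
To make the idea of a spanning alignment concrete, the following Python sketch scans the CIGAR string of a single alignment for large insertions or deletions and scores a set of calls against known events. It is only an illustration under stated assumptions, not the evaluation procedure used in the paper: the names `Event`, `large_indels` and `precision_recall`, the 100-bp minimum length and the 50-bp matching tolerance are all hypothetical choices made for this example.

```python
import re
from typing import List, NamedTuple, Tuple

# One CIGAR element: a run length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class Event(NamedTuple):
    kind: str       # "DEL" or "INS"
    ref_start: int  # 0-based reference coordinate where the event begins
    length: int     # event length in bp


def large_indels(ref_start: int, cigar: str, min_len: int = 100) -> List[Event]:
    """Report insertions/deletions of at least min_len bp in one alignment.

    min_len is an illustrative threshold (an assumption of this sketch).
    """
    events = []
    pos = ref_start
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D" and n >= min_len:
            events.append(Event("DEL", pos, n))
        elif op == "I" and n >= min_len:
            events.append(Event("INS", pos, n))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            pos += n
    return events


def precision_recall(called: List[Event], truth: List[Event],
                     tolerance: int = 50) -> Tuple[float, float]:
    """Score calls against known events; tolerance (bp) is an assumption."""
    def matches(c: Event, t: Event) -> bool:
        return c.kind == t.kind and abs(c.ref_start - t.ref_start) <= tolerance

    correct_calls = sum(any(matches(c, t) for t in truth) for c in called)
    found_truth = sum(any(matches(c, t) for c in called) for t in truth)
    precision = correct_calls / len(called) if called else 0.0
    recall = found_truth / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical read spanning a 200-bp deletion, aligned from position 1000.
calls = large_indels(1000, "500M200D300M")
truth = [Event("DEL", 1510, 200), Event("INS", 5000, 150)]
print(calls, precision_recall(calls, truth))
```

On noisy reads a single event may be split across several adjacent CIGAR operations, so a practical caller would likely merge nearby gaps and require support from multiple reads before reporting an event; this sketch does neither.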

