Limits...
Fast and sensitive mapping of nanopore sequencing reads with GraphMap.

Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N - Nat Commun (2016)

Bottom Line: Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics.Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%).GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads.

View Article: PubMed Central - PubMed

Affiliation: Computational &Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.

ABSTRACT
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.

No MeSH data available.


Related in: MedlinePlus

A schematic representation of stages in GraphMap.(a) Order of stages in the ‘read-funneling' approach used in GraphMap to refine alignments and reduce the number of candidate locations to one. (b) Structure of spaced seeds used for index construction and index look-up. For each position in the reference one seed is inserted into the index and for each position in the query, three seeds are looked up corresponding to the different error scenarios (c) Region selection by clustering of candidate seeds on the reference. Diagonals with sufficient number of seed hits are used to identify regions for further processing. (d) Generation of alignment anchors through a fast graph based ordering of seeds (Graph Mapping). After construction of the initial graph based on the reference, seeds from the query (2mers here; starting from the green seed) are looked up, and information in the graph propagated, to construct a maximal walk that serves as an anchor. (e) Filtering of seed matches using LCSk search and L1 regression. Anchors are chained into a monotonically increasing sequence, with outliers trimmed using L1 regression, to get an approximate alignment that helps select the correct mapping location.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4835549&req=5

f1: A schematic representation of stages in GraphMap.(a) Order of stages in the ‘read-funneling' approach used in GraphMap to refine alignments and reduce the number of candidate locations to one. (b) Structure of spaced seeds used for index construction and index look-up. For each position in the reference one seed is inserted into the index and for each position in the query, three seeds are looked up corresponding to the different error scenarios (c) Region selection by clustering of candidate seeds on the reference. Diagonals with sufficient number of seed hits are used to identify regions for further processing. (d) Generation of alignment anchors through a fast graph based ordering of seeds (Graph Mapping). After construction of the initial graph based on the reference, seeds from the query (2mers here; starting from the green seed) are looked up, and information in the graph propagated, to construct a maximal walk that serves as an anchor. (e) Filtering of seed matches using LCSk search and L1 regression. Anchors are chained into a monotonically increasing sequence, with outliers trimmed using L1 regression, to get an approximate alignment that helps select the correct mapping location.

Mentions: The GraphMap algorithm is structured to achieve high-sensitivity and speed using a five-stage ‘read-funneling' approach as depicted in Fig. 1a. The underlying design principle is to have efficiently computable stages that conservatively reduce the set of candidate locations based on progressively defined forms of the read-to-reference alignment. For example, in stage I, GraphMap uses a novel adaptation of gapped spaced seeds15 to efficiently reduce the search space (Fig. 1b) and then clusters seed hits as a form of coarse alignment (Fig. 1c). These are then refined in stage II using graph-based vertex-centric processing of seeds to efficiently (allowing seed-level parallelism) construct alignment anchors (Fig. 1d). GraphMap then chains anchors using a kmer version of longest common subsequence construction (stage III; Fig. 1e), refines alignments with a form of L1 linear regression (stage IV; Fig. 1e) and finally evaluates the remaining candidates to select the best location to construct a final alignment (stage V). GraphMap computes a BLAST-like E-value as well as a mapping quality for its alignments. Further details about each of these stages, the design choices and how they impact GraphMap's performance can be found in the Methods section.


Fast and sensitive mapping of nanopore sequencing reads with GraphMap.

Sović I, Šikić M, Wilm A, Fenlon SN, Chen S, Nagarajan N - Nat Commun (2016)

A schematic representation of stages in GraphMap.(a) Order of stages in the ‘read-funneling' approach used in GraphMap to refine alignments and reduce the number of candidate locations to one. (b) Structure of spaced seeds used for index construction and index look-up. For each position in the reference one seed is inserted into the index and for each position in the query, three seeds are looked up corresponding to the different error scenarios (c) Region selection by clustering of candidate seeds on the reference. Diagonals with sufficient number of seed hits are used to identify regions for further processing. (d) Generation of alignment anchors through a fast graph based ordering of seeds (Graph Mapping). After construction of the initial graph based on the reference, seeds from the query (2mers here; starting from the green seed) are looked up, and information in the graph propagated, to construct a maximal walk that serves as an anchor. (e) Filtering of seed matches using LCSk search and L1 regression. Anchors are chained into a monotonically increasing sequence, with outliers trimmed using L1 regression, to get an approximate alignment that helps select the correct mapping location.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4835549&req=5

f1: A schematic representation of stages in GraphMap.(a) Order of stages in the ‘read-funneling' approach used in GraphMap to refine alignments and reduce the number of candidate locations to one. (b) Structure of spaced seeds used for index construction and index look-up. For each position in the reference one seed is inserted into the index and for each position in the query, three seeds are looked up corresponding to the different error scenarios (c) Region selection by clustering of candidate seeds on the reference. Diagonals with sufficient number of seed hits are used to identify regions for further processing. (d) Generation of alignment anchors through a fast graph based ordering of seeds (Graph Mapping). After construction of the initial graph based on the reference, seeds from the query (2mers here; starting from the green seed) are looked up, and information in the graph propagated, to construct a maximal walk that serves as an anchor. (e) Filtering of seed matches using LCSk search and L1 regression. Anchors are chained into a monotonically increasing sequence, with outliers trimmed using L1 regression, to get an approximate alignment that helps select the correct mapping location.
Mentions: The GraphMap algorithm is structured to achieve high-sensitivity and speed using a five-stage ‘read-funneling' approach as depicted in Fig. 1a. The underlying design principle is to have efficiently computable stages that conservatively reduce the set of candidate locations based on progressively defined forms of the read-to-reference alignment. For example, in stage I, GraphMap uses a novel adaptation of gapped spaced seeds15 to efficiently reduce the search space (Fig. 1b) and then clusters seed hits as a form of coarse alignment (Fig. 1c). These are then refined in stage II using graph-based vertex-centric processing of seeds to efficiently (allowing seed-level parallelism) construct alignment anchors (Fig. 1d). GraphMap then chains anchors using a kmer version of longest common subsequence construction (stage III; Fig. 1e), refines alignments with a form of L1 linear regression (stage IV; Fig. 1e) and finally evaluates the remaining candidates to select the best location to construct a final alignment (stage V). GraphMap computes a BLAST-like E-value as well as a mapping quality for its alignments. Further details about each of these stages, the design choices and how they impact GraphMap's performance can be found in the Methods section.

Bottom Line: Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics.Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%).GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads.

View Article: PubMed Central - PubMed

Affiliation: Computational &Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.

ABSTRACT
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.

No MeSH data available.


Related in: MedlinePlus