Decoding long nanopore sequencing reads of natural DNA.

Laszlo AH, Derrington IM, Ross BC, Brinkerhoff H, Adey A, Nova IC, Craig JM, Langford KW, Samson JM, Daza R, Doering K, Shendure J, Gundlach JH - Nat. Biotechnol. (2014)

Bottom Line: As approximately four nucleotides affect the ion current of each level, we measured the ion current corresponding to all 256 four-nucleotide combinations (quadromers). This quadromer map is highly predictive of ion current levels of previously unmeasured sequences derived from the bacteriophage phi X 174 genome. This work provides a foundation for nanopore sequencing of long, natural DNA strands.


Affiliation: Department of Physics, University of Washington, Seattle, Washington, USA.

ABSTRACT
Nanopore sequencing of DNA is a single-molecule technique that may achieve long reads, low cost and high speed with minimal sample preparation and instrumentation. Here, we build on recent progress with respect to nanopore resolution and DNA control to interpret the procession of ion current levels observed during the translocation of DNA through the pore MspA. As approximately four nucleotides affect the ion current of each level, we measured the ion current corresponding to all 256 four-nucleotide combinations (quadromers). This quadromer map is highly predictive of ion current levels of previously unmeasured sequences derived from the bacteriophage phi X 174 genome. Furthermore, we show nanopore sequencing reads of phi X 174 up to 4,500 bases in length, which can be unambiguously aligned to the phi X 174 reference genome, and demonstrate proof-of-concept utility with respect to hybrid genome assembly and polymorphism detection. This work provides a foundation for nanopore sequencing of long, natural DNA strands.



Figure 3: Raw data to alignment. (a) Raw data are processed using a level-finding algorithm (Supplementary Discussion) to identify transitions between levels in the current trace. A subsequent filter removes most repeated levels, which likely result from polymerase backsteps (indicated by '*'). (b) The sequence of median current values of each level is extracted. (c) The current values are aligned to predicted values from the reference sequence using the quadromer map (Fig. 2a). Alignment is performed with a dynamic programming alignment algorithm similar to Needleman-Wunsch alignment (ref. 20; Supplementary Discussion). In some locations, levels are skipped in the nanopore read owing either to motions of the DNAP or to errors made by the level-finding algorithm. In other places, backsteps result in multiple reads of the same level. We determine read boundaries from the first and last matched levels in the reference sequence. Read boundaries are indicated by the blue lines. The above alignment had an estimated 6.4 × 10^−15 probability of false alignment.
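
Steps (b) and (c) of the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the level boundaries are assumed to be already known (the level-finding algorithm itself is in the Supplementary Discussion and is not reproduced here), and the quadromer map is filled with made-up placeholder currents rather than the measured values of Fig. 2a.

```python
from itertools import product
from statistics import median


def toy_quadromer_map():
    """Build a PLACEHOLDER quadromer map with one current value per 4-mer.

    The real map is measured experimentally; the numbers here are an
    arbitrary weighted sum chosen only so that the example runs.
    """
    base_value = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
    position_weight = (1.0, 4.0, 4.0, 1.0)  # assumption: central bases matter most
    qmap = {}
    for bases in product("ACGT", repeat=4):  # 4**4 = 256 combinations
        current = 20.0 + sum(w * base_value[b] for w, b in zip(position_weight, bases))
        qmap["".join(bases)] = current
    return qmap


def predict_levels(sequence, qmap):
    """Slide a 4-nucleotide window along the reference, one predicted level per step."""
    return [qmap[sequence[i:i + 4]] for i in range(len(sequence) - 3)]


def level_medians(trace, boundaries):
    """Median current of each level, given the sample indices where levels change."""
    edges = [0, *boundaries, len(trace)]
    return [median(trace[a:b]) for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    qmap = toy_quadromer_map()
    print(len(qmap))                          # number of quadromers in the map
    print(predict_levels("ACGTAC", qmap))     # three overlapping quadromers
    print(level_medians([1, 1, 2, 5, 5, 5], [3]))
```

A reference of N nucleotides yields N − 3 predicted levels, which is the series the observed medians are aligned against.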

Mentions: The strong homology between quadromer-based current predictions and nanopore sequencing reads can be used to perform alignments to reference genomes and sequence databases with high confidence. As a first assessment, we subjected three PCR amplicons derived from phi X 174 to nanopore sequencing in a blinded fashion, i.e., the individuals performing sequencing and analysis were not aware of the genomic positions of the amplicons. After extracting the current levels from nanopore reads using a custom algorithm (Supplementary Discussion), we aligned the observed current levels from each read to predicted current levels obtained by applying the quadromer map to the known phi X 174 genome sequence (Fig. 3a,b). Our alignment algorithm is similar to Needleman-Wunsch alignment (refs. 20, 21) but allows for backsteps in the series of levels (Fig. 3c and Supplementary Figs. 5 and 6). We assessed the confidence of these alignments by comparing alignment scores with those obtained against random sequences (Supplementary Fig. 7). The vast majority (30 out of 31) of nanopore sequencing reads with a probability of false alignment below 1 × 10^−4 aligned to one of three regions; un-blinding confirmed that these corresponded to the locations along the phi X 174 genome from which the three PCR amplicons were derived (Supplementary Fig. 8).
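
The idea of a Needleman-Wunsch-like alignment that tolerates skipped levels and backsteps can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the function name `align_levels`, the scoring scheme (a fixed match bonus minus the absolute current difference) and all penalty values are hypothetical, and every observed level is forced to match some reference level. The read may start and end anywhere in the reference, so read boundaries fall out of the first and last matched reference levels.

```python
def align_levels(observed, predicted, max_skip=2, max_back=2,
                 skip_penalty=2.0, back_penalty=2.0, stay_penalty=1.0,
                 match_bonus=3.0):
    """Align observed level medians to predicted reference levels.

    Returns (best_score, path), where path[i] is the index of the reference
    level matched to observed level i. All scoring parameters are
    illustrative assumptions.
    """
    n, m = len(observed), len(predicted)
    neg_inf = float("-inf")

    def match(i, j):
        # Higher is better: full bonus for identical currents.
        return match_bonus - abs(observed[i] - predicted[j])

    # Allowed moves between consecutive observed levels:
    # (change in reference index, penalty).
    moves = [(1, 0.0), (0, stay_penalty)]                                   # step, repeat
    moves += [(1 + s, skip_penalty * s) for s in range(1, max_skip + 1)]    # skipped levels
    moves += [(-b, back_penalty * b) for b in range(1, max_back + 1)]       # backsteps

    score = [[neg_inf] * m for _ in range(n)]
    prev = [[-1] * m for _ in range(n)]
    for j in range(m):                      # free start anywhere in the reference
        score[0][j] = match(0, j)

    for i in range(1, n):
        for j in range(m):
            best, arg = neg_inf, -1
            for delta, penalty in moves:
                k = j - delta               # reference index of the previous level
                if 0 <= k < m and score[i - 1][k] - penalty > best:
                    best, arg = score[i - 1][k] - penalty, k
            if arg >= 0:
                score[i][j] = best + match(i, j)
                prev[i][j] = arg

    # Free end: take the best-scoring final reference position and trace back.
    j = max(range(m), key=lambda c: score[n - 1][c])
    total = score[n - 1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = prev[i][j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    predicted = [10, 20, 30, 40, 50, 60, 70, 80]
    # Toy read: one backstep (30 back to 20) and one skipped level (50).
    observed = [20, 30, 20, 30, 40, 60, 70]
    total, path = align_levels(observed, predicted)
    print(total, path)
    print("read boundaries:", min(path), max(path))
```

In this toy read, the backstep and the skipped level each cost one penalty while every level still matches exactly. A confidence estimate in the spirit of the text would compare the returned score with scores obtained against predicted levels from random sequences; that step is omitted here.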

