Insertion and deletion correcting DNA barcodes based on watermarks.

Kracht D, Schober S - BMC Bioinformatics (2015)



Affiliation: Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, 89081, Germany. david.kracht@uni-ulm.de.

ABSTRACT

Background: Barcode multiplexing is a key strategy for sharing the rising capacity of next-generation sequencing devices: synthetic DNA tags, called barcodes, are attached to natural DNA fragments during library preparation. Different libraries can be labeled individually with barcodes and then sequenced jointly. A post-processing step is needed to sort the sequencing data according to their origin, using these DNA labels. This final separation step is called demultiplexing, and its performance is mainly determined by the characteristics of the DNA code words used as labels. Currently, there are two different strategies for barcoding: one is based on the Hamming distance, the other uses the edit metric to measure distances between code words. The theory of channel coding provides well-known code constructions for the Hamming metric. They provide a large number of code words with variable lengths and maximal correction capability with respect to substitution errors. However, some sequencing platforms are known to produce exceptionally high numbers of insertion or deletion errors. Barcodes based on the edit distance can take insertion and deletion errors into account in the decoding process. Unfortunately, no explicit code construction is known that gives optimal codes for the edit metric.
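The difference between the two metrics can be illustrated with a short sketch. The function names and the example sequences below are illustrative assumptions and are not taken from the article. A single deletion at the start of a barcode, followed by one extra base at its end, shifts every position, so the Hamming distance is maximal while the edit distance stays small.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions that turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, x in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete x
                            curr[j - 1] + 1,            # insert y
                            prev[j - 1] + (x != y)))    # match or substitute
        prev = curr
    return prev[-1]


barcode = "ACGTACGT"     # hypothetical barcode
read = "CGTACGTA"        # first base deleted, one extra base at the end
print(hamming_distance(barcode, read))   # 8: every position is shifted
print(edit_distance(barcode, read))      # 2: one deletion plus one insertion
```

A decoder that only compares positions one by one therefore treats this read as completely corrupted, whereas a decoder working with the edit metric sees two errors.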

Results: In the present work we take an entirely different perspective to obtain DNA barcodes. We consider a concatenated code construction that produces so-called watermark codes, which were first proposed by Davey and MacKay for communication over binary channels with synchronization errors. We adapt and extend the concepts of watermark codes for use in DNA sequencing. Moreover, we provide an exemplary set of barcodes that are experimentally compatible with common next-generation sequencing platforms. Finally, a realistic simulation scenario is used to evaluate the proposed codes and to show that the watermark concept is suitable for DNA sequencing applications.
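The core idea of a watermark code in the sense of Davey and MacKay can be sketched as follows: a sparse representation of the message is added symbol by symbol to a pseudo-random sequence (the watermark) that the receiver knows, so that the transmitted word stays close to the watermark. The sketch below is only a toy under stated assumptions: the four-letter mapping, the XOR on 2-bit symbols, the trivial sparsifier, and all names (`make_watermark`, `sparsify`, `encode`, `decode_synchronized`, `block_len`) are hypothetical and do not reproduce the construction of the article. The outer code and the resynchronizing decoder are omitted.

```python
import random

NUCLEOTIDES = "ACGT"   # assumed mapping of symbols 0..3 to bases


def make_watermark(length: int, seed: int = 0) -> list[int]:
    """Pseudo-random sequence over {0, 1, 2, 3}, known to sender and receiver."""
    rng = random.Random(seed)
    return [rng.randrange(4) for _ in range(length)]


def sparsify(message: list[int], block_len: int = 3) -> list[int]:
    """Toy sparsifier: each message symbol occupies the first position
    of a block whose remaining positions are zero."""
    sparse = []
    for symbol in message:
        sparse.extend([symbol] + [0] * (block_len - 1))
    return sparse


def encode(message: list[int], watermark: list[int], block_len: int = 3) -> str:
    """Add the sparse sequence to the watermark (XOR of 2-bit symbols)."""
    sparse = sparsify(message, block_len)
    if len(sparse) != len(watermark):
        raise ValueError("watermark length must be len(message) * block_len")
    return "".join(NUCLEOTIDES[s ^ w] for s, w in zip(sparse, watermark))


def decode_synchronized(barcode: str, watermark: list[int],
                        block_len: int = 3) -> list[int]:
    """Recover the message when no insertion or deletion has occurred."""
    sparse = [NUCLEOTIDES.index(c) ^ w for c, w in zip(barcode, watermark)]
    return sparse[::block_len]


message = [1, 3, 0, 2]
watermark = make_watermark(len(message) * 3)
barcode = encode(message, watermark)
assert decode_synchronized(barcode, watermark) == message
```

Because the barcode differs from the watermark in at most one position per block, a receiver that knows the watermark can use it as a reference to detect where the read has drifted out of alignment.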

Conclusion: Our adaptation of watermark codes enables the construction of barcodes that are capable of correcting substitution, insertion, and deletion errors. The presented approach has the advantage of not needing any markers or technical sequences to recover the position of the barcode in the sequencing reads, a requirement that poses a significant restriction in other approaches.


Figure 3: Illustration of the HMM. Exemplary series of transitions of the HMM and observables r(j), based on state transitions x_{j−1} → x_j and transmit symbols t_j for j ∈ {i−1, i, i+1}.

Mentions: to be assembled from sub-sequences r(i), as observables, based on the hidden state x_i and the transmit symbol t_i. Every state transition x_{i−1} → x_i causes the emission of a sub-sequence r(i) that is associated with position i in t (in general HMMs, emissions are associated with single states rather than with transitions; compare Figure 3). To characterize the transition probabilities among hidden states and the emission probabilities of observables in the HMM, we use the following set of parameters . Although we use the same notation for the parameters as before (infinite state machine in the section Sequencing Channel), the channel model and the HMM discussed here are not equivalent.
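An HMM whose emissions are tied to transitions can be evaluated with a forward pass. The sketch below is a simplified illustration, not the decoder of the article. It assumes, as in Davey and MacKay's original formulation, that the hidden state is the drift (received symbols minus transmitted symbols so far). The parameter names `p_ins`, `p_del`, `p_sub` and `max_ins` are hypothetical stand-ins, since the parameter set itself is not reproduced in this excerpt. It further assumes that inserted bases are uniformly distributed and that the read consists of the barcode output only.

```python
def read_likelihood(t: str, r: str, p_ins: float = 0.01, p_del: float = 0.01,
                    p_sub: float = 0.01, max_ins: int = 2) -> float:
    """Forward pass: probability that transmit sequence t produced read r."""
    p_trans = 1.0 - p_ins - p_del       # t_i is transmitted (maybe substituted)
    n, m = len(t), len(r)
    # fwd[i][k]: probability that t[:i] produced exactly r[:k].
    # The drift x_i corresponds to k - i.
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(1, n + 1):
        for k in range(m + 1):
            total = 0.0
            # The transition x_{i-1} -> x_i emits a sub-sequence r(i):
            # `ins` inserted bases, then t_i is deleted or transmitted.
            for ins in range(max_ins + 1):
                p_inserted = (p_ins / 4.0) ** ins   # uniform random bases
                if k - ins >= 0:                    # t_i deleted
                    total += fwd[i - 1][k - ins] * p_inserted * p_del
                if k - ins - 1 >= 0:                # t_i transmitted as r[k-1]
                    emit = (1.0 - p_sub) if r[k - 1] == t[i - 1] else p_sub / 3.0
                    total += fwd[i - 1][k - ins - 1] * p_inserted * p_trans * emit
            fwd[i][k] = total
    return fwd[n][m]


def best_barcode(barcodes: list[str], read: str) -> str:
    """Pick the candidate barcode with the highest likelihood for the read."""
    return max(barcodes, key=lambda t: read_likelihood(t, read))


# Hypothetical barcodes; the read is the first barcode with its first base deleted.
print(best_barcode(["ACGTAC", "TTGCAA"], "CGTAC"))   # ACGTAC
```

Indexing the table by the number of consumed read symbols `k` instead of the drift `k - i` is an equivalent bookkeeping choice. Limiting the number of insertions per position to `max_ins` truncates a small amount of probability mass, so the values are likelihoods of the truncated model.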

