Streamlined Genome Sequence Compression using Distributed Source Coding.

Wang S, Jiang X, Chen F, Cui L, Cheng S - Cancer Inform (2014)

Bottom Line: Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).


Affiliation: Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA.

ABSTRACT
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).


Figure 4: Factor graph of genome compression based on DSC.

Mentions: To perform syndrome-based decoding of a non-repeat DNA subsequence x with the reference sequence y as side information, the decoder must be able to explore the variations between x and y, which are modeled as insertions, deletions, and substitutions between the source and the reference. A substitution can be expressed as an insertion in the source sequence followed by a deletion at the corresponding location in the reference sequence. In this section, we demonstrate that such variations can be effectively estimated through Bayesian inference on graphical models. The graphical model of our proposed syndrome-based decoding with variation is depicted in Figure 4. There, the variable nodes (usually depicted by circles) denote variables such as source symbols, binary source bits, local offsets introduced by variation, and syndromes, while the factor nodes (depicted by squares) represent the relationships among the connected variable nodes. In the rest of this section, we describe how to construct the proposed factor graph for DNA sequence decoding with variations. We first study the parity check constraints imposed by the received syndromes, where $s_1, \ldots, s_M$, the realizations of the variable nodes $S_l$, $l = 1, \ldots, M$, denote the received syndromes in Figure 4. As in standard LDPC codes, the factor nodes $c_l$, $l = 1, \ldots, M$, take into account the parity check constraints, and the corresponding factor function can be expressed as

$$c_l(x_{c_l}, s_l) = \begin{cases} 1, & \text{if } s_l \oplus \bigoplus x_{c_l} = 0, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
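The parity check factor of Equation (1) can be illustrated with a minimal Python sketch. It assumes that x_{c_l} stands for the source bits connected to factor node c_l, and that those connections are given by the rows of a binary parity check matrix. The matrix `H`, the bit vector `x`, and all function names below are hypothetical toy choices for illustration and are not taken from the paper.

```python
from functools import reduce
from operator import xor


def syndromes(H, x):
    """Compute the syndrome vector s = H x over GF(2) (encoder side)."""
    return [reduce(xor, (h & b for h, b in zip(row, x)), 0) for row in H]


def parity_factor(x_neighbors, s_l):
    """Factor function of Eq. (1): 1 if s_l XOR (XOR of neighbor bits) == 0, else 0."""
    return 1 if (s_l ^ reduce(xor, x_neighbors, 0)) == 0 else 0


def neighbors(row, x):
    """Select the bits of x connected to the check described by one row of H."""
    return [b for h, b in zip(row, x) if h]


def all_checks_satisfied(H, x, s):
    """True when every factor c_l evaluates to 1 for the candidate bits x."""
    return all(parity_factor(neighbors(row, x), s_l) == 1
               for row, s_l in zip(H, s))


# Hypothetical toy parity check matrix (M = 3 checks, 6 source bits).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
x = [1, 0, 1, 1, 0, 0]           # toy binary source bits
s = syndromes(H, x)              # syndromes the client would transmit

candidate = [0, 0, 1, 1, 0, 0]   # same bits with the first one flipped
print(s)                                      # [0, 1, 0]
print(all_checks_satisfied(H, x, s))          # True
print(all_checks_satisfied(H, candidate, s))  # False
```

In a belief propagation decoder these factors would be combined with the other factors in the graph rather than checked in isolation; the sketch only shows how a single candidate is accepted or rejected by the syndrome constraints.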

