Streamlined Genome Sequence Compression using Distributed Source Coding.

Wang S, Jiang X, Chen F, Cui L, Cheng S - Cancer Inform (2014)

Bottom Line: Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).


Affiliation: Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA.

ABSTRACT
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).
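To make the hash coding idea concrete, the following is a minimal sketch of a hash-based reference lookup, not the authors' implementation. It assumes that the client sends only a short hash of each subsequence and that the server, which holds the reference, searches a small window of shifts for a reference block with the same hash. The function names, the truncated SHA-256 hash, the `n_bytes` length, and the `max_offset` window are all hypothetical choices made for illustration.

```python
import hashlib


def block_hash(block, n_bytes=4):
    """Short hash of a subsequence (hypothetical choice: truncated SHA-256)."""
    return hashlib.sha256(block.encode("ascii")).digest()[:n_bytes]


def encode_block(source_block):
    """Client side: no reference is needed, only a hash of the block."""
    return block_hash(source_block)


def decode_block(reference, start, length, received_hash, max_offset=4):
    """Server side: look for a reference block whose hash matches.

    Shifts are tried in order of increasing magnitude (0, -1, 1, -2, 2, ...).
    Returns (block, shift) on a match, or None if no shift in the window fits.
    """
    for shift in sorted(range(-max_offset, max_offset + 1), key=abs):
        pos = start + shift
        if pos < 0 or pos + length > len(reference):
            continue  # candidate window falls outside the reference
        candidate = reference[pos:pos + length]
        if block_hash(candidate) == received_hash:
            return candidate, shift
    return None


reference = "ACGTACGTTTGACCA"
source_block = reference[5:11]  # same content as the reference, shifted by 2
result = decode_block(reference, start=3, length=6,
                      received_hash=encode_block(source_block))
```

In this sketch the client does constant work per block regardless of the reference size. A `None` result marks a block that differs from the reference within the window, which is where an adaptive scheme would have to switch to another mode, such as syndrome coding.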


Figure 5. The empirical statistics of (A) the DNA bases {“A”, “T”, “G”, “C”, “N”} and those of (B) the local offsets in the range from −4 to 4.

Mentions: First, the empirical marginal statistics of the DNA bases {“A”, “T”, “G”, “C”, “N”} and those of the local offsets t_i within the range from −4 to 4 are shown in Figure 5A and 5B, respectively; these are used as the priors in the syndrome-based decoding of nonrepeated sequences. Figure 5A verifies the assumption that the symbols of DNA sequences are usually non-uniformly distributed. Moreover, Figure 5B shows that a maximum local offset of T = 4 is sufficiently large to capture shifts between the reference and the source.
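Empirical marginals of this kind can be estimated by simple counting. The sketch below is an illustration under stated assumptions rather than the authors' code: `base_frequencies` and `offset_frequencies` are hypothetical helper names, and the offsets are assumed to be already available as a list of integers, since how they are measured is not described in this excerpt.

```python
from collections import Counter

ALPHABET = ("A", "T", "G", "C", "N")


def base_frequencies(sequence):
    """Empirical marginal probability of each symbol in ALPHABET."""
    counts = Counter(sequence.upper())
    total = sum(counts[b] for b in ALPHABET)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {b: counts[b] / total for b in ALPHABET}


def offset_frequencies(offsets, max_offset=4):
    """Empirical distribution of local offsets within [-max_offset, max_offset].

    Returns (probs, coverage): probs is normalized over the in-range offsets,
    and coverage is the fraction of all offsets that fall inside the range.
    """
    if not offsets:
        raise ValueError("no offsets given")
    in_range = [t for t in offsets if abs(t) <= max_offset]
    coverage = len(in_range) / len(offsets)
    counts = Counter(in_range)
    n = len(in_range)
    probs = {t: (counts[t] / n if n else 0.0)
             for t in range(-max_offset, max_offset + 1)}
    return probs, coverage


base_prior = base_frequencies("AATGCN")
offset_prior, coverage = offset_frequencies([0, 0, 1, -1, 5])
```

A `coverage` value close to 1 is the quantity that would support a statement like "T = 4 is sufficiently large": almost all observed shifts fall inside the window.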

