Streamlined Genome Sequence Compression using Distributed Source Coding.

Wang S, Jiang X, Chen F, Cui L, Cheng S - Cancer Inform (2014)

Bottom Line: Existing techniques that require a computationally heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS).

Affiliation: Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA.

ABSTRACT
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computing power. Existing techniques that require a computationally heavy client (encoder side) cannot be applied to such devices. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol that meets the low-complexity requirement at the client side. Depending on the variation between the source and the reference, our protocol adaptively selects either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results show promising performance of the proposed method compared with the state-of-the-art algorithm (GRS).
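To make the syndrome-coding half of this idea concrete, the sketch below shows textbook distributed source coding with a Hamming (7,4) code. It is an illustration under stated assumptions, not the authors' codec: we assume the sequence has already been mapped to bits, and that each 7-bit source block differs from the co-located reference block in at most one bit. The encoder never looks at the reference; it sends only a 3-bit syndrome per block, and the decoder, which holds the reference as side information, recovers the block. The names `H`, `syndrome`, `encode`, `decode_block`, and `decode` are hypothetical.

```python
# Illustrative sketch only (not the published protocol): syndrome coding with
# a Hamming (7,4) code, assuming source and reference blocks differ in <= 1 bit.

# Parity-check matrix: column j (1-indexed) is the binary representation of j,
# so a nonzero syndrome directly names the position of a single differing bit.
H = [[(j >> r) & 1 for j in range(1, 8)] for r in range(3)]


def syndrome(block):
    """Compute the 3-bit syndrome H * block (mod 2) of a 7-bit block."""
    return [sum(h * b for h, b in zip(row, block)) % 2 for row in H]


def encode(source):
    """Encoder: send only the syndrome of each 7-bit block (7 bits -> 3 bits)."""
    assert len(source) % 7 == 0, "pad the source to a multiple of 7 bits"
    return [syndrome(source[i:i + 7]) for i in range(0, len(source), 7)]


def decode_block(s_x, y_block):
    """Decoder: recover x from its syndrome and the reference block y.

    The code is linear, so syndrome(x) XOR syndrome(y) == syndrome(x XOR y).
    If x and y differ in at most one bit, that value is the 1-based index of
    the differing bit (0 means the blocks are identical).
    """
    s_y = syndrome(y_block)
    pos = sum((a ^ b) << r for r, (a, b) in enumerate(zip(s_x, s_y)))
    x = list(y_block)
    if pos:
        x[pos - 1] ^= 1
    return x


def decode(syndromes, reference):
    """Decode every block using the co-located reference block."""
    out = []
    for k, s in enumerate(syndromes):
        out.extend(decode_block(s, reference[7 * k:7 * k + 7]))
    return out


# Usage: the second source block is identical to the reference; the first
# differs from it in one bit (index 3).
reference = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
source = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
sent = encode(source)  # 2 blocks x 3 bits = 6 bits instead of 14
recovered = decode(sent, reference)
```

The rate here is fixed at 3/7 bit per source bit and only tolerates one differing bit per block; practical distributed source coding systems generally use stronger channel codes (for example LDPC codes) whose rate is matched to the expected source/reference variation.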

Figure 6. Compression performance of the proposed codec on the TAIR dataset: (A) average code rates vs. maximum local offset in syndrome coding; (B) overall compression performance (i.e., hash bits + syndromes) for all five chromosomes.

Mentions: Figure 6A illustrates the relationship between the average code rate and the maximum local offset in syndrome coding across all five chromosomes. The code rate decreases as the maximum local offset increases, because a larger maximum local offset provides a wider search region within the reference. However, a larger maximum local offset may also increase decoding complexity. Figure 6B shows the overall compression performance (i.e., hash bits + syndromes) for all five chromosomes in terms of compressed file size. Figure 7 gives a side-by-side comparison of compression rate and compression time. Both the proposed method and the GRS algorithm achieve substantial reductions in file size (up to 8252×).
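The trade-off in Figure 6A (a larger maximum local offset widens the search region but raises decoding cost) can be illustrated with a toy decoder. The sketch below rests on our own assumptions, not on the published protocol: each 7-bit source block is sent as a Hamming (7,4) syndrome plus a short truncated SHA-256 hash, and the decoder tries every reference alignment within ±`max_offset` of the expected position, accepting the first syndrome-decoded candidate whose hash matches. `HASH_BITS`, `short_hash`, `encode_block`, `syndrome_decode`, and `decode_with_offset` are hypothetical names. The returned counter shows decoding work growing with the 2·`max_offset` + 1 candidate alignments.

```python
# Illustrative sketch only: offset search during decoding, with a short hash
# used to verify each syndrome-decoded candidate. All names are hypothetical.
import hashlib

# Hypothetical hash length; kept long here so accidental matches are negligible.
# A real codec would hash longer subsequences to amortize this overhead.
HASH_BITS = 32
H = [[(j >> r) & 1 for j in range(1, 8)] for r in range(3)]  # Hamming (7,4)


def syndrome(block):
    return [sum(h * b for h, b in zip(row, block)) % 2 for row in H]


def short_hash(block):
    """Truncated SHA-256 of a 0/1 block, keeping the top HASH_BITS bits."""
    digest = hashlib.sha256(bytes(block)).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - HASH_BITS)


def encode_block(block):
    """Encoder sends a 3-bit syndrome plus a HASH_BITS-bit verification hash."""
    return syndrome(block), short_hash(block)


def syndrome_decode(s_x, y):
    """Return the word within Hamming distance 1 of y that has syndrome s_x."""
    s_y = syndrome(y)
    pos = sum((a ^ b) << r for r, (a, b) in enumerate(zip(s_x, s_y)))
    x = list(y)
    if pos:
        x[pos - 1] ^= 1
    return x


def decode_with_offset(code, reference, start, max_offset):
    """Try alignments start + d for d in [-max_offset, max_offset], nearest first.

    Returns (decoded block or None, number of candidate alignments tried).
    """
    s_x, h_x = code
    tried = 0
    for d in sorted(range(-max_offset, max_offset + 1), key=abs):
        lo = start + d
        if lo < 0 or lo + 7 > len(reference):
            continue  # alignment falls outside the reference
        tried += 1
        candidate = syndrome_decode(s_x, reference[lo:lo + 7])
        if short_hash(candidate) == h_x:
            return candidate, tried
    return None, tried


# Usage: the source block matches reference[7:14] except for one bit, but the
# decoder expects it at position 5, so the true alignment is offset +2.
reference = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
source = [0, 1, 0, 1, 0, 0, 1]
code = encode_block(source)
narrow = decode_with_offset(code, reference, start=5, max_offset=1)
wide = decode_with_offset(code, reference, start=5, max_offset=2)
```

In this example the narrow search (three alignments) should fail to find a block whose hash matches, while the wider search should recover the source after trying all five alignments, mirroring the lower code rate but higher decoding cost described for larger maximum local offsets.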

