On-Demand Indexing for Referential Compression of DNA Sequences.

Alves F, Cogo V, Wandelt S, Leser U, Bessani A - PLoS ONE (2015)

Bottom Line: Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the to-be-compressed sequence and a well-known reference sequence. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.


Affiliation: LaSIGE, University of Lisbon, Lisbon, Portugal.

ABSTRACT
The decreasing cost of genome sequencing is creating a demand for scalable storage and processing tools and techniques to deal with the large amounts of generated data. Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the to-be-compressed sequence and a well-known reference sequence. In this paper, we propose a method for improving the performance of referential compression by removing the most costly phase of the process, the complete reference indexing. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.
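The general idea of storing only the differences against a reference can be illustrated with a toy sketch. It is not the authors' ODI algorithm or the JDNA implementation: it builds a complete k-mer index of the reference (the costly phase that the paper proposes to avoid), and the function names, the entry format and the value of k are all assumptions made for illustration.

```python
def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target, reference, k=4):
    """Encode target as matches against reference plus literal bases.

    Entries are ("M", position, length) for a copied reference segment
    and ("L", base) for a base that could not be matched.
    """
    index = build_index(reference, k)  # full indexing: the phase ODI removes
    entries = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position that shares the next k-mer.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            entries.append(("M", best_pos, best_len))
            i += best_len
        else:
            entries.append(("L", target[i]))
            i += 1
    return entries


def decompress(entries, reference):
    """Rebuild the target from the entries and the same reference."""
    parts = []
    for entry in entries:
        if entry[0] == "M":
            _, pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry[1])
    return "".join(parts)
```

For a target that differs from the reference by a single base, the output is two long matches separated by one literal, which is why highly similar genomes compress so well against a reference.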




pone.0132460.g008: Compression ratio comparison. JDNA’s compression ratio is always close to FRESCO’s ratio, though always slightly smaller.

Mentions: A compression ratio of (y:1) means that the compressed file is y times smaller than the original one. Fig 8 compares the compression ratio achieved by each tool. The compression ratios obtained with JDNA and FRESCO differ by around 12% on average, which may happen for two reasons. First, the tools use different encoding algorithms, and a difference of a single bit in the encoding phase, amplified by thousands of matches, causes a considerable variation in the final compression ratio. Second, FRESCO indexes the entire reference genome, so it always finds the best match for each segment, while JDNA indexes only small blocks of the reference. However, at this order of magnitude (compressing a file over 700×), the compressed file sizes vary between 6 and 40 kB, which we consider a minor difference compared to the dozens of MB that the original file occupies.
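The arithmetic behind this argument can be sketched as follows. The helper names are hypothetical, and the file size and ratios used in the usage note are illustrative values chosen to be consistent with the orders of magnitude quoted above, not measurements from the paper.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Return y for a ratio of (y:1): how many times smaller the output is."""
    return original_bytes / compressed_bytes


def compressed_size(original_bytes, ratio):
    """Return the compressed size implied by a ratio of (ratio:1)."""
    return original_bytes / ratio


def size_gap(original_bytes, ratio_a, ratio_b):
    """Return the absolute difference in compressed size between two ratios."""
    return abs(compressed_size(original_bytes, ratio_a)
               - compressed_size(original_bytes, ratio_b))
```

For a hypothetical 28 MB input, `size_gap(28_000_000, 700, 616)` compares a 700:1 ratio with one that is 12% lower. The gap is only a few kilobytes, which illustrates why a noticeable relative difference in ratio translates into a small absolute difference in file size at this scale.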

