On-Demand Indexing for Referential Compression of DNA Sequences.

Alves F, Cogo V, Wandelt S, Leser U, Bessani A - PLoS ONE (2015)

Bottom Line: Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the to-be-compressed sequence and a well-known reference sequence. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.


Affiliation: LaSIGE, University of Lisbon, Lisbon, Portugal.

ABSTRACT
The decreasing cost of genome sequencing is creating a demand for scalable storage and processing tools and techniques to deal with the large amounts of generated data. Referential compression is one of these techniques, in which the similarity between the DNA of organisms of the same or an evolutionarily close species is exploited to reduce the storage demands of genome sequences by up to 700 times. The general idea is to store in the compressed file only the differences between the to-be-compressed sequence and a well-known reference sequence. In this paper, we propose a method for improving the performance of referential compression by removing the most costly phase of the process, the complete reference indexing. Our approach, called On-Demand Indexing (ODI), compresses human chromosomes five to ten times faster than other state-of-the-art tools (on average), while achieving similar compression ratios.
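The general idea of storing only the differences against a reference can be illustrated with a toy sketch. The code below is not the ODI algorithm and not the authors' implementation: it is a naive greedy matcher, and the function names, the `k` parameter, and the entry format are assumptions made for illustration. Note that it builds a complete k-mer index of the reference up front, which is the costly phase the abstract says ODI removes.

```python
from typing import Dict, List, Tuple, Union

# An entry is either a match against the reference, stored as
# (reference_start, length), or a raw literal string (hypothetical format).
Entry = Union[Tuple[int, int], str]


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(target: str, reference: str, k: int = 4) -> List[Entry]:
    """Greedily encode the target as reference matches plus literals."""
    index = build_index(reference, k)  # complete indexing, done eagerly
    entries: List[Entry] = []
    literal: List[str] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position that shares the next k-mer.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            if literal:  # flush pending raw characters first
                entries.append("".join(literal))
                literal = []
            entries.append((best_pos, best_len))
            i += best_len
        else:
            literal.append(target[i])  # no usable match: store the base
            i += 1
    if literal:
        entries.append("".join(literal))
    return entries


def decompress(entries: List[Entry], reference: str) -> str:
    """Rebuild the target from the entries and the same reference."""
    parts = []
    for entry in entries:
        if isinstance(entry, str):
            parts.append(entry)
        else:
            start, length = entry
            parts.append(reference[start:start + length])
    return "".join(parts)


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTATC"
    target = "ACGTTGCAAGGATTAACCGGTATC"  # one substitution vs. the reference
    encoded = compress(target, reference)
    assert decompress(encoded, reference) == target
    print(encoded)
```

When the target differs from the reference in only a few positions, the entry list stays short no matter how long the sequences are, which is where the large compression ratios come from.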




pone.0132460.g010: Comparison between the transfer of a compressed and an uncompressed genome. Steps taken to simply transfer a genome file (T1), or to compress (C), transfer (T2) and decompress the file (D).

Mentions: Compressing files is widely accepted as an efficient solution for storage limitations. However, there are other scenarios where it is advantageous. For example, data transfer may also benefit from compression, which reduces the burden of transmitting large files over the network from one point to another. In this experiment, we intend to show that a compress-transfer-decompress workflow is better than a simple transfer workflow when the data is as large as a human genome. Fig 10 depicts this scenario, and our experiment considers transfers between our facilities in Portugal and an Amazon EC2 instance (t2.medium) running in Ireland. The to-be-transferred genome is the one identified by the accession number HG00173 in the 1000 Genomes Project, which is approximately 3 GB uncompressed and 4.1 MB compressed. Table 2 presents the average result of each step and its standard deviation.
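The comparison in Fig 10 amounts to checking whether C + T2 + D is smaller than T1. The sketch below expresses that check as a simple model, assuming transfer time is file size divided by a constant bandwidth. Only the two file sizes come from the text; the bandwidth and the compression and decompression times are hypothetical placeholders, not the measurements reported in Table 2.

```python
def transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to send a file at a constant bandwidth (simplifying assumption)."""
    return size_bytes / bandwidth_bytes_per_s


def pipeline_seconds(compressed_bytes: float, bandwidth_bytes_per_s: float,
                     compress_s: float, decompress_s: float) -> float:
    """Total time of the compress (C), transfer (T2), decompress (D) workflow."""
    t2 = transfer_seconds(compressed_bytes, bandwidth_bytes_per_s)
    return compress_s + t2 + decompress_s


def max_codec_seconds(uncompressed_bytes: float, compressed_bytes: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Largest C + D for which the pipeline still beats the direct transfer."""
    return (uncompressed_bytes - compressed_bytes) / bandwidth_bytes_per_s


# File sizes from the text (decimal units assumed): ~3 GB and 4.1 MB.
UNCOMPRESSED_BYTES = 3e9
COMPRESSED_BYTES = 4.1e6

if __name__ == "__main__":
    bandwidth = 10e6     # hypothetical: 10 MB/s between the two sites
    compress_s = 100.0   # hypothetical compression time (C)
    decompress_s = 50.0  # hypothetical decompression time (D)

    t1 = transfer_seconds(UNCOMPRESSED_BYTES, bandwidth)
    total = pipeline_seconds(COMPRESSED_BYTES, bandwidth,
                             compress_s, decompress_s)
    budget = max_codec_seconds(UNCOMPRESSED_BYTES, COMPRESSED_BYTES, bandwidth)

    print(f"compression ratio: {UNCOMPRESSED_BYTES / COMPRESSED_BYTES:.0f}x")
    print(f"direct transfer T1: {t1:.1f} s")
    print(f"C + T2 + D: {total:.1f} s")
    print(f"pipeline wins while C + D < {budget:.1f} s")
```

Under this model the compressed transfer T2 is negligible next to T1, so the outcome depends almost entirely on whether compression plus decompression takes less time than sending the uncompressed file.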

