elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

Herzeel C, Costanza P, Decap D, Fostier J, Reumers J - PLoS ONE (2015)

Bottom Line: For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can save several hundred hours of computing time, and thus substantially reduce analysis time and cost.
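The runtime figures above can be turned into speedups and cohort-level savings with a little arithmetic. The sketch below is an illustration only: it assumes every sample takes exactly as long as the reported NA12878 run, and the cohort size of 100 is a hypothetical number, not one taken from the paper.

```python
# Runtimes reported in the text, converted to minutes.
EXOME_BASELINE_MIN = 1 * 60 + 40  # about 1:40 hours (100 minutes) with SAMtools + Picard
EXOME_ELPREP_MIN = 15             # about 15 minutes with elPrep
GENOME_BASELINE_MIN = 24 * 60     # 24 hours with SAMtools + Picard
GENOME_ELPREP_MAX_MIN = 5 * 60    # "less than 5 hours", so this is an upper bound


def speedup(baseline_min: float, new_min: float) -> float:
    """Ratio of the old runtime to the new runtime."""
    return baseline_min / new_min


def hours_saved(baseline_min: float, new_min: float, samples: int) -> float:
    """Hours saved over a cohort, assuming (as a simplification) that every
    sample takes as long as the reported NA12878 run."""
    return (baseline_min - new_min) * samples / 60


COHORT = 100  # hypothetical cohort size, not a number from the paper

exome_speedup = speedup(EXOME_BASELINE_MIN, EXOME_ELPREP_MIN)
# elPrep's genome runtime is an upper bound, so this speedup is a lower bound.
genome_speedup_at_least = speedup(GENOME_BASELINE_MIN, GENOME_ELPREP_MAX_MIN)
exome_saved = hours_saved(EXOME_BASELINE_MIN, EXOME_ELPREP_MIN, COHORT)
genome_saved_at_least = hours_saved(GENOME_BASELINE_MIN, GENOME_ELPREP_MAX_MIN, COHORT)

print(f"exome: {exome_speedup:.1f}x faster, {exome_saved:.0f} h saved")
print(f"genome: at least {genome_speedup_at_least:.1f}x faster, over {genome_saved_at_least:.0f} h saved")
```

Under these assumptions the script prints a speedup of about 6.7x for the exome pipeline (142 hours saved over 100 exomes) and at least 4.8x for the genome pipeline (over 1900 hours saved over 100 genomes).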


Affiliation: Imec, Leuven, Belgium; ExaScience Life Lab, Leuven, Belgium.

ABSTRACT
elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, and reordering contigs, while producing identical results. What sets elPrep apart is its software architecture, which executes a preparation pipeline in a single pass through the data, no matter how many preparation steps the pipeline contains. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly reduce execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1 hour 40 minutes, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23 GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can save several hundred hours of computing time, and thus substantially reduce analysis time and cost.
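The idea of merging several preparation steps into one pass can be sketched as function composition over reads held in memory. This is a minimal, single-threaded illustration under stated assumptions, not elPrep's actual implementation (which is multithreaded): reads are toy dicts, and the step names `drop_unmapped` and `add_read_group` and the read-group ID `group1` are hypothetical. Only the SAM flag bit 0x4 (segment unmapped) and the `RG:Z:` optional-field syntax come from the SAM format itself.

```python
from typing import Callable, Iterable, Optional

Read = dict  # a toy read record; real SAM records carry many more fields
# A per-read step returns the (possibly modified) read, or None to drop it.
Step = Callable[[Read], Optional[Read]]


def fuse(steps: list[Step]) -> Step:
    """Merge several per-read steps into one function, so every step is
    applied to a read before the next read is looked at."""
    def fused(read: Read) -> Optional[Read]:
        for step in steps:
            read = step(read)
            if read is None:  # a filter dropped the read; later steps are skipped
                return None
        return read
    return fused


def run_single_pass(reads: Iterable[Read], steps: list[Step]) -> list[Read]:
    """Traverse the input once and keep the surviving reads in memory."""
    fused = fuse(steps)
    return [out for out in map(fused, reads) if out is not None]


# Two illustrative per-read steps (hypothetical names, not elPrep commands).
def drop_unmapped(read: Read) -> Optional[Read]:
    return None if read["flag"] & 0x4 else read  # SAM flag bit 0x4: segment unmapped


def add_read_group(read: Read) -> Read:
    return {**read, "tags": read["tags"] + ["RG:Z:group1"]}  # made-up read-group ID


reads = [
    {"qname": "r1", "flag": 0, "rname": "chr2", "pos": 50, "tags": []},
    {"qname": "r2", "flag": 4, "rname": "*", "pos": 0, "tags": []},
    {"qname": "r3", "flag": 16, "rname": "chr1", "pos": 70, "tags": []},
]
prepared = run_single_pass(reads, [drop_unmapped, add_read_group])

# Whole-dataset steps such as sorting need every read at once; because the data
# is already in memory, they run here without re-reading any file. (Sorting by
# contig name is a simplification; SAM coordinate order follows the header.)
prepared.sort(key=lambda r: (r["rname"], r["pos"]))
print([r["qname"] for r in prepared])  # ['r3', 'r1']
```

However many per-read steps are added to the list, the input is still traversed once; only whole-dataset operations such as sorting need a separate pass, and that pass works on data already in memory.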



Fig 2. BAM processing: standard practice (top) versus elPrep (bottom). The standard practice is to call a (different) preparation tool for each step, which leads to repeated file I/O as well as repeated traversal of the same SAM file. To use elPrep, one instead issues a single command that lists the preparation steps to be applied to a SAM file. elPrep internally combines the execution of the different preparation steps, resulting in a single pass over the SAM file and avoiding repetitive file I/O.
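The difference between the two workflows in the figure can be made concrete by counting how many records each one reads and writes. In this sketch, in-memory `io.StringIO` buffers stand in for intermediate SAM files, only header-less SAM body lines are used, and the step functions and read-group ID are hypothetical. It is an illustration of the I/O pattern, not a model of SAMtools, Picard, or elPrep internals.

```python
import io
from dataclasses import dataclass
from typing import Callable, Optional

LineStep = Callable[[str], Optional[str]]  # returns the new line, or None to drop it


@dataclass
class IOStats:
    records_read: int = 0
    records_written: int = 0


def read_records(f: io.StringIO, stats: IOStats) -> list[str]:
    lines = f.getvalue().splitlines()
    stats.records_read += len(lines)
    return lines


def write_records(lines: list[str], stats: IOStats) -> io.StringIO:
    stats.records_written += len(lines)
    return io.StringIO("".join(line + "\n" for line in lines))


def apply_all(line: str, steps: list[LineStep]) -> Optional[str]:
    result: Optional[str] = line
    for step in steps:
        result = step(result)
        if result is None:
            return None
    return result


def multi_pass(src: io.StringIO, steps: list[LineStep], stats: IOStats) -> io.StringIO:
    """Standard practice: one tool per step, each reading and writing a whole file."""
    current = src
    for step in steps:
        kept = [out for out in map(step, read_records(current, stats)) if out is not None]
        current = write_records(kept, stats)  # intermediate file
    return current


def single_pass(src: io.StringIO, steps: list[LineStep], stats: IOStats) -> io.StringIO:
    """Combined execution: read once, apply every step per record, write once."""
    results = (apply_all(line, steps) for line in read_records(src, stats))
    return write_records([out for out in results if out is not None], stats)


def drop_unmapped(line: str) -> Optional[str]:
    return None if int(line.split("\t")[1]) & 0x4 else line  # FLAG is column 2


def add_read_group(line: str) -> str:
    return line + "\tRG:Z:group1"  # hypothetical read-group ID


SAM_BODY = "".join("\t".join(fields) + "\n" for fields in [
    ["r1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII"],
    ["r2", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"],
    ["r3", "16", "chr1", "200", "60", "4M", "*", "0", "0", "TTGA", "IIII"],
])
steps = [drop_unmapped, add_read_group]
multi_stats, single_stats = IOStats(), IOStats()
multi_out = multi_pass(io.StringIO(SAM_BODY), steps, multi_stats).getvalue()
single_out = single_pass(io.StringIO(SAM_BODY), steps, single_stats).getvalue()
print(multi_out == single_out, multi_stats, single_stats)
```

Both strategies produce the same output, but with two steps the multi-pass version reads 5 and writes 4 records, while the single pass reads 3 and writes 2. With more steps the gap grows, because each extra tool in the standard workflow re-reads and re-writes the whole file.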

Mentions: In practice, different alignment tools produce slightly different outputs, and different analysis tools depend on slightly different SAM structures to work properly. Some analysis tools, for instance, require optional information to be present, require the reads to be filtered (for example, to remove unmapped reads), or only work if the reads are stored in a particular order. This is why, in practice, a number of steps are typically needed between the alignment and analysis tools to rewrite the SAM file into a form that the analysis tool accepts (Fig 2). The GATK Best Practices [7] and the bcbio-nextgen project [8], for example, give recommendations on which SAM manipulation tools need to be called to successfully combine different alignment and analysis tools.
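To see why preparation steps are needed, one can check a SAM file against the kinds of requirements listed above. The three checks below (no unmapped reads, an `RG` read-group optional field on every read, coordinate order) are hypothetical stand-ins for what some downstream tool might demand, and the function names are made up for illustration. The SAM conventions used are standard: FLAG bit 0x4 marks an unmapped segment, optional fields follow the 11 mandatory columns, and coordinate order follows the order of the `@SQ` header lines rather than alphabetical contig names.

```python
def contig_ranks(header: list[str]) -> dict[str, int]:
    """Rank each reference sequence by its position among the @SQ header lines."""
    ranks: dict[str, int] = {}
    for line in header:
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    ranks[field[3:]] = len(ranks)
    return ranks


def unmet_requirements(sam_text: str) -> list[str]:
    """List which of three example requirements a SAM text violates."""
    lines = [line for line in sam_text.splitlines() if line]
    header = [line for line in lines if line.startswith("@")]
    records = [line.split("\t") for line in lines if not line.startswith("@")]
    ranks = contig_ranks(header)
    problems = []
    if any(int(r[1]) & 0x4 for r in records):
        problems.append("contains unmapped reads")
    # Optional fields start after the 11 mandatory SAM columns.
    if not all(any(f.startswith("RG:Z:") for f in r[11:]) for r in records):
        problems.append("missing RG optional field")
    # Sort key: (rank of contig in header, 1-based position); unplaced reads skipped.
    keys = [(ranks[r[2]], int(r[3])) for r in records if r[2] in ranks]
    if keys != sorted(keys):
        problems.append("not coordinate-sorted")
    return problems


def sam(*rows: list[str]) -> str:
    return "".join("\t".join(row) + "\n" for row in rows)


HEADER = [["@HD", "VN:1.6"], ["@SQ", "SN:chr2", "LN:1000"], ["@SQ", "SN:chr1", "LN:1000"]]
raw = sam(
    *HEADER,
    ["r1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII", "RG:Z:g1"],
    ["r2", "0", "chr2", "50", "60", "4M", "*", "0", "0", "ACGT", "IIII"],
    ["r3", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"],
)
prepared = sam(
    *HEADER,
    ["r2", "0", "chr2", "50", "60", "4M", "*", "0", "0", "ACGT", "IIII", "RG:Z:g1"],
    ["r1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII", "RG:Z:g1"],
)
print(unmet_requirements(raw))
print(unmet_requirements(prepared))
```

The raw input fails all three checks, while the prepared version (unmapped read removed, `RG` added, reads reordered) passes them. Note that `chr2` sorts before `chr1` here because that is the order of the `@SQ` lines in this made-up header.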