Limits...
Fast inexact mapping using advanced tree exploration on backward search methods.

Salavert J, Tomás A, Tárraga J, Medina I, Dopazo J, Blanquer I - BMC Bioinformatics (2015)

Bottom Line: Depending on the aligner the overall execution time is reduced between 20-40%.This step significantly reduces the number reads to be aligned, accelerating overall alignment time.In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.

View Article: PubMed Central - PubMed

Affiliation: GRyCAP department of I3M, Universitat Politècnica de València, Building 8B, Camino de vera s/n, Valencia, 46022, Spain. josator@i3m.upv.es.

ABSTRACT

Background: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data.

Results: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20-40%.

Conclusions: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.

Show MeSH

Related in: MedlinePlus

BWT and SW tools. 2 Million 400 bps reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
getmorefigures.php?uid=PMC4384339&req=5

Fig3: BWT and SW tools. 2 Million 400 bps reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.

Mentions: Figure 3 shows execution times for the 400 bps dataset, Bowtie 2 took 40 m 35 s and BWA-MEM took 33 m 09s. Our algorithm was executed allowing up to 9 errors, finding 62.21% of the reads in 9 m 24s. The combined approach with Bowtie 2 took 24 m 33 s (40% faster) and with BWA-MEM took 25 m 57 s (21% faster). Bowtie 2 found 94.46% of the reads and BWA-MEM found 94.48%, the same amount of reads where found when using our algorithm as a preprocessing step.Figure 3


Fast inexact mapping using advanced tree exploration on backward search methods.

Salavert J, Tomás A, Tárraga J, Medina I, Dopazo J, Blanquer I - BMC Bioinformatics (2015)

BWT and SW tools. 2 Million 400 bps reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.
© Copyright Policy - open-access
Related In: Results  -  Collection

License 1 - License 2
Show All Figures
getmorefigures.php?uid=PMC4384339&req=5

Fig3: BWT and SW tools. 2 Million 400 bps reads. Execution times comparing the new algorithm, the modern mappers and the combination of both.
Mentions: Figure 3 shows execution times for the 400 bps dataset, Bowtie 2 took 40 m 35 s and BWA-MEM took 33 m 09s. Our algorithm was executed allowing up to 9 errors, finding 62.21% of the reads in 9 m 24s. The combined approach with Bowtie 2 took 24 m 33 s (40% faster) and with BWA-MEM took 25 m 57 s (21% faster). Bowtie 2 found 94.46% of the reads and BWA-MEM found 94.48%, the same amount of reads where found when using our algorithm as a preprocessing step.Figure 3

Bottom Line: Depending on the aligner the overall execution time is reduced between 20-40%.This step significantly reduces the number reads to be aligned, accelerating overall alignment time.In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.

View Article: PubMed Central - PubMed

Affiliation: GRyCAP department of I3M, Universitat Politècnica de València, Building 8B, Camino de vera s/n, Valencia, 46022, Spain. josator@i3m.upv.es.

ABSTRACT

Background: Short sequence mapping methods for Next Generation Sequencing consist on a combination of seeding techniques followed by local alignment based on dynamic programming approaches. Most seeding algorithms are based on backward search alignment, using the Burrows Wheeler Transform, the Ferragina and Manzini Index or Suffix Arrays. All these backward search algorithms have excellent performance, but their computational cost highly increases when allowing errors. In this paper, we discuss an inexact mapping algorithm based on pruning strategies for search tree exploration over genomic data.

Results: The proposed algorithm achieves a 13x speed-up over similar algorithms when allowing 6 base errors, including insertions, deletions and mismatches. This algorithm can deal with 400 bps reads with up to 9 errors in a high quality Illumina dataset. In this example, the algorithm works as a preprocessor that reduces by 55% the number of reads to be aligned. Depending on the aligner the overall execution time is reduced between 20-40%.

Conclusions: Although not intended as a complete sequence mapping tool, the proposed algorithm could be used as a preprocessing step to modern sequence mappers. This step significantly reduces the number reads to be aligned, accelerating overall alignment time. Furthermore, this algorithm could be used for accelerating the seeding step of already available sequence mappers. In addition, an out-of-core index has been implemented for working with large genomes on systems without expensive memory configurations.

Show MeSH
Related in: MedlinePlus