Limits...
BarraCUDA - a fast short read sequence aligner using graphics processing units.

Klus P, Lam S, Lyberg D, Cheung MS, Pullan G, McFarlane I, Yeo GSh, Lam BY - BMC Res Notes (2012)

Bottom Line: General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters.As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity.BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments.

View Article: PubMed Central - HTML - PubMed

Affiliation: University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Box 289, Addenbrooke's Hospital, Hill's Road, Cambridge CB2 0QQ, UK. yhbl2@cam.ac.uk.

ABSTRACT

Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence.

Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput.

Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology.BarraCUDA is currently available from http://seqbarracuda.sf.net.

No MeSH data available.


The search space traversal for inexact alignments. A. There are 4 types of alignments for each base, namely exact match, mismatch, insertion and deletion. A mismatch alignment is performed by substituting the reference base G with the three other possible bases (A, T and C). Insertions are detected by deleting the base from the query sequence, and deletions by inserting all 4 bases into the query sequence. B. A BFS strategy: BFS starts from the first base (1) and stores all hits in daughter nodes (1, 1)...(1, 3) in memory (with shaded squares), then it chooses (1, 1) and expands it into (1, 1, 1) ... (1, 1, 3). With all the nodes in memory, the agent then evaluates (1, 1, 1) which returns an alignment followed by (1, 1, 3), a suboptimal alignment, which is the next best hit in memory. After that the agent proceeds to (1, 2) which has the same number of differences as (1, 1, 3). BWA does not process nodes with more than z + 1 differences, i.e. (1, 3), (1, 1, 2) with the default option. C. A DFS strategy: DFS chooses the best hit (1, 1) from (1), and subsequently chooses (1, 1, 1) which returns an alignment. Then the agent goes back to (1, 1) to reach (1, 1, 3) as the next best hit to return the sub-optimal alignment. After that, the agent returns to (1, 1), then (1) to reach (1, 2) (not shown). Like BWA, BarraCUDA skips nodes with > z + 1 differences by default.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3278344&req=5

Figure 1: The search space traversal for inexact alignments. A. There are 4 types of alignments for each base, namely exact match, mismatch, insertion and deletion. A mismatch alignment is performed by substituting the reference base G with the three other possible bases (A, T and C). Insertions are detected by deleting the base from the query sequence, and deletions by inserting all 4 bases into the query sequence. B. A BFS strategy: BFS starts from the first base (1) and stores all hits in daughter nodes (1, 1)...(1, 3) in memory (with shaded squares), then it chooses (1, 1) and expands it into (1, 1, 1) ... (1, 1, 3). With all the nodes in memory, the agent then evaluates (1, 1, 1) which returns an alignment followed by (1, 1, 3), a suboptimal alignment, which is the next best hit in memory. After that the agent proceeds to (1, 2) which has the same number of differences as (1, 1, 3). BWA does not process nodes with more than z + 1 differences, i.e. (1, 3), (1, 1, 2) with the default option. C. A DFS strategy: DFS chooses the best hit (1, 1) from (1), and subsequently chooses (1, 1, 1) which returns an alignment. Then the agent goes back to (1, 1) to reach (1, 1, 3) as the next best hit to return the sub-optimal alignment. After that, the agent returns to (1, 1), then (1) to reach (1, 2) (not shown). Like BWA, BarraCUDA skips nodes with > z + 1 differences by default.

Mentions: The alignment kernel in BarraCUDA, like BWA and all other BWT-based sequence aligners, consists of a backward search string-matching algorithm [13,14] to look for alignments. Inexact alignment requires a search space of O(9n) in order to generate and evaluate a series of base substitutions that could lead to an exact string match (Figure 1a). BarraCUDA differs from BWA in its strategy to perform search traversal. While BWA utilises a time efficient breadth-first search (BFS) approach as shown in Figure 1b, it can utilize a maximum of 40 MB of memory space for each computational thread (please refer to the Additional file 1 for explanations). With thousands of concurrent threads on the GPU, the memory to each thread is very limited and BFS does not seem to be an option. Therefore in BarraCUDA, we adopted a difference-bound DFS approach (as outlined in Figure 1c) where it uses 20, 000 fold less memory to perform alignments (see Additional file 1). Rather than storing all partial hits while traversing through the search space, the DFS algorithm only stores in memory the branch of the search space with the local best hit score, i.e. (1), (1, 1) followed by (1, 1, 1) to give a full alignment (Figure 1c). Nonetheless, DFS is not as time efficient as BFS employed in BWA, BarraCUDA has to re-assess nodes multiple times until all possible hits from that node are evaluated, for instance in Figure 1c, the program has to return from (1, 1, 1) to (1, 1) in order to reach (1, 1, 3) as the next best hit, and this is particular a problem when the read length is long and the search space is extremely large.


BarraCUDA - a fast short read sequence aligner using graphics processing units.

Klus P, Lam S, Lyberg D, Cheung MS, Pullan G, McFarlane I, Yeo GSh, Lam BY - BMC Res Notes (2012)

The search space traversal for inexact alignments. A. There are 4 types of alignments for each base, namely exact match, mismatch, insertion and deletion. A mismatch alignment is performed by substituting the reference base G with the three other possible bases (A, T and C). Insertions are detected by deleting the base from the query sequence, and deletions by inserting all 4 bases into the query sequence. B. A BFS strategy: BFS starts from the first base (1) and stores all hits in daughter nodes (1, 1)...(1, 3) in memory (with shaded squares), then it chooses (1, 1) and expands it into (1, 1, 1) ... (1, 1, 3). With all the nodes in memory, the agent then evaluates (1, 1, 1) which returns an alignment followed by (1, 1, 3), a suboptimal alignment, which is the next best hit in memory. After that the agent proceeds to (1, 2) which has the same number of differences as (1, 1, 3). BWA does not process nodes with more than z + 1 differences, i.e. (1, 3), (1, 1, 2) with the default option. C. A DFS strategy: DFS chooses the best hit (1, 1) from (1), and subsequently chooses (1, 1, 1) which returns an alignment. Then the agent goes back to (1, 1) to reach (1, 1, 3) as the next best hit to return the sub-optimal alignment. After that, the agent returns to (1, 1), then (1) to reach (1, 2) (not shown). Like BWA, BarraCUDA skips nodes with > z + 1 differences by default.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3278344&req=5

Figure 1: The search space traversal for inexact alignments. A. There are 4 types of alignments for each base, namely exact match, mismatch, insertion and deletion. A mismatch alignment is performed by substituting the reference base G with the three other possible bases (A, T and C). Insertions are detected by deleting the base from the query sequence, and deletions by inserting all 4 bases into the query sequence. B. A BFS strategy: BFS starts from the first base (1) and stores all hits in daughter nodes (1, 1)...(1, 3) in memory (with shaded squares), then it chooses (1, 1) and expands it into (1, 1, 1) ... (1, 1, 3). With all the nodes in memory, the agent then evaluates (1, 1, 1) which returns an alignment followed by (1, 1, 3), a suboptimal alignment, which is the next best hit in memory. After that the agent proceeds to (1, 2) which has the same number of differences as (1, 1, 3). BWA does not process nodes with more than z + 1 differences, i.e. (1, 3), (1, 1, 2) with the default option. C. A DFS strategy: DFS chooses the best hit (1, 1) from (1), and subsequently chooses (1, 1, 1) which returns an alignment. Then the agent goes back to (1, 1) to reach (1, 1, 3) as the next best hit to return the sub-optimal alignment. After that, the agent returns to (1, 1), then (1) to reach (1, 2) (not shown). Like BWA, BarraCUDA skips nodes with > z + 1 differences by default.
Mentions: The alignment kernel in BarraCUDA, like BWA and all other BWT-based sequence aligners, consists of a backward search string-matching algorithm [13,14] to look for alignments. Inexact alignment requires a search space of O(9n) in order to generate and evaluate a series of base substitutions that could lead to an exact string match (Figure 1a). BarraCUDA differs from BWA in its strategy to perform search traversal. While BWA utilises a time efficient breadth-first search (BFS) approach as shown in Figure 1b, it can utilize a maximum of 40 MB of memory space for each computational thread (please refer to the Additional file 1 for explanations). With thousands of concurrent threads on the GPU, the memory to each thread is very limited and BFS does not seem to be an option. Therefore in BarraCUDA, we adopted a difference-bound DFS approach (as outlined in Figure 1c) where it uses 20, 000 fold less memory to perform alignments (see Additional file 1). Rather than storing all partial hits while traversing through the search space, the DFS algorithm only stores in memory the branch of the search space with the local best hit score, i.e. (1), (1, 1) followed by (1, 1, 1) to give a full alignment (Figure 1c). Nonetheless, DFS is not as time efficient as BFS employed in BWA, BarraCUDA has to re-assess nodes multiple times until all possible hits from that node are evaluated, for instance in Figure 1c, the program has to return from (1, 1, 1) to (1, 1) in order to reach (1, 1, 3) as the next best hit, and this is particular a problem when the read length is long and the search space is extremely large.

Bottom Line: General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters.As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity.BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments.

View Article: PubMed Central - HTML - PubMed

Affiliation: University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Box 289, Addenbrooke's Hospital, Hill's Road, Cambridge CB2 0QQ, UK. yhbl2@cam.ac.uk.

ABSTRACT

Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence.

Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput.

Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology.BarraCUDA is currently available from http://seqbarracuda.sf.net.

No MeSH data available.