BarraCUDA - a fast short read sequence aligner using graphics processing units.

Klus P, Lam S, Lyberg D, Cheung MS, Pullan G, McFarlane I, Yeo GSh, Lam BY - BMC Res Notes (2012)

Bottom Line: General-purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared with a single CPU core while delivering the same level of alignment fidelity. BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments.


Affiliation: University of Cambridge Metabolic Research Laboratories, Institute of Metabolic Science, Box 289, Addenbrooke's Hospital, Hill's Road, Cambridge CB2 0QQ, UK. yhbl2@cam.ac.uk.

ABSTRACT

Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment program based on BWA, which accelerates the alignment of sequencing reads generated by these instruments to a reference DNA sequence.

Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared with a single CPU core while delivering the same level of alignment fidelity. The software can also use multiple CUDA devices in parallel to further increase alignment throughput.
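How BarraCUDA distributes work across several CUDA devices is not detailed here. The sketch below, written in Python rather than CUDA for brevity, assumes a simple data-parallel scheme: reads are split into contiguous, near-equal batches, one per device, and each batch is aligned independently. `partition_reads`, `align_batch` and `align_all` are hypothetical names, and `align_batch` only stands in for the real GPU work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_reads(reads: Sequence[T], n_devices: int) -> List[List[T]]:
    """Split reads into n_devices contiguous, near-equal batches."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    base, extra = divmod(len(reads), n_devices)
    batches: List[List[T]] = []
    start = 0
    for dev in range(n_devices):
        # The first `extra` devices each take one additional read.
        size = base + (1 if dev < extra else 0)
        batches.append(list(reads[start:start + size]))
        start += size
    return batches


def align_batch(device_id: int, batch: List[T]) -> List[Tuple[int, T]]:
    # Hypothetical placeholder: a real aligner would copy the batch to GPU
    # `device_id`, run the alignment kernels there and return the hits.
    # Here each read is simply tagged with the device that "aligned" it.
    return [(device_id, read) for read in batch]


def align_all(reads: Sequence[T], n_devices: int) -> List[Tuple[int, T]]:
    batches = partition_reads(reads, n_devices)
    # One host thread per device drives its batch concurrently.
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        per_device = list(pool.map(align_batch, range(n_devices), batches))
    # Concatenating in device order preserves the original read order.
    return [hit for hits in per_device for hit in hits]


print(align_all(["r0", "r1", "r2", "r3", "r4"], 2))
# [(0, 'r0'), (0, 'r1'), (0, 'r2'), (1, 'r3'), (1, 'r4')]
```

Because each device only needs its own batch and its own copy of the reference index, a scheme like this involves no communication between GPUs during alignment, which is one reason multi-GPU throughput can scale close to linearly.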

Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net.




Figure 4: The scalability of alignment throughputs using multiple GPUs and CPUs. This figure shows the effect on alignment throughput (in megabases per second, Mbp/s) when multiple GPUs and CPUs were used to map a single-end library containing 13.6 million 96 bp reads to the D. melanogaster genome. A computer node containing two 6-core Intel Xeon X5670s and eight NVIDIA Tesla M2050s was used in this test. The throughputs of BWA were measured with 1, 2, 4, 6, 8, 10 and 12 threads, and those of BarraCUDA with 1, 2, 4, 6 and 8 M2050s, using default options.
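A plot of raw throughput against the number of threads or GPUs is easier to compare once each point is converted into speedup and parallel efficiency relative to a single worker. The helper below is a hypothetical sketch of that bookkeeping; `scaling_table` is an illustrative name, and the throughput values passed to it must come from the measured data (none are reproduced here).

```python
from typing import Dict, Tuple


def scaling_table(throughput: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map worker count -> (speedup, parallel efficiency) relative to 1 worker.

    speedup(n)    = throughput(n) / throughput(1)
    efficiency(n) = speedup(n) / n
    """
    if 1 not in throughput or throughput[1] <= 0:
        raise ValueError("need a positive single-worker throughput")
    base = throughput[1]
    table: Dict[int, Tuple[float, float]] = {}
    for n, tp in sorted(throughput.items()):
        speedup = tp / base
        table[n] = (speedup, speedup / n)
    return table
```

An efficiency near 1 means the extra workers are fully used; an efficiency that drops steadily as workers are added, as one would expect for BWA once additional cores stop helping, points to a shared bottleneck.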

Mentions: Figure 4 shows the scalability of using multiple GPUs and CPUs to align another whole-genome shotgun library of 13.5 million single-end 95 bp reads (ENA accession: SRR063699) to the Drosophila melanogaster genome (BDGP5.25.63). As with the human library examined earlier, the alignment throughput of BarraCUDA on one Tesla M2050 GPU was similar to that of BWA on 6 CPU cores (Xeon X5670, 2.93 GHz, with 8 GB DDR3 RAM). We tried to further increase the speed of BWA by adding CPU cores, but found no additional benefit beyond 8 cores. In contrast, BarraCUDA with only two GPUs already outperformed BWA using all 12 cores (2× Xeon X5670s) at 2.5 Mbp/s, and with 8 GPUs the alignment took only 3.8 min, 2.8 times as fast as BWA using all 12 CPU cores available on the computer node. The difference in scalability between CPUs and GPUs is mainly due to memory bandwidth: each GPU has exclusive access to its own dedicated on-board memory, whereas the system memory is shared among all 12 CPU cores and becomes a bottleneck when more than 8 BWA threads run at the same time.
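The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that throughput is the total number of read bases divided by wall-clock alignment time, and that the 3.8 min covers the full alignment of the library; `throughput_mbps` is a hypothetical helper, not part of BarraCUDA or BWA.

```python
def throughput_mbps(n_reads: int, read_len_bp: int, wall_seconds: float) -> float:
    """Alignment throughput in megabases per second (1 Mbp = 1e6 bp)."""
    return n_reads * read_len_bp / wall_seconds / 1e6


# Values quoted in the text for the D. melanogaster library (SRR063699).
gpu8 = throughput_mbps(13_500_000, 95, 3.8 * 60)

# The text states that 8 GPUs ran 2.8 times as fast as BWA on 12 cores.
bwa12_implied = gpu8 / 2.8

print(f"8x M2050: {gpu8:.3f} Mbp/s; implied BWA (12 cores): {bwa12_implied:.3f} Mbp/s")
```

Under these assumptions, the eight-GPU run works out to 5.625 Mbp/s, which implies roughly 2.0 Mbp/s for BWA on 12 cores. Using the read count and length given in the figure legend (13.6 million 96 bp reads) instead raises the eight-GPU estimate to about 5.7 Mbp/s.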

