Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

Huang LT, Wu CC, Lai LF, Li YJ - Biomed Res Int (2015)

Bottom Line: In this paper, we focused on how to improve the mapping, especially for short query sequences, by making better use of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. We also analyzed performance for different numbers of threads and blocks.

View Article: PubMed Central - PubMed

Affiliation: Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan.

ABSTRACT
Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence-search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve this mapping, especially for short query sequences, by making better use of shared memory. We implemented and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. We also analyzed performance for different numbers of threads and blocks. The results show that, with a proper allocation of block and thread numbers, the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs.

No MeSH data available.


Figure 4: The speedup of our method over CUDASW++ 2.0 on Tesla C1060 with 256 blocks.

Mentions: We present the speedup of our method below, where speedup is defined as the execution time of CUDASW++ 2.0 divided by the execution time of the method under test. The performance improvement of our method over CUDASW++ 2.0 is shown in Figure 4. The C1060 has 16 KB of shared memory per multiprocessor. Because each block contains 256 threads, each thread is assigned 64 bytes, that is, 16 words, of shared memory for storing the matrices H, E, and F. The best speedup is about 1.14, obtained when the query sequence is P36515, which consists of four amino acids.
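The per-thread shared-memory budget quoted above can be verified with a quick back-of-the-envelope calculation. This is only a sketch of the arithmetic: the constants are the figures stated in the text (16 KB of shared memory, 256 threads per block), and the 4-byte word size is an assumption based on 32-bit words.

```python
# Sketch: per-thread shared-memory budget on a Tesla C1060, as described
# in the text. Assumes one block of 256 threads shares the full 16 KB,
# and that a "word" is a 32-bit (4-byte) value.

SHARED_MEM_BYTES = 16 * 1024   # shared memory available to the block (C1060)
THREADS_PER_BLOCK = 256        # thread count used in the experiments
WORD_BYTES = 4                 # assumed size of one word

bytes_per_thread = SHARED_MEM_BYTES // THREADS_PER_BLOCK
words_per_thread = bytes_per_thread // WORD_BYTES

print(bytes_per_thread)  # bytes per thread for the H, E, and F matrices
print(words_per_thread)  # words per thread
```

Running this yields 64 bytes, or 16 words, per thread, matching the allocation described for storing the H, E, and F matrices.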

