Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

Huang LT, Wu CC, Lai LF, Li YJ - Biomed Res Int (2015)

Bottom Line: In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed.

View Article: PubMed Central - PubMed

Affiliation: Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan.

ABSTRACT
Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs given proper allocation of block and thread numbers.

No MeSH data available.



fig2: The dependence relationship between matrices H, E, and F.

Mentions: To perform the Smith-Waterman algorithm, at run time we need to allocate memory space for three matrices: H, E, and F. Since the matrices might be very large but their data are intermediate, it is better not to store all the matrix data in global memory. To reduce the required memory space as much as possible, we analyze the dependency relationship among the three matrices, as shown in Figure 2. To compute the element H(i, j), we must access the values of E(i, j) and F(i, j), meaning that we have to calculate E(i, j) and F(i, j) before H(i, j) in the same loop iteration. On the other hand, E(i, j) depends on E(i, j − 1) and H(i, j − 1), while F(i, j) depends on F(i − 1, j) and H(i − 1, j). This means that, when executing iteration (i, j), we require only the intermediate data produced in iterations (i − 1, j) and (i, j − 1). Therefore, we do not need to keep all the intermediate data of matrices H, E, and F in global memory. Instead, we can use registers to hold the intermediate data produced in the previous iterations and the result computed in the current iteration. Assuming the index of the inner loop is j, two registers are sufficient for storing E: one for E(i, j − 1) and one for E(i, j). However, a full row of values would be required for F, which is infeasible in registers because of the very limited number of registers per thread. Similarly, H needs a row of values plus two. Due to the limited number of registers available to a thread, we use shared memory to buffer the spilled register values.
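The row-buffer scheme described above can be sketched in plain C (this is an illustrative score-only version, not the authors' CUDA kernel; the scoring constants MATCH, MISMATCH, GAP_OPEN, and GAP_EXT are arbitrary assumptions, and the two heap-allocated rows stand in for the shared-memory buffers a GPU thread would use):

```c
#include <stdlib.h>
#include <string.h>

/* Assumed affine-gap scoring parameters (not taken from the paper). */
#define MATCH     2
#define MISMATCH (-1)
#define GAP_OPEN  3   /* cost of opening a gap   */
#define GAP_EXT   1   /* cost of extending a gap */

static int max2(int a, int b) { return a > b ? a : b; }

/* Best local alignment score of query q against database sequence d.
   Only one row each of H and F is buffered (shared memory on a GPU);
   E needs just the previous-column value, kept in a register. */
int sw_score(const char *q, const char *d)
{
    int m = (int)strlen(q), n = (int)strlen(d);
    int *Hrow = calloc((size_t)n + 1, sizeof *Hrow); /* H(i-1,*) -> H(i,*) */
    int *Frow = calloc((size_t)n + 1, sizeof *Frow); /* F(i-1,*) -> F(i,*) */
    int best = 0;

    for (int i = 1; i <= m; i++) {
        int Eprev = 0;  /* E(i, j-1): a register   */
        int Hdiag = 0;  /* H(i-1, j-1): a register */
        for (int j = 1; j <= n; j++) {
            /* E(i,j) uses the current row's previous column:
               Hrow[j-1] has already been overwritten with H(i, j-1). */
            int E = max2(Eprev - GAP_EXT, Hrow[j - 1] - GAP_OPEN);
            /* F(i,j) uses the previous row: Hrow[j] and Frow[j] still
               hold H(i-1, j) and F(i-1, j) at this point. */
            int F = max2(Frow[j] - GAP_EXT, Hrow[j] - GAP_OPEN);
            int s = (q[i - 1] == d[j - 1]) ? MATCH : MISMATCH;
            int H = max2(0, max2(Hdiag + s, max2(E, F)));

            Hdiag = Hrow[j];  /* save H(i-1, j) before overwriting */
            Hrow[j] = H;
            Frow[j] = F;
            Eprev = E;
            if (H > best) best = H;
        }
    }
    free(Hrow);
    free(Frow);
    return best;
}
```

With these assumed parameters, sw_score("ACGT", "ACGT") is 8 (four matches), and sw_score("ACGT", "AGT") is 4: the two-character local hit "GT" beats any alignment that pays the gap-open cost.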


Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs.

Huang LT, Wu CC, Lai LF, Li YJ - Biomed Res Int (2015)

The dependence relationship between matrices H, E, and F.
© Copyright Policy - open-access
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC4538332&req=5

fig2: The dependence relationship between matrices H, E, and F.
Mentions: To perform the Smith-Waterman algorithm, at run time we need to allocate memory space for the three matrices: H, E, and F. Since the matrices might be very large but the data are intermediate, it is better not to save all the matrices data in the global memory. To reduce the required memory space as much as possible, we analyze the dependency relationship among the three matrices, as shown in Figure 2. To compute the element, H(i, j), we require to access the values of E(i, j) and F(i, j), meaning that we have to calculate E(i, j) and F(i, j) before H(i, j) in the same loop iteration. On the other hand, E(i, j) depends on E(i, j − 1) and H(i, j − 1) while F(i, j) depends on F(i − 1, j) and H(i − 1, j). It means that, when executing iteration (i, j), we require the intermediate data produced in iterations (i − 1, j) and (i, j − 1) only. Therefore, we have no need to save all the intermediate data of matrices H, E, and F on global memory. Instead, we can use registers to save the intermediate data produced in the previous iterations and the computation result in the current iteration. Assume the index of the inner loop is j; two registers are sufficient for storing E: one for E(i, j − 1) and one for E(i, j). However, a row of registers is required for F, which is infeasible because of very limited number of registers for a thread. Similarly, we need a row of registers plus two for H. Due to the limited number of registers available in a thread, we use shared memory to buffer the spilled values of registers.

Bottom Line: In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory.We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++.Further, the performance on different numbers of threads and blocks has been analyzed.

View Article: PubMed Central - PubMed

Affiliation: Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan.

ABSTRACT
Sequence alignment lies at heart of the bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves Smith-Waterman algorithm on CUDA-enabled GPUs in proper allocation of block and thread numbers.

No MeSH data available.


Related in: MedlinePlus