Limits...
Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power.

Zhang Y, Misra S, Agrawal A, Patwary MM, Liao WK, Qin Z, Choudhary A - BMC Bioinformatics (2012)

Bottom Line: We further extend the parallelization technique to estimate pairwise statistical significance using position-specific substitution matrices, which has earlier demonstrated significantly better sequence comparison accuracy than using standard substitution matrices.We observe end-to-end speedups of nearly 250 (370) × using single-GPU Tesla C2050 GPU (dual-Tesla C2050) over the CPU implementation using Intel Corei7 CPU 920 processor.Harvesting the high performance of modern GPUs is a promising approach to accelerate pairwise statistical significance estimation for local sequence alignment.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. yuhongzhang@uestc.edu.cn

ABSTRACT

Background: Pairwise statistical significance has been recognized to be able to accurately identify related sequences, which is a very important cornerstone procedure in numerous bioinformatics applications. However, it is both computationally and data intensive, which poses a big challenge in terms of performance and scalability.

Results: We present a GPU implementation to accelerate pairwise statistical significance estimation of local sequence alignment using standard substitution matrices. By carefully studying the algorithm's data access characteristics, we developed a tile-based scheme that can produce a contiguous data access in the GPU global memory and sustain a large number of threads to achieve a high GPU occupancy. We further extend the parallelization technique to estimate pairwise statistical significance using position-specific substitution matrices, which has earlier demonstrated significantly better sequence comparison accuracy than using standard substitution matrices. The implementation is also extended to take advantage of dual-GPUs. We observe end-to-end speedups of nearly 250 (370) × using single-GPU Tesla C2050 GPU (dual-Tesla C2050) over the CPU implementation using Intel Corei7 CPU 920 processor.

Conclusions: Harvesting the high performance of modern GPUs is a promising approach to accelerate pairwise statistical significance estimation for local sequence alignment.

Show MeSH
Adaptive tile-based strategy. The optimal tile size T can be calculated according to the hardware configuration of GPU. After calculated, the T subject sequences together are transferred to GPU global memory. Permutations and alignments are done in parallel in GPU. Then T × 1000 alignment scores are moved back to CPU for T fittings.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC3318904&req=5

Figure 3: Adaptive tile-based strategy. The optimal tile size T can be calculated according to the hardware configuration of GPU. After calculated, the T subject sequences together are transferred to GPU global memory. Permutations and alignments are done in parallel in GPU. Then T × 1000 alignment scores are moved back to CPU for T fittings.

Mentions: In Tesla C2050 used in our experiments, there are 14 SMs and the maximum number of resident threads per SM is 1024. Let N = 1000, then, T = ⌊14 × 1024/1000⌋ = 14, which means that there are 14 distinct subject sequences to be transferred to the GPU's global memory at a time. Based on the second strategy (i.e., data reuse), 14 subject sequences and their permuted copies are aligned against one query sequence at a time, until all the query sequences are processed. As a result, 14 × 1000 alignment scores in total are obtained in each round, which are subsequently transferred to CPU for fitting. The CPU takes the 1000 alignment scores for each subject-query sequence pair and uses them to compute the corresponding pss. The tile-based strategy has been described in Figure 3.


Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power.

Zhang Y, Misra S, Agrawal A, Patwary MM, Liao WK, Qin Z, Choudhary A - BMC Bioinformatics (2012)

Adaptive tile-based strategy. The optimal tile size T can be calculated according to the hardware configuration of GPU. After calculated, the T subject sequences together are transferred to GPU global memory. Permutations and alignments are done in parallel in GPU. Then T × 1000 alignment scores are moved back to CPU for T fittings.
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC3318904&req=5

Figure 3: Adaptive tile-based strategy. The optimal tile size T can be calculated according to the hardware configuration of GPU. After calculated, the T subject sequences together are transferred to GPU global memory. Permutations and alignments are done in parallel in GPU. Then T × 1000 alignment scores are moved back to CPU for T fittings.
Mentions: In Tesla C2050 used in our experiments, there are 14 SMs and the maximum number of resident threads per SM is 1024. Let N = 1000, then, T = ⌊14 × 1024/1000⌋ = 14, which means that there are 14 distinct subject sequences to be transferred to the GPU's global memory at a time. Based on the second strategy (i.e., data reuse), 14 subject sequences and their permuted copies are aligned against one query sequence at a time, until all the query sequences are processed. As a result, 14 × 1000 alignment scores in total are obtained in each round, which are subsequently transferred to CPU for fitting. The CPU takes the 1000 alignment scores for each subject-query sequence pair and uses them to compute the corresponding pss. The tile-based strategy has been described in Figure 3.

Bottom Line: We further extend the parallelization technique to estimate pairwise statistical significance using position-specific substitution matrices, which has earlier demonstrated significantly better sequence comparison accuracy than using standard substitution matrices.We observe end-to-end speedups of nearly 250 (370) × using single-GPU Tesla C2050 GPU (dual-Tesla C2050) over the CPU implementation using Intel Corei7 CPU 920 processor.Harvesting the high performance of modern GPUs is a promising approach to accelerate pairwise statistical significance estimation for local sequence alignment.

View Article: PubMed Central - HTML - PubMed

Affiliation: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. yuhongzhang@uestc.edu.cn

ABSTRACT

Background: Pairwise statistical significance has been recognized to be able to accurately identify related sequences, which is a very important cornerstone procedure in numerous bioinformatics applications. However, it is both computationally and data intensive, which poses a big challenge in terms of performance and scalability.

Results: We present a GPU implementation to accelerate pairwise statistical significance estimation of local sequence alignment using standard substitution matrices. By carefully studying the algorithm's data access characteristics, we developed a tile-based scheme that can produce a contiguous data access in the GPU global memory and sustain a large number of threads to achieve a high GPU occupancy. We further extend the parallelization technique to estimate pairwise statistical significance using position-specific substitution matrices, which has earlier demonstrated significantly better sequence comparison accuracy than using standard substitution matrices. The implementation is also extended to take advantage of dual-GPUs. We observe end-to-end speedups of nearly 250 (370) × using single-GPU Tesla C2050 GPU (dual-Tesla C2050) over the CPU implementation using Intel Corei7 CPU 920 processor.

Conclusions: Harvesting the high performance of modern GPUs is a promising approach to accelerate pairwise statistical significance estimation for local sequence alignment.

Show MeSH