Accelerating fibre orientation estimation from diffusion weighted magnetic resonance imaging using GPUs.

Hernández M, Guerrero GD, Cecilia JM, García JM, Inuggi A, Jbabdi S, Behrens TE, Sotiropoulos SN - PLoS ONE (2013)

Bottom Line: With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.


Affiliation: Department of Computer Science, University of Murcia, Murcia, Spain. moises.hernandez@um.es

ABSTRACT
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.
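
The computational structure that makes this approach attractive for GPUs is that the MCMC fit is independent across voxels, so each voxel's chain can be assigned to its own light-weight thread. The following CUDA sketch illustrates that idea for a simplified single-stick ball & stick model; the parameterisation, priors, proposal widths and all identifiers (mcmc_kernel, Params, ssd, ...) are assumptions made for illustration and are not the FSL bedpostx/xfibres code the authors parallelised.

    // Minimal sketch (not the authors' code): one Metropolis-Hastings chain per
    // voxel for a single-stick ball & stick model, one CUDA thread per voxel.
    #include <cuda_runtime.h>
    #include <curand_kernel.h>
    #include <math.h>

    struct Params { float d, f, th, ph; };   // diffusivity, volume fraction, stick angles

    // Sum of squared residuals between the measured signal and the ball & stick
    // prediction over K gradient directions (bvecs, unit vectors) and b-values (bvals).
    __device__ float ssd(const float* s, const float* bvals, const float* bvecs,
                         int K, float S0, Params p)
    {
        float vx = sinf(p.th) * cosf(p.ph);   // stick orientation from (theta, phi)
        float vy = sinf(p.th) * sinf(p.ph);
        float vz = cosf(p.th);
        float e = 0.0f;
        for (int k = 0; k < K; ++k) {
            float dot  = vx * bvecs[3*k] + vy * bvecs[3*k+1] + vz * bvecs[3*k+2];
            float pred = S0 * ((1.0f - p.f) * expf(-bvals[k] * p.d) +
                               p.f * expf(-bvals[k] * p.d * dot * dot));
            float r = s[k] - pred;
            e += r * r;
        }
        return e;
    }

    // One thread per voxel: each thread runs its own independent chain.
    __global__ void mcmc_kernel(const float* signals,           // nvox * K measurements
                                const float* bvals, const float* bvecs,
                                int nvox, int K, int niter,
                                float sigma2,                   // assumed noise variance
                                Params* out, unsigned long long seed)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= nvox) return;

        curandState rng;
        curand_init(seed, v, 0, &rng);

        const float* s = signals + v * K;
        float S0 = s[0];                          // crude S0 guess from the first measurement
        Params cur  = {0.0015f, 0.5f, 1.0f, 1.0f};
        float  curE = ssd(s, bvals, bvecs, K, S0, cur);

        for (int it = 0; it < niter; ++it) {
            // Gaussian random-walk proposal on each parameter.
            Params prop = cur;
            prop.d  += 1e-4f * curand_normal(&rng);
            prop.f  += 0.05f * curand_normal(&rng);
            prop.th += 0.1f  * curand_normal(&rng);
            prop.ph += 0.1f  * curand_normal(&rng);
            // Reject proposals outside simple bounds (flat prior on a bounded range).
            if (prop.d <= 0.0f || prop.f < 0.0f || prop.f > 1.0f) continue;

            float propE = ssd(s, bvals, bvecs, K, S0, prop);
            // Metropolis acceptance for a Gaussian likelihood with variance sigma2.
            float logAlpha = (curE - propE) / (2.0f * sigma2);
            if (logf(curand_uniform(&rng)) < logAlpha) { cur = prop; curE = propE; }
        }
        out[v] = cur;   // keeps only the last sample; a real sampler records the chain
    }

The published design additionally handles multiple fibre compartments, burn-in and sample recording, and careful random-number generation; the sketch only shows why the problem parallelises so well, namely that every voxel's chain is independent.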

pone-0061892-g008: Comparison of single-core CPU and GPU execution times (in log scale) running the Levenberg-Marquardt algorithm, with speed gains of over two orders of magnitude: (a) as the number of Levenberg-Marquardt iterations is increased, and (b) as the number of voxels per slice is increased. The execution times for (a) are for a slice of 4804 voxels, with the convergence criterion of the algorithm decreased to allow more iterations. For each case, results are shown for different numbers K of gradient directions (64, 128 and 256) and for estimating fibres.

Mentions: Figure 8 shows the performance evaluation of the MCMC kernel on a single GPU compared with its sequential counterpart on a single-core CPU, for three data sets with different numbers K of gradient directions (64, 128 and 256) and for estimating fibres. As before, Figure 8a presents execution times as the number of MCMC iterations increases, and Figure 8b as the number of voxels to be processed increases. In both cases, the results illustrate how well the GPU version scales relative to the CPU version. The maximum speed-up for 128 gradient directions was 135x.
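
For a sense of how such a single-GPU versus single-core-CPU comparison could be measured, the sketch below (an assumption for illustration, not the authors' benchmarking harness) launches the per-voxel kernel from the earlier sketch over one slice, times it with CUDA events, and leaves the speed-up as the ratio of a separately measured CPU time to this GPU time.

    // Rough benchmarking sketch: time the per-voxel kernel from the earlier
    // sketch on one slice of nvox voxels and K gradient directions.
    #include <cuda_runtime.h>
    #include <vector>

    struct Params { float d, f, th, ph; };                 // as in the earlier sketch
    __global__ void mcmc_kernel(const float*, const float*, const float*,
                                int, int, int, float, Params*, unsigned long long);

    float gpu_time_ms(const std::vector<float>& signals,   // nvox * K
                      const std::vector<float>& bvals,     // K
                      const std::vector<float>& bvecs,     // 3 * K
                      int nvox, int K, int niter)
    {
        // Copy the slice to device memory.
        float *d_sig, *d_bvals, *d_bvecs; Params *d_out;
        cudaMalloc((void**)&d_sig,   signals.size() * sizeof(float));
        cudaMalloc((void**)&d_bvals, bvals.size()   * sizeof(float));
        cudaMalloc((void**)&d_bvecs, bvecs.size()   * sizeof(float));
        cudaMalloc((void**)&d_out,   nvox * sizeof(Params));
        cudaMemcpy(d_sig,   signals.data(), signals.size() * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_bvals, bvals.data(),   bvals.size()   * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_bvecs, bvecs.data(),   bvecs.size()   * sizeof(float), cudaMemcpyHostToDevice);

        int threads = 128;                                  // block size is a tuning choice
        int blocks  = (nvox + threads - 1) / threads;       // one thread per voxel

        // Time only the kernel with CUDA events.
        cudaEvent_t start, stop;
        cudaEventCreate(&start); cudaEventCreate(&stop);
        cudaEventRecord(start);
        mcmc_kernel<<<blocks, threads>>>(d_sig, d_bvals, d_bvecs, nvox, K, niter,
                                         /*sigma2=*/1.0f, d_out, 1234ULL);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start); cudaEventDestroy(stop);
        cudaFree(d_sig); cudaFree(d_bvals); cudaFree(d_bvecs); cudaFree(d_out);
        return ms;
    }

    // The reported speed-up is then simply cpu_ms / gpu_ms, measured for the same
    // slice, iteration count and number of gradient directions K.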

