Limits...
Accelerating fibre orientation estimation from diffusion weighted magnetic resonance imaging using GPUs.

Hernández M, Guerrero GD, Cecilia JM, García JM, Inuggi A, Jbabdi S, Behrens TE, Sotiropoulos SN - PLoS ONE (2013)

Bottom Line: With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands.We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version.We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, University of Murcia, Murcia, Spain. moises.hernandez@um.es

ABSTRACT
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.

Show MeSH

Related in: MedlinePlus

Comparison between CPU and GPU model estimates for the diffusivity d, the baseline signal  and the volume fraction of the first fibre , in different brain areas.(a) A corpus callosum voxel, (b) a centrum semiovale voxel and (c) a grey matter voxel. Each design was ran 1000 times on the same data and for each repeat the mean of the posterior distribution of the respective parameter was recorded. The histograms show the distributions of these means across all 1000 repeats. For each repeat, a burn-in period of 3000 iterations and a thinning period of 25 samples was used for the MCMC.
© Copyright Policy
Related In: Results  -  Collection


getmorefigures.php?uid=PMC3643787&req=5

pone-0061892-g007: Comparison between CPU and GPU model estimates for the diffusivity d, the baseline signal and the volume fraction of the first fibre , in different brain areas.(a) A corpus callosum voxel, (b) a centrum semiovale voxel and (c) a grey matter voxel. Each design was ran 1000 times on the same data and for each repeat the mean of the posterior distribution of the respective parameter was recorded. The histograms show the distributions of these means across all 1000 repeats. For each repeat, a burn-in period of 3000 iterations and a thinning period of 25 samples was used for the MCMC.

Mentions: Figure 7 shows a performance comparison between a single-core CPU and a GPU when executing Levenberg-Marquardt for three data sets with different number K of gradient directions (64, 128 and 256) and when estimating fibres. In Figure 7a execution times for a slice are shown as the number of iterations is increased (i.e. the convergence criterion for the optimisation is reduced). A good scalability of the GPU version is shown compared to the CPU version. In Figure 7b, execution times are shown for different slice sizes (i.e. different number of voxels to be processed). A linear speedup is obtained whenever the number of voxels increases. The maximum speed-up for 128 gradient directions was 120x.


Accelerating fibre orientation estimation from diffusion weighted magnetic resonance imaging using GPUs.

Hernández M, Guerrero GD, Cecilia JM, García JM, Inuggi A, Jbabdi S, Behrens TE, Sotiropoulos SN - PLoS ONE (2013)

Comparison between CPU and GPU model estimates for the diffusivity d, the baseline signal  and the volume fraction of the first fibre , in different brain areas.(a) A corpus callosum voxel, (b) a centrum semiovale voxel and (c) a grey matter voxel. Each design was ran 1000 times on the same data and for each repeat the mean of the posterior distribution of the respective parameter was recorded. The histograms show the distributions of these means across all 1000 repeats. For each repeat, a burn-in period of 3000 iterations and a thinning period of 25 samples was used for the MCMC.
© Copyright Policy
Related In: Results  -  Collection

Show All Figures
getmorefigures.php?uid=PMC3643787&req=5

pone-0061892-g007: Comparison between CPU and GPU model estimates for the diffusivity d, the baseline signal and the volume fraction of the first fibre , in different brain areas.(a) A corpus callosum voxel, (b) a centrum semiovale voxel and (c) a grey matter voxel. Each design was ran 1000 times on the same data and for each repeat the mean of the posterior distribution of the respective parameter was recorded. The histograms show the distributions of these means across all 1000 repeats. For each repeat, a burn-in period of 3000 iterations and a thinning period of 25 samples was used for the MCMC.
Mentions: Figure 7 shows a performance comparison between a single-core CPU and a GPU when executing Levenberg-Marquardt for three data sets with different number K of gradient directions (64, 128 and 256) and when estimating fibres. In Figure 7a execution times for a slice are shown as the number of iterations is increased (i.e. the convergence criterion for the optimisation is reduced). A good scalability of the GPU version is shown compared to the CPU version. In Figure 7b, execution times are shown for different slice sizes (i.e. different number of voxels to be processed). A linear speedup is obtained whenever the number of voxels increases. The maximum speed-up for 128 gradient directions was 120x.

Bottom Line: With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands.We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version.We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.

View Article: PubMed Central - PubMed

Affiliation: Department of Computer Science, University of Murcia, Murcia, Spain. moises.hernandez@um.es

ABSTRACT
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation.

Show MeSH
Related in: MedlinePlus