Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

Sluga D, Curk T, Zupan B, Lotric U - BMC Bioinformatics (2014)

Bottom Line: Their utility resulted in an order of magnitude shorter execution times compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as the MIC-based implementation on a Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. On the other hand, the new MIC architecture, although lagging in performance, reduces the programming effort and compensates with a more general architecture suitable for a wider range of problems.

Affiliation: Faculty of Computer and Information Science, University of Ljubljana, Trzaska 25, SI 1000 Ljubljana, SI, Slovenia. uros.lotric@fri.uni-lj.si.

ABSTRACT

Background: The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within modern Graphics Processing Units (GPUs) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested.

Results: We have developed a heterogeneous, GPU- and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on the differences between these two modern massively parallel architectures and their software environments. Both deliver an order of magnitude shorter execution times compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as the MIC-based implementation on a Xeon Phi P5110 coprocessor, but also requires considerably more programming effort.

Conclusions: General-purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but they can prove demanding for the programmer. On the other hand, the new MIC architecture, although lagging in performance, reduces the programming effort and compensates with a more general architecture suitable for a wider range of problems.

Figure 4: MIC code snippet. The first pragma directive marks the start of a MIC code section. Keywords in and out indicate the data to be transferred to and from the Xeon Phi. The OpenMP clause omp parallel for launches all available threads in parallel, which execute the code in the body of the loop and score the SNP pairs.

Mentions: Intel provides a C++ compiler suite and all the tools needed to exploit the hardware [21]. The code can be parallelized using OpenMP directives or the MPI library and compiled for the MIC architecture; the resulting applications can then run only on Xeon Phi coprocessors. Another, more general way to specify parallel execution is to use offload constructs along with OpenMP to mark the data and the code to be transferred to and executed on the Xeon Phi. All other parts of the program run normally on the host CPU. A third possibility is to use the OpenCL framework in the same manner as with GPUs.

MIC development tools facilitate data management through compiler directives. The example in Figure 4 demonstrates this programming paradigm. It performs the same operation as the snippet in Figure 3. The programmer marks the data and the code that are needed on the coprocessor; all memory allocations and transfers are done implicitly. To obtain the best performance, the programmer must tailor the algorithms to fully utilize the vector unit. The Intel compiler automatically vectorizes sections of code where possible.
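
The original Figure 4 listing is not reproduced in this excerpt, so the following is only a rough sketch of what such an offload section might look like, assembled from the figure caption and the paragraph above. All identifiers (score_all_pairs, score_pair, snps, scores) and the dummy scoring formula are illustrative placeholders rather than the authors' code, and the sketch assumes the Intel C++ compiler's offload extensions (e.g. icc -qopenmp).

#include <omp.h>

/* A trivial stand-in for the pair-scoring function; the actual SNPsyn core
   computes an information-theoretic synergy score. The target(mic) attribute
   tells the Intel compiler to also build this function for the coprocessor. */
__attribute__((target(mic)))
double score_pair(const char *snps, int nSamples, int i, int j)
{
    int matches = 0;
    for (int k = 0; k < nSamples; ++k)
        matches += (snps[(long)i * nSamples + k] == snps[(long)j * nSamples + k]);
    return (double)matches / nSamples;
}

void score_all_pairs(const char *snps, int nSnps, int nSamples, double *scores)
{
    long nPairs = (long)nSnps * (nSnps - 1) / 2;

    /* The offload pragma marks the section to run on the Xeon Phi.
       'in' copies the genotype matrix to the coprocessor and 'out' copies
       the computed scores back; allocations and transfers happen implicitly. */
    #pragma offload target(mic) \
        in(snps : length((long)nSnps * nSamples)) \
        out(scores : length(nPairs))
    {
        /* omp parallel for launches all available coprocessor threads;
           each thread scores a subset of the SNP pairs. */
        #pragma omp parallel for schedule(dynamic)
        for (int i = 0; i < nSnps; ++i) {
            long base = (long)i * nSnps - (long)i * (i + 1) / 2;
            for (int j = i + 1; j < nSnps; ++j)
                scores[base + (j - i - 1)] = score_pair(snps, nSamples, i, j);
        }
    }
}

The point made in the paragraph above is visible in the pragma clauses: data movement is declared rather than programmed explicitly, in contrast to the manual allocation and copying that a CUDA implementation such as the one in Figure 3 would require.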

