A uniform approach for programming distributed heterogeneous computing systems.

Grasso I, Pellegrini S, Cosenza B, Fahringer T - J Parallel Distrib Comput (2014)

Bottom Line: However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. We assess libWater's performance on three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.


Affiliation: Institute of Computer Science, University of Innsbruck, Austria ; Barcelona Supercomputing Center, Barcelona, Spain.

ABSTRACT

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system that tracks the dependency information enforced by event synchronization to dynamically build a DAG of commands, to which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance on three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
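To make the idea of a command DAG and device-host-device copy removal more concrete, the following is a minimal sketch in Python. It is not libWater's implementation or API: the `Command` class, its fields, the `build_dag` and `remove_host_round_trips` functions, and the command names are all hypothetical, and the optimization is applied here to a static list rather than to a DAG built dynamically at runtime. It assumes that a read (device to host) followed by a dependent write (host to device) of the same staging buffer can be fused into a direct copy when no other command waits on the read.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """One enqueued command. All names and fields are illustrative assumptions."""
    name: str
    kind: str                  # "kernel", "read" (device->host) or "write" (host->device)
    buffer: str = ""           # hypothetical label of the host staging buffer
    device: str = ""
    wait_for: list = field(default_factory=list)  # commands whose events must complete first


def build_dag(commands):
    """Map each command name to the set of command names it depends on."""
    return {c.name: set(c.wait_for) for c in commands}


def remove_host_round_trips(commands):
    """Fuse read-then-write pairs on the same staging buffer into one copy."""
    by_name = {c.name: c for c in commands}
    consumers = {}
    for c in commands:
        for dep in c.wait_for:
            consumers.setdefault(dep, []).append(c.name)

    result, dropped = [], set()
    for c in commands:
        # Look for a read that feeds only this write through the same buffer.
        read = next((by_name[d] for d in c.wait_for
                     if c.kind == "write"
                     and by_name[d].kind == "read"
                     and by_name[d].buffer == c.buffer
                     and consumers[d] == [c.name]), None)
        if read is None:
            result.append(c)
            continue
        dropped.add(read.name)
        # The fused copy inherits the dependencies of both original commands.
        deps = [d for d in c.wait_for if d != read.name] + list(read.wait_for)
        result.append(Command(c.name, "copy", c.buffer,
                              f"{read.device}->{c.device}", deps))
    return [c for c in result if c.name not in dropped]


cmds = [
    Command("k1", "kernel", device="gpu0"),
    Command("r1", "read", "tmp", "gpu0", ["k1"]),
    Command("w1", "write", "tmp", "gpu1", ["r1"]),
    Command("k2", "kernel", device="gpu1", wait_for=["w1"]),
]
optimized = remove_host_round_trips(cmds)
dag = build_dag(optimized)
```

In this toy input the read `r1` and the write `w1` collapse into a single copy from `gpu0` to `gpu1` that depends directly on kernel `k1`, so the intermediate host buffer is no longer on the critical path.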


libWater’s distributed runtime system architecture.
© Copyright Policy - CC BY-NC-ND

Mentions: Fig. 1 shows the organization of the libWater distributed runtime system. The host code, which directly interacts with libWater’s routines, runs on the so-called root node, which by default is the cluster node with rank 0. The thread running this code is referred to as the host thread. In the background, a second thread, the scheduler thread, is allocated to execute an instance of the WTRScheduler. On each of the remaining cluster nodes, a single scheduler thread is spawned regardless of the number of available devices (only one MPI process is allocated per node). This thread also executes an instance of the WTRScheduler, which represents the backbone of libWater’s distributed runtime system.
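The thread layout described above can be summarized in a few lines of Python. This is only an illustrative model of which threads exist on which node, not libWater code: the function names `threads_on_node` and `cluster_layout` and the `devices_per_node` parameter are assumptions introduced for the example.

```python
def threads_on_node(rank, root=0):
    """Return the thread roles on the node with the given rank.

    Every node runs one scheduler thread; only the root node
    additionally runs the host thread that executes the host code.
    """
    roles = ["scheduler"]
    if rank == root:
        roles.insert(0, "host")
    return roles


def cluster_layout(devices_per_node, root=0):
    """Map each rank to its thread roles.

    devices_per_node is a hypothetical list giving the device count of
    each node; it is deliberately ignored when choosing threads, because
    one scheduler thread serves a node regardless of how many devices
    that node has.
    """
    return {rank: threads_on_node(rank, root)
            for rank in range(len(devices_per_node))}


layout = cluster_layout([2, 4, 1])
```

With three nodes holding 2, 4 and 1 devices, rank 0 gets a host thread and a scheduler thread, while ranks 1 and 2 each get a single scheduler thread.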

