A uniform approach for programming distributed heterogeneous computing systems.

Grasso I, Pellegrini S, Cosenza B, Fahringer T - J Parallel Distrib Comput (2014)

Bottom Line: However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. We assess libWater's performance on three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.


Affiliation: Institute of Computer Science, University of Innsbruck, Austria; Barcelona Supercomputing Center, Barcelona, Spain.

ABSTRACT

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system that tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance on three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.


DAG of wtr_commands generated during the execution of the code snippet in Listing 1.
© Copyright Policy - CC BY-NC-ND

Mentions: A complete multi-device libWater-based host program is shown in Listing 1. This code initializes all the available NVIDIA GPU devices. It then selects two devices, belonging to node ranks 0 and 1 respectively, each with more than 1024 MB of global memory. For each device, the code in Listing 1 creates a kernel (kern, line 10) and a read/write buffer (buff, line 11). The contents of host memory are then written into the device buffer by the wtr_write_buffer command (line 12), and the wtr_run_kernel command is issued with buff as an input argument (lines 14–16). The computed result is retrieved by the wtr_read_buffer command (line 17), which moves data from device memory back to host memory. From the runtime system's point of view, executing this code generates a set of dependent commands structured as the DAG depicted in Fig. 2. The DAG is composed of vertices, i.e. wtr_commands, interconnected through directed edges, or events, which guarantee that the correct order of execution, and therefore the semantics of the input program, is maintained. Each command has an associated set of dependencies. Not all libWater library routines generate a corresponding wtr_command. For example, creation, merging and release of events are only meaningful on the root node, so there is no need to serialize them. In Fig. 2, each wtr_command carries a descriptor that identifies the rank of the node hosting the targeted device, the device itself, and the unique command identifier assigned by the runtime system. As already mentioned, for buffer operations on remote devices (i.e. the device on node 1), explicit data transfers are automatically inserted by the libWater library (e.g. wtr_commands 10 and 14).
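
The command DAG described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not libWater's implementation or API: the class names, the field names, and the "send_to_node" and "recv_from_node" operation labels are all hypothetical, and the command identifiers it produces do not match those in Fig. 2. It shows a write, kernel and read command per device, the extra transfer commands inserted for a device on a non-root node, and an execution order that respects every dependency.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Command:
    """One vertex of the DAG. All field names are illustrative."""
    cid: int      # unique command identifier
    node: int     # rank of the node hosting the targeted device
    device: int   # device index on that node
    op: str       # e.g. "write_buffer", "run_kernel"
    deps: set = field(default_factory=set)  # ids of commands to wait for


class CommandGraph:
    """Hypothetical stand-in for a runtime that records commands."""

    def __init__(self, root=0):
        self.root = root
        self.commands = {}

    def add(self, node, device, op, deps=()):
        cid = len(self.commands)
        self.commands[cid] = Command(cid, node, device, op, set(deps))
        return cid

    def write_buffer(self, node, device, deps=()):
        # Assumption: a remote device needs an explicit transfer first.
        if node != self.root:
            deps = [self.add(node, device, "send_to_node", deps)]
        return self.add(node, device, "write_buffer", deps)

    def run_kernel(self, node, device, deps=()):
        return self.add(node, device, "run_kernel", deps)

    def read_buffer(self, node, device, deps=()):
        cid = self.add(node, device, "read_buffer", deps)
        # Assumption: results from a remote device are shipped back to root.
        if node != self.root:
            cid = self.add(node, device, "recv_from_node", [cid])
        return cid

    def schedule(self):
        # Order commands so that each one follows all of its dependencies.
        graph = {c.cid: c.deps for c in self.commands.values()}
        order = TopologicalSorter(graph).static_order()
        return [self.commands[cid] for cid in order]


def build_example():
    """One device on node 0 and one on node 1, as in the text above."""
    g = CommandGraph(root=0)
    for node in (0, 1):
        w = g.write_buffer(node, 0)
        k = g.run_kernel(node, 0, [w])
        g.read_buffer(node, 0, [k])
    return g


if __name__ == "__main__":
    for c in build_example().schedule():
        print(f"node {c.node} dev {c.device} id {c.cid}: {c.op} deps={sorted(c.deps)}")
```

In this model the two per-device chains share no edges, so a scheduler is free to run them concurrently; only the dependencies within each chain constrain the order.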

