A uniform approach for programming distributed heterogeneous computing systems.

Grasso I, Pellegrini S, Cosenza B, Fahringer T - J Parallel Distrib Comput (2014)

Bottom Line: However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. We assess libWater's performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.


Affiliation: Institute of Computer Science, University of Innsbruck, Austria ; Barcelona Supercomputing Center, Barcelona, Spain.

ABSTRACT

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system that tracks the dependency information enforced by event synchronization to dynamically build a DAG of commands, to which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
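
The idea of deriving a DAG of commands from event synchronization can be illustrated with a minimal sketch. This is not libWater's API or its runtime implementation: the command names, the event labels and the `(name, event, wait_list)` tuple layout are hypothetical and chosen only for illustration. Each command produces one event and may wait on events produced by earlier commands; every such wait becomes an edge in the graph, and a topological order gives one valid execution order.

```python
from collections import defaultdict, deque


def build_dag(commands):
    """Build dependency edges from (name, event, wait_list) tuples.

    Hypothetical layout: `event` is the event a command signals on
    completion, `wait_list` holds the events it must wait for.
    """
    producer = {event: name for name, event, _ in commands}
    edges = defaultdict(list)          # producer command -> dependent commands
    indegree = {name: 0 for name, _, _ in commands}
    for name, _, wait_list in commands:
        for ev in wait_list:
            edges[producer[ev]].append(name)
            indegree[name] += 1
    return edges, indegree


def topological_order(commands):
    """Return one valid execution order (Kahn's algorithm)."""
    edges, indegree = build_dag(commands)
    ready = deque(name for name, _, _ in commands if indegree[name] == 0)
    order = []
    while ready:
        cmd = ready.popleft()
        order.append(cmd)
        for nxt in edges[cmd]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(commands):
        raise ValueError("cyclic event dependencies")
    return order


# Illustrative command stream: two buffer writes, a kernel, one read-back.
COMMANDS = [
    ("write_A", "e0", []),
    ("write_B", "e1", []),
    ("kernel_mul", "e2", ["e0", "e1"]),
    ("read_C", "e3", ["e2"]),
]

if __name__ == "__main__":
    print(topological_order(COMMANDS))
```

Once the dependencies are explicit in a graph like this, a runtime can inspect neighbouring commands for patterns, which is the kind of analysis the two optimizations named in the abstract rely on.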


Strong scaling of matrix chain multiplication on the VSC2 Cluster.
© Copyright Policy - CC BY-NC-ND

Mentions: The results of our experiments are depicted in Fig. 5. For this application, we show the execution time (in seconds) for up to 16 nodes and the corresponding speedup with respect to a single node. The baseline approach scales almost linearly up to 8 nodes, with an efficiency of 87%. For 16 nodes, the efficiency of the runtime system decreases significantly, reaching 48%. The main reason is the high communication overhead caused by unnecessary copies of intermediate buffers to the root node. Before proceeding with the operation, the results of and have to be gathered by the root scheduler and then distributed again to the remaining nodes. While the buffer containing can be directly reused, the result of can be copied to the remaining nodes using a more efficient collective pattern, MPI_Allgather. In this paper, only the former redundant copy is automatically detected and removed; the latter is replaced by an MPI_Gather and MPI_Bcast by the DCR optimization.
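
To put the efficiency figures in perspective, the short sketch below converts them into speedups, assuming the usual definition of parallel efficiency as speedup divided by node count. That definition is an assumption, since the excerpt does not state it, and the function names are illustrative.

```python
def speedup_from_efficiency(efficiency, nodes):
    """Speedup over one node, assuming efficiency = speedup / nodes."""
    return efficiency * nodes


def efficiency_from_times(t_single, t_parallel, nodes):
    """Parallel efficiency computed from measured execution times."""
    return (t_single / t_parallel) / nodes


if __name__ == "__main__":
    # Efficiencies quoted in the text for the baseline approach.
    for nodes, eff in [(8, 0.87), (16, 0.48)]:
        s = speedup_from_efficiency(eff, nodes)
        print(f"{nodes} nodes: efficiency {eff:.0%}, speedup about {s:.2f}")
```

Under that assumption, the baseline reaches a speedup of roughly 7.0 on 8 nodes and roughly 7.7 on 16 nodes, so doubling the node count yields only about 10% more speedup.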

