Integrating reconfigurable hardware-based grid for high performance computing.

Dondo Gazzano J, Sanchez Molina F, Rincon F, López JC - ScientificWorldJournal (2015)

Bottom Line: The impressive speed-up factors they can achieve, their reduced power consumption, and the ease and flexibility of a design process with fast iterations between consecutive versions are examples of the benefits of their use. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for deploying high performance distributed applications while simplifying the development process.

View Article: PubMed Central - PubMed

Affiliation: Escuela Superior de Informatica, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain.

ABSTRACT
FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors they can achieve, their reduced power consumption, and the ease and flexibility of a design process with fast iterations between consecutive versions are examples of the benefits of their use. However, some difficulties in using reconfigurable platforms as accelerators still need to be addressed: the need for an in-depth application study to identify acceleration potential, the lack of tools for deploying computational problems on distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. In addition, a set of services designed to facilitate application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for deploying high performance distributed applications while simplifying the development process.

No MeSH data available.


Timing diagram.
© Copyright Policy - open-access

fig18: Timing diagram.

Mentions: To attain an execution time very close to the computation time calculated above, the data transfer time must be hidden behind the computation time of each computational kernel. To optimize the execution time and prevent a computational kernel from remaining idle until new data is loaded into its local memory, it is necessary to determine the minimum amount of data that must be loaded into local memory to guarantee continuous processing. A balance must be reached between the number of computational kernels and the amount of data needed by each one, since this amount affects the transfer time linearly but the computation time quadratically. As shown in Figure 18, four nodes work in parallel. The transfer time for each node is denoted TT Node n, where n is the node number. If the amount of data loaded into each local memory allows an execution time long enough to cover the loading of the next data, then the computing time shown in Figure 17 is actually equal to the total execution time. The time is calculated from formula (5), where P represents the number of rows or columns shared by the M × P and P × N matrices in the matrix multiplication.
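The balance described above can be sketched numerically. As a minimal illustration (not the paper's implementation), assume a per-block transfer cost that grows linearly with the block size P and a per-block compute cost that grows quadratically, as the paragraph states; the coefficients and function names below are hypothetical. The smallest P that keeps a kernel continuously busy, and the resulting overlapped execution time in which only the first transfer is exposed, can then be estimated as:

```python
import math

def min_block_size(t_transfer_per_row: float, t_compute_coeff: float) -> int:
    """Smallest block size P whose compute time (~ t_compute_coeff * P**2)
    covers the transfer time of the next block (~ t_transfer_per_row * P),
    so the computational kernel never sits idle waiting for data."""
    if t_compute_coeff <= 0:
        raise ValueError("compute coefficient must be positive")
    return max(1, math.ceil(t_transfer_per_row / t_compute_coeff))

def overlapped_exec_time(n_blocks: int, p: int,
                         t_transfer_per_row: float,
                         t_compute_coeff: float) -> float:
    """Total execution time when each block's transfer is hidden behind the
    previous block's computation: only the initial load is exposed, and each
    subsequent step takes the larger of the two times."""
    tt = t_transfer_per_row * p      # transfer time of one block (linear in P)
    tc = t_compute_coeff * p * p     # compute time of one block (quadratic in P)
    return tt + n_blocks * max(tc, tt)

# Example with made-up costs: 8 time units per transferred row,
# compute coefficient of 2 time units.
p = min_block_size(8.0, 2.0)                      # -> 4
total = overlapped_exec_time(10, p, 8.0, 2.0)
```

With these illustrative numbers, a block of 4 rows makes compute time equal transfer time, so after the first load every transfer is fully hidden, matching the condition the paragraph derives for the total execution time to equal the computation time.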

