Efficient BinDCT hardware architecture exploration and implementation on FPGA


ABSTRACT

This paper presents a hardware module design for the forward Binary Discrete Cosine Transform (BinDCT) and its implementation on a field-programmable gate array (FPGA) device. Different architectures of the BinDCT module were explored to maximize efficiency. Developing these architectures involved architectural design, timing and pipeline analysis, hardware description language modeling, design synthesis, and implementation. The resulting BinDCT hardware module is highly efficient in terms of operating frequency and hardware resource usage, which makes it suitable for the most recent video standards with high image resolutions and refresh rates. Its high hardware efficiency also makes the BinDCT a strong candidate for time- and resource-constrained applications. A comparison with several recent implementations of discrete cosine transform approximations shows that the proposed BinDCT hardware module achieves the best performance.



Fig. 3. Different implementation solutions of (a) stage 1, (b) stage 2, (c) stage 3, and (d) stage 4.
License: CC BY-NC-ND.

Fig. 3a presents three different implementation solutions for stage 1: a parallel solution, a solution with two shared operators (one adder and one subtractor), and a solution with one shared operator (one adder/subtractor). The first solution uses 4 adders, 4 subtractors, and 8 output registers. It has a latency of 1 cycle, since all the $a_i$ coefficients are calculated in parallel. The second solution uses one adder and one subtractor, together with two multiplexers (MUX) that select the operators’ inputs and the output registers. Depending on the value of the MUX selection input “sel”, two $a_i$ coefficients are calculated at a time, and both depend on the same $X_i$ inputs. Computing all the $a_i$ coefficients two by two therefore requires a latency of 4 cycles.
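The Python sketch below is a behavioral, cycle-counting model (not the authors' HDL) that contrasts the first two stage-1 solutions. It assumes, as is typical of DCT factorizations but not stated in this excerpt, that stage 1 is an 8-point butterfly with $a_i = x_i + x_{7-i}$ and $a_{7-i} = x_i - x_{7-i}$ for $i = 0, \dots, 3$. The function names `stage1_parallel` and `stage1_shared`, and the mapping of each `sel` value to one input pair, are hypothetical choices made for illustration.

```python
"""Cycle-level sketch of two stage-1 options for a BinDCT module.

Assumption (not from the source): stage 1 is the usual 8-point butterfly,
a[i] = x[i] + x[7-i] and a[7-i] = x[i] - x[7-i] for i = 0..3.
"""
from typing import List, Tuple


def stage1_parallel(x: List[int]) -> Tuple[List[int], int]:
    """4 adders + 4 subtractors: all 8 outputs are latched in one cycle."""
    a = [0] * 8
    for i in range(4):  # in hardware these 8 operations happen concurrently
        a[i] = x[i] + x[7 - i]
        a[7 - i] = x[i] - x[7 - i]
    return a, 1  # (coefficients, latency in cycles)


def stage1_shared(x: List[int]) -> Tuple[List[int], int]:
    """1 shared adder + 1 shared subtractor, steered by a hypothetical 'sel' mapping."""
    a = [0] * 8
    cycles = 0
    for sel in range(4):  # each 'sel' value routes one input pair to the operators
        lhs, rhs = x[sel], x[7 - sel]  # input MUX: both outputs share these inputs
        a[sel] = lhs + rhs             # shared adder -> output register sel
        a[7 - sel] = lhs - rhs         # shared subtractor -> output register 7-sel
        cycles += 1                    # two coefficients produced per cycle
    return a, cycles


if __name__ == "__main__":
    samples = [12, -3, 7, 0, 5, 9, -1, 4]
    par, par_lat = stage1_parallel(samples)
    shr, shr_lat = stage1_shared(samples)
    assert par == shr  # same coefficients, different resource/latency trade-off
    print(par, par_lat, shr_lat)
```

Both functions return identical coefficients; only the cycle count differs, mirroring the trade-off between eight operators with a 1-cycle latency and two shared operators plus multiplexers with a 4-cycle latency.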

