Efficient BinDCT hardware architecture exploration and implementation on FPGA

View Article: PubMed Central - PubMed

ABSTRACT

This paper presents a hardware module design for the forward Binary Discrete Cosine Transform (BinDCT) and its implementation on a field-programmable gate array (FPGA) device. Several architectures of the BinDCT module were explored to maximize efficiency. Developing each architecture involved architectural design, timing and pipeline analysis, hardware description language modeling, design synthesis, and implementation. The resulting BinDCT hardware module is highly efficient in terms of operating frequency and hardware resources, which makes it suitable for the most recent video standards with high image resolution and refresh rates. This hardware efficiency also makes the BinDCT a strong candidate for time- and resource-constrained applications. A comparison with several recent implementations of discrete cosine transform approximations shows that the proposed BinDCT hardware module achieves the best performance.



Timing diagram of (a) Arch N°1, (b) Arch N°12 and (c) Arch N°21.
© Copyright Policy - CC BY-NC-ND

Mentions: The timing diagram of the completely unshared architecture is shown in Fig. 5a. This architecture includes 4 adders, 4 subtractors and 8 registers for stage 1; 2 adders, 2 subtractors and 4 registers for stage 2; 4 adders, 4 subtractors and 8 registers for stage 3; and 2 adders, 6 subtractors and 8 registers for stage 4. The timing diagram shows the number of cycles needed by each 1D-BinDCT stage module. It also shows the serial inputs of the 1D-BinDCT block (X0, X1, …, X7), its serial outputs (Y0, Y1, …, Y7) and the output coefficients of each stage. The first line (X0, X1, …, X7) of the 8 × 8 input matrix has a latency of 16 cycles, and each of the remaining lines has a latency of 15 cycles. Specifically, the input stage takes 9 cycles for the first line and 8 cycles for the other lines, and the four BinDCT stages have a global latency of 8 cycles.
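
The per-stage counts quoted above can be tallied to get a rough picture of the total arithmetic and storage cost of the completely unshared architecture. The following is only a bookkeeping sketch in Python: the names `StageResources`, `UNSHARED_STAGES` and `total_resources` are hypothetical and do not come from the article, and the numbers are copied from the text rather than derived from any synthesis report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResources:
    """Operator and register counts for one 1D-BinDCT stage (hypothetical helper)."""
    adders: int
    subtractors: int
    registers: int


# Per-stage counts as quoted in the text for the completely unshared architecture.
UNSHARED_STAGES = {
    1: StageResources(adders=4, subtractors=4, registers=8),
    2: StageResources(adders=2, subtractors=2, registers=4),
    3: StageResources(adders=4, subtractors=4, registers=8),
    4: StageResources(adders=2, subtractors=6, registers=8),
}


def total_resources(stages):
    """Sum the adders, subtractors and registers over all stages."""
    return StageResources(
        adders=sum(s.adders for s in stages.values()),
        subtractors=sum(s.subtractors for s in stages.values()),
        registers=sum(s.registers for s in stages.values()),
    )


totals = total_resources(UNSHARED_STAGES)
print(totals)
```

Under these figures the four stages together use 12 adders, 16 subtractors and 28 registers. This tally covers only the operators listed in the excerpt, not the input stage or any control logic.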

