Limits...
Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems.

Ehsan S, Clark AF - Sensors (Basel) (2015)

Bottom Line: The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size.An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video).Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements.

View Article: PubMed Central - PubMed

Affiliation: School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK. sehsan@essex.ac.uk.

ABSTRACT
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.

No MeSH data available.


Comparative timing analysis (in logarithmic scale) of the three prototype implementations of the fast adaptive binarization algorithm [32] on a Xilinx Virtex-6 FPGA for some common image sizes.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4541907&req=5

sensors-15-16804-f017: Comparative timing analysis (in logarithmic scale) of the three prototype implementations of the fast adaptive binarization algorithm [32] on a Xilinx Virtex-6 FPGA for some common image sizes.

Mentions: To further analyze and highlight the significance of the integral image computation block for the fast adaptive binarization algorithm [32], Figure 17 shows the comparative timing analysis in logarithmic scale for the three implemented systems on a Xilinx Virtex-6 FPGA for some common image sizes. It gives the timing breakdown of the three implementations and clearly highlights how an optimized integral image computation block positively affects the whole computation and makes a substantial performance difference. It is evident from Figure 17 that slow integral image computation for the software-only implementation has a substantial negative effect on the overall system performance in terms of computation time. On the other hand, the system with hardware implementation of the serial integral image computation algorithm performs relatively better and also reduces the overall system computation time. Finally, it can be clearly seen from Figure 17 that the system employing our 4-rows parallel hardware for integral image computation reduces the overall system computation time significantly due to the optimized integral image computation block. To highlight this further, Figure 18 shows the comparative system throughput results in logarithmic scale (in frames/second) for the three implementations for some common image sizes. Again, the system utilizing our proposed 4-rows parallel integral image computation hardware achieves substantially enhanced throughput as compared to the other two implementations, thus showing the utility of our architectures as building blocks for larger embedded vision systems.


Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems.

Ehsan S, Clark AF - Sensors (Basel) (2015)

Comparative timing analysis (in logarithmic scale) of the three prototype implementations of the fast adaptive binarization algorithm [32] on a Xilinx Virtex-6 FPGA for some common image sizes.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4541907&req=5

sensors-15-16804-f017: Comparative timing analysis (in logarithmic scale) of the three prototype implementations of the fast adaptive binarization algorithm [32] on a Xilinx Virtex-6 FPGA for some common image sizes.
Mentions: To further analyze and highlight the significance of the integral image computation block for the fast adaptive binarization algorithm [32], Figure 17 shows the comparative timing analysis in logarithmic scale for the three implemented systems on a Xilinx Virtex-6 FPGA for some common image sizes. It gives the timing breakdown of the three implementations and clearly highlights how an optimized integral image computation block positively affects the whole computation and makes a substantial performance difference. It is evident from Figure 17 that slow integral image computation for the software-only implementation has a substantial negative effect on the overall system performance in terms of computation time. On the other hand, the system with hardware implementation of the serial integral image computation algorithm performs relatively better and also reduces the overall system computation time. Finally, it can be clearly seen from Figure 17 that the system employing our 4-rows parallel hardware for integral image computation reduces the overall system computation time significantly due to the optimized integral image computation block. To highlight this further, Figure 18 shows the comparative system throughput results in logarithmic scale (in frames/second) for the three implementations for some common image sizes. Again, the system utilizing our proposed 4-rows parallel integral image computation hardware achieves substantially enhanced throughput as compared to the other two implementations, thus showing the utility of our architectures as building blocks for larger embedded vision systems.

Bottom Line: The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size.An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video).Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements.

View Article: PubMed Central - PubMed

Affiliation: School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK. sehsan@essex.ac.uk.

ABSTRACT
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.

No MeSH data available.