Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash


ABSTRACT

One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information on the level of trash contamination of the cotton lint. This research examines the use of programmable graphics processing units (GPUs) as an alternative to the PC's traditional reliance on the central processing unit (CPU). Using the GPU as an alternative computation platform allowed the machine vision system to gain a significant improvement in processing time. By improving processing time, this research seeks to address the lack of rapid trash sensing systems and thus alleviate a situation in which current systems view the cotton lint either well before, or well after, the cotton is cleaned. This extended lag/lead time imposed on the cotton trash cleaning control systems is what drives system operators to use a very large dead-band safety buffer to ensure that the cotton lint is not under-cleaned. Unfortunately, a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber damage as well as significant losses of valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained by coupling a trash sensor tightly to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphics processing unit (GPU), for processing of the cotton trash images, a speed-up of more than 6.5 times over optimized code running on the PC's central processing unit (CPU) was gained. The new parallel algorithm operating on the GPU was able to process a 1024×1024 image in less than 17 ms. At this improved speed, the image processing system's performance should be sufficient to provide real-time feedback control in tight cooperation with the cleaning equipment.



Figure 8 (f8-sensors-08-00817): Processing flow using a GPU processor.

Mentions: Once the development of a single-pass filter was completed, the next task was to fine-tune the implementation of the filter to achieve the fastest processing on the given hardware. To gain insight into areas that would provide a meaningful speed-up, which was one of the primary goals of this research, it was also crucial to establish a baseline performance by which to judge the GPU approach. The algorithm was initially optimized for use on a Pentium 4 processor using the "Single Instruction, Multiple Data" (SIMD) extended operation set. The SIMD extension for the Pentium 4 provides a single vector processor capable of multiplying 4 single-precision floating-point numbers in parallel. After adjusting the algorithm to take advantage of the Pentium's SIMD extensions, along with inline-expanded and optimized C code, performance reached 7.5 frames per second, a significant speed-up over the previous implementation's 2.5 frames per second. The next step in the development was to compare the optimized SIMD performance against the same algorithm running on an NVIDIA GeForce 8800 Ultra graphics processing unit (GPU), housed on a PCI Express bus card, where the code would have the opportunity to take advantage of the GPU's 132 vector processors. We note here that while the GPU has 132 vector processors, each capable of multiplying 4 single-precision floating-point numbers in parallel, its core runs at only 500 MHz versus the Pentium's 3.0 GHz. Given the speed disparity between the GPU processors and the Pentium core, one cannot expect a 132-fold speed-up of the GPU over the CPU. Another potential problem area for the GPU implementation lies in the bottleneck created by pushing large amounts of image data across the PCI Express bus into the GPU's video RAM (Figure 8). In short, one may not even expect the 132 × (500 MHz / 3000 MHz) = 22× gain from running on the GPU core versus the CPU, owing to these other hardware constraints.
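
As an illustration of the 4-wide SIMD multiply described above, the following minimal sketch shows how a filter inner loop might use the Pentium 4's SSE intrinsics in C. The function name (simd_scale_row), the per-pixel gain step, and the alignment assumptions are illustrative only, not the authors' actual filter code.

/* Minimal sketch (assumed, not the authors' actual filter code) of the
 * 4-wide single-precision multiply that the Pentium 4 SIMD (SSE)
 * extension provides. */
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_load_ps, _mm_mul_ps */
#include <stddef.h>

/* Multiply one image row of single-precision pixels by a per-pixel gain,
 * four pixels per instruction. Assumes width is a multiple of 4 and both
 * buffers are 16-byte aligned. */
static void simd_scale_row(float *row, const float *gain, size_t width)
{
    for (size_t i = 0; i < width; i += 4) {
        __m128 px = _mm_load_ps(&row[i]);           /* load 4 pixels        */
        __m128 g  = _mm_load_ps(&gain[i]);          /* load 4 gain factors  */
        _mm_store_ps(&row[i], _mm_mul_ps(px, g));   /* 4 multiplies at once */
    }
}

The same 4-wide multiply, replicated across the GPU's 132 vector processors but clocked at 500 MHz rather than 3.0 GHz, is what yields the 132 × (500/3000) ≈ 22× theoretical ceiling discussed above, before any PCI Express transfer overhead is considered.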

