Gaussian filtering is a crucial intermediate step in many image processing algorithms, so a fast Gaussian filter implementation is of great practical significance.
After studying earlier papers on fast Gaussian filtering, I implemented my own fast Gaussian filter; with NEON instructions it runs nearly six times faster than the plain C version.
1. Approximating Gaussian filtering with mean filters
The advantage of this algorithm is its simplicity. In general, a Gaussian filter can be approximated by three passes of a mean filter; higher accuracy requires more passes. The simple approach derives the mean-filter length from the Gaussian's sigma (delta) parameter. Because sigma is continuous while the filter length is an integer, different sigmas are rounded to the same length, which introduces some error. An O(1)-per-pixel mean filter requires an integral image, which occupies a buffer as large as the original image. Approximating a Gaussian this way therefore means computing three integral images and traversing each of them to compute the means. Moreover, this algorithm cannot be accelerated with CPU SIMD instructions.
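The prefix-sum (integral image) mean filter described above can be sketched in one dimension as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, the radius is passed in directly instead of being derived from sigma, and borders are handled by averaging only the samples that exist. Three passes of width w = 2 * radius + 1 give a variance of 3(w^2 - 1)/12, so only sigmas that happen to match this value are reproduced exactly, which is the integer-length error mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* One mean-filter pass of width 2*radius + 1 over n samples.
   S[i] holds in[0] + ... + in[i-1] (the 1-D analogue of an integral image),
   so each output costs O(1) regardless of the radius, at the price of an
   extra buffer as large as the input. Near the borders the mean is taken
   over the samples that exist. Returns 0 on success, -1 if allocation fails. */
int box_filter_prefix(const float *in, float *out, int n, int radius)
{
    assert(n > 0 && radius >= 0);
    double *S = malloc((size_t)(n + 1) * sizeof *S);
    if (S == NULL) return -1;
    S[0] = 0.0;
    for (int i = 0; i < n; ++i)
        S[i + 1] = S[i] + in[i];
    for (int i = 0; i < n; ++i) {
        int lo = i - radius < 0 ? 0 : i - radius;           /* first sample, inclusive */
        int hi = i + radius + 1 > n ? n : i + radius + 1;   /* last sample, exclusive  */
        out[i] = (float)((S[hi] - S[lo]) / (hi - lo));
    }
    free(S);
    return 0;
}

/* Approximates a Gaussian by `passes` repeated mean filters (three is the usual choice).
   `tmp` is caller-provided scratch space of n floats; the result is left in `data`. */
int box_blur_gauss_approx(float *data, float *tmp, int n, int radius, int passes)
{
    for (int p = 0; p < passes; ++p) {
        if (box_filter_prefix(data, tmp, n, radius) != 0) return -1;
        for (int i = 0; i < n; ++i)
            data[i] = tmp[i];
    }
    return 0;
}
```

For example, three passes with radius 2 spread a unit impulse over 13 samples with a variance of 6 (a sigma of about 2.45); any other target sigma has to be rounded to a neighbouring radius.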
For mean filtering as an approximation of Gaussian filtering, and how to improve its accuracy, see:
Peter Kovesi, "Arbitrary Gaussian Filtering with 25 Additions and 5 Multiplications per Pixel", 2009.
2. Approximating Gaussian filtering with the extended binomial filter
Reference for this algorithm:
"Extended Binomial Filter for Fast Gaussian Blur"
This algorithm still approximates the Gaussian with mean filters, but each mean filter has a different length and computes a weighted mean. Its implementation does not use an integral image; instead, the mean is computed recursively. The reference provides source code written for a Photoshop plug-in scripting language. Perhaps I am simply too slow to have fully understood the details of the algorithm, but several important initial parameters in that code are not explained.
In my experience, this algorithm is not actually faster than recursive (IIR) filtering: its code computes the vertical mean filter recursively, which accesses the image data across rows, and the algorithm cannot be accelerated with CPU SIMD instructions.
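The recursive mean computation and the cross-row access pattern discussed above can be illustrated with a plain running-sum box filter. This is a hypothetical sketch, not the extended binomial filter itself (it uses a single unweighted box and replicates the edge samples): each output is obtained from the previous running sum by adding the sample entering the window and subtracting the one leaving it. Calling it with a stride equal to the image width filters a column, so consecutive samples are a whole row apart in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Clamps an index to [0, n-1], i.e. replicates the edge samples. */
static int clamp_index(int i, int n)
{
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

/* Running-sum (recursive) mean filter of width 2*radius + 1 over n samples
   spaced `stride` elements apart. Each output is derived from the previous
   sum: add the sample entering the window, subtract the one leaving it.
   `in` and `out` must not overlap, because samples behind the window are
   still read after earlier outputs have been written. */
void running_box_1d(const float *in, float *out, int n, ptrdiff_t stride, int radius)
{
    assert(n > 0 && radius >= 0);
    const double inv_w = 1.0 / (2.0 * radius + 1.0);
    double sum = 0.0;
    for (int k = -radius; k <= radius; ++k)
        sum += in[clamp_index(k, n) * stride];
    out[0] = (float)(sum * inv_w);
    for (int i = 1; i < n; ++i) {
        sum += in[clamp_index(i + radius, n) * stride];       /* entering sample */
        sum -= in[clamp_index(i - radius - 1, n) * stride];   /* leaving sample  */
        out[i * stride] = (float)(sum * inv_w);
    }
}

/* Vertical pass over a row-major width x height image: each column is
   filtered with stride `width`, so consecutive reads are a full row apart. */
void box_filter_columns(const float *img, float *dst, int width, int height, int radius)
{
    for (int x = 0; x < width; ++x)
        running_box_1d(img + x, dst + x, height, width, radius);
}
```

Every step of the recursion depends on the previous running sum, so a single column cannot be split across SIMD lanes, and on large images each step of the vertical pass touches a different row.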
3. Approximating Gaussian filtering with IIR filters
References for this algorithm include:
Rachid Deriche, "Recursively Implementing the Gaussian and Its Derivatives", 1993.
Lucas J. van Vliet, Ian T. Young and Piet W. Verbeek, "Recursive Gaussian Derivative Filters", 1998.
Dave Hale, "Recursive Gaussian Filters", CWP-546.
The algorithm is implemented by cascading two IIR filters: a causal pass that runs forward over the data, followed by a non-causal (anti-causal) pass that runs backward.
In addition, Intel's website has an article with code that uses SSE instructions to optimize the recursive IIR approximation of the Gaussian filter. However, that code is very complex, and non-specialists will find it hard to follow.
Implementing Gaussian filtering with IIR filters involves many programming tricks, especially when the recursive IIR filter is accelerated with NEON instructions.
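One common trick for vectorizing the recursive filter is shown below as a hypothetical portable C sketch, not the author's NEON code: run the vertical pass row by row, keeping the last three output rows as state, so that the inner loop walks contiguous pixels of a row. Each column is still an independent recursion, but neighbouring columns sit side by side in memory and can be processed several at a time (four floats per 128-bit NEON register), either with intrinsics or by the compiler's auto-vectorizer. The coefficients B, a1, a2, a3 are passed in by the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Vertical pass of the third-order recursion
       y[r] = B*x[r] + a1*y[r-1] + a2*y[r-2] + a3*y[r-3]
   applied first top-to-bottom (causal) and then bottom-to-top (anti-causal)
   to every column of a row-major width x height image, in place.
   Each step updates a whole row, so the inner loop reads and writes
   contiguous floats. Returns 0 on success, -1 if allocation fails. */
int iir_vertical_rows(float *img, int width, int height,
                      float B, float a1, float a2, float a3)
{
    assert(width > 0 && height > 0);
    float *buf = malloc(3 * (size_t)width * sizeof *buf);
    if (buf == NULL) return -1;

    for (int pass = 0; pass < 2; ++pass) {
        int start = pass == 0 ? 0 : height - 1;
        int step  = pass == 0 ? 1 : -1;
        /* y1, y2, y3: outputs of the previous 1, 2 and 3 rows of this pass.
           Rows outside the image are assumed equal to the edge row. */
        float *y1 = buf, *y2 = buf + width, *y3 = buf + 2 * (size_t)width;
        const float *edge = img + (size_t)start * width;
        for (int x = 0; x < width; ++x)
            y1[x] = y2[x] = y3[x] = edge[x];
        for (int i = 0, r = start; i < height; ++i, r += step) {
            float *row = img + (size_t)r * width;
            for (int x = 0; x < width; ++x) {          /* contiguous, SIMD-friendly */
                float y = B * row[x] + a1 * y1[x] + a2 * y2[x] + a3 * y3[x];
                y3[x] = y;                              /* reuse the oldest state row */
                row[x] = y;
            }
            float *t = y3; y3 = y2; y2 = y1; y1 = t;    /* rotate state rows */
        }
    }
    free(buf);
    return 0;
}
```

The state buffers cost 3 * width floats, and the strided one-column-at-a-time walk of a naive vertical pass is replaced by sequential loads and stores.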
4. Performance tests of the recursive IIR Gaussian filter
4.1 Comparison with Photoshop CS5.0
| Test | Time | Processor |
|---|---|---|
| Photoshop CS5.0 Gaussian filter, radius 250 pixels, color photo | 1.5 s | Intel i3, 2.3 GHz, 2 GB RAM |
| My Gaussian filter, radius 250 pixels, color photo | 1 s | Intel i3, 2.3 GHz, 2 GB RAM |
Note: Photoshop CS5 applies different optimizations to its Gaussian filter depending on the radius. It is extremely fast for very small radii, and multi-core optimization cannot be ruled out. The recursive IIR filtering algorithm takes the same computation time for every radius.
4.2 Performance test data in the RVDS environment
| Test | Instruction count | Cycles | Processor |
|---|---|---|---|
| C code, 1024x768 grayscale image | 110 M | 194 M | ARM Cortex-A8 |
| NEON assembly code, 1024x768 grayscale image | 34 M | 35 M | ARM Cortex-A8 |
| C code, 1024x768 color image | 338 M | 558 M | ARM Cortex-A8 |
| NEON assembly code, 1024x768 color image | 84 M | 85 M | ARM Cortex-A8 |
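Dividing the cycle counts in this table gives the speed-up of the NEON code over the C code on the Cortex-A8, consistent with the roughly sixfold figure quoted at the start:

```latex
\frac{194\,\mathrm{M}}{35\,\mathrm{M}} \approx 5.5 \ \text{(grayscale)}, \qquad
\frac{558\,\mathrm{M}}{85\,\mathrm{M}} \approx 6.6 \ \text{(color)}
```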
4.3 Test data on a real iPod device
| Test | Time | Device |
|---|---|---|
| NEON assembly code, 720x576 grayscale image | 25 ms | iPod 4 |
4.4 Memory consumption
| Implementation | RAM consumption |
|---|---|
| C code | 2 * max(width, height) |
| NEON assembly code | 8 * max(width, height) |