Multi-threaded computation of image processing

Source: Internet
Author: User

Image processing algorithms usually have high computational complexity and take a long time to run. Multithreading on a multi-core CPU can speed up the computation considerably. However, to guarantee that the multithreaded result is exactly identical to the single-threaded result, a few points need special care.

Basic idea: for several threads to work concurrently, the data each thread is responsible for must not overlap with that of any other thread. The image is therefore divided into sub-blocks that do not intersect, and each thread processes one sub-block. When all sub-blocks are finished, their results are combined into the final image.
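The basic idea can be sketched in portable C++ as follows. This is a minimal illustration, not the original implementation: it assumes an 8-bit grayscale image stored row by row in a `std::vector`, splits it into horizontal bands instead of rectangular blocks, and uses a made-up per-pixel operation (`invertPixel`). The names `processBand` and `processParallel` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-pixel operation, used only for illustration.
inline std::uint8_t invertPixel(std::uint8_t v) {
    return static_cast<std::uint8_t>(255 - v);
}

// Process rows [rowBegin, rowEnd) of a grayscale image that is `width` pixels wide.
void processBand(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
                 int width, int rowBegin, int rowEnd) {
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t idx = static_cast<std::size_t>(y) * width + x;
            dst[idx] = invertPixel(src[idx]);
        }
    }
}

// Split the image into disjoint horizontal bands and run one thread per band.
// Every thread writes to its own rows of dst, so no locking is needed.
std::vector<std::uint8_t> processParallel(const std::vector<std::uint8_t>& src,
                                          int width, int height, int numThreads) {
    assert(width > 0 && height > 0 && numThreads > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> dst(src.size());
    std::vector<std::thread> workers;
    const int rowsPerBand = (height + numThreads - 1) / numThreads;  // ceiling division
    for (int t = 0; t < numThreads; ++t) {
        const int rowBegin = std::min(height, t * rowsPerBand);
        const int rowEnd = std::min(height, rowBegin + rowsPerBand);
        if (rowBegin == rowEnd) break;  // no rows left for further threads
        workers.emplace_back(processBand, std::cref(src), std::ref(dst),
                             width, rowBegin, rowEnd);
    }
    for (auto& w : workers) w.join();  // once all bands are done, dst is the final image
    return dst;
}
```

Calling `processBand` once over all rows gives the single-threaded reference, and `processParallel` should return the same buffer for any thread count because the bands never overlap.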

First of all, the size of each sub-block has to be considered. When an operation takes a long time, the application should let the user know in some appropriate way. Since the image is now processed block by block, and a single block takes only a short time, the result of each sub-block can be displayed as soon as it is finished. The user then sees the parts of the image appear one after another until the whole image is complete. To some extent this tells the user that processing is under way, and the wait for the complete image feels shorter. From this point of view, if the sub-blocks are too large, each one takes correspondingly longer to compute, which works against showing partial results quickly. However, if the sub-blocks are too small, the total number of sub-blocks grows, which increases threading overhead and other overhead (splitting the image, allocating sub-block data, and so on) and hurts the overall computation time. This is a trade-off that has to be settled case by case.
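To get a feel for this trade-off, the following sketch counts how many sub-blocks a given tile size produces. It is an illustration under assumptions: the `Tile` struct and `splitIntoTiles` function are hypothetical names, and rectangles use exclusive right and bottom edges.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain rectangle; right and bottom are exclusive (an assumption of this sketch).
struct Tile { int left, top, right, bottom; };

// Cover a width x height image with tiles of at most tileW x tileH pixels.
// Tiles on the right and bottom borders are clipped to the image.
std::vector<Tile> splitIntoTiles(int width, int height, int tileW, int tileH) {
    assert(width > 0 && height > 0 && tileW > 0 && tileH > 0);
    std::vector<Tile> tiles;
    for (int y = 0; y < height; y += tileH) {
        for (int x = 0; x < width; x += tileW) {
            tiles.push_back({x, y, std::min(x + tileW, width), std::min(y + tileH, height)});
        }
    }
    return tiles;
}
```

For a 2400x1350 image, 600x450 tiles give 12 sub-blocks (4 columns by 3 rows), while 64x64 tiles give 836. With one thread or one allocation per sub-block, the second choice pays the per-block overhead roughly 70 times as often. The 600x450 layout is only one way to obtain 12 blocks; the article does not say how its blocks were laid out.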

In addition, many image processing operations use the neighborhood of each pixel, so a sub-block cannot be processed from its own contents alone. Specifically, for pixels near the edge of a sub-block, some pixels outside the block must also take part in the calculation, otherwise the results for those pixels will be wrong. To be exact, if the neighborhood radius is r (assuming a square or circular neighborhood; other shapes can be handled by adjusting accordingly), then the data needed to process a sub-block is the sub-block extended by r pixels on every side.
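A one-dimensional sketch makes the point concrete. It assumes a box blur with clamped borders as a stand-in for any neighborhood operation; `blurRow`, `blurChunk`, and `blurByChunks` are hypothetical names. The chunks are processed in a simple loop here, but each `blurChunk` call is independent and could run on its own thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Box-blur a row with radius r; samples outside the row are clamped to the border.
std::vector<int> blurRow(const std::vector<int>& row, int r) {
    const int n = static_cast<int>(row.size());
    std::vector<int> out(n);
    for (int i = 0; i < n; ++i) {
        int sum = 0;
        for (int k = -r; k <= r; ++k) {
            sum += row[std::max(0, std::min(n - 1, i + k))];
        }
        out[i] = sum / (2 * r + 1);
    }
    return out;
}

// Blur the chunk [begin, end) using only the chunk plus `halo` extra pixels per side.
std::vector<int> blurChunk(const std::vector<int>& row, int begin, int end, int r, int halo) {
    const int n = static_cast<int>(row.size());
    assert(0 <= begin && begin < end && end <= n && r >= 0 && halo >= 0);
    const int extBegin = std::max(0, begin - halo);  // extended range, clipped to the row
    const int extEnd = std::min(n, end + halo);
    std::vector<int> ext(row.begin() + extBegin, row.begin() + extEnd);
    std::vector<int> blurred = blurRow(ext, r);
    const int offset = begin - extBegin;             // where the chunk starts inside ext
    return std::vector<int>(blurred.begin() + offset,
                            blurred.begin() + offset + (end - begin));
}

// Blur the whole row chunk by chunk and concatenate the partial results.
std::vector<int> blurByChunks(const std::vector<int>& row, int chunkLen, int r, int halo) {
    assert(chunkLen > 0);
    const int n = static_cast<int>(row.size());
    std::vector<int> out;
    for (int begin = 0; begin < n; begin += chunkLen) {
        std::vector<int> part = blurChunk(row, begin, std::min(n, begin + chunkLen), r, halo);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}
```

With `halo` equal to `r`, `blurByChunks` should reproduce `blurRow` on the whole row exactly. With `halo` set to 0, the pixels next to each chunk boundary see clamped values instead of their real neighbors, and the result differs at the seams.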

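    CRect rect1, rect2;

    // rect1: the sub-block grown by `extend` pixels on every side
    // (the region to read from the original image)
    rect1.CopyRect(prect[i]);
    rect1.InflateRect(extend, extend);

    // rect2: position of the sub-block's own pixels inside the extended sub-image
    rect2.CopyRect(prect[i]);
    rect2.MoveToXY(extend, extend);

    // Clip rect1 to the image. When the top or left edge is clipped,
    // shift rect2 by the same amount so it still points at the sub-block.
    if (rect1.top < 0) {
        rect2.OffsetRect(0, rect1.top);
        rect1.top = 0;
    }
    if (rect1.left < 0) {
        rect2.OffsetRect(rect1.left, 0);
        rect1.left = 0;
    }
    if (rect1.bottom > height) rect1.bottom = height;
    if (rect1.right > width) rect1.right = width;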

In the code, extend is the number of pixels by which the sub-block is expanded on each side, that is, the neighborhood radius r. prect[i] is the rectangle of the i-th sub-block. height and width are the height and width of the original image, and the expanded sub-block naturally cannot go beyond the original image. So the final rect1 is the region of the original image that holds all the data needed for the calculation, clipped to the size of the original image. Since I treat each extended sub-block as a new image, rect2 is the position, inside that new image, of the result for the sub-block itself, and it is used when composing the final image.
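For readers without MFC, the same calculation can be sketched with a plain struct. This is an illustrative rewrite, not the original code: `Rect`, `ExpandedBlock`, and `expandBlock` are hypothetical names, and right and bottom are treated as exclusive, as in `CRect`.

```cpp
#include <algorithm>
#include <cassert>

// Minimal stand-in for MFC's CRect; right and bottom are exclusive.
struct Rect { int left, top, right, bottom; };

struct ExpandedBlock {
    Rect source;  // region to read from the original image (rect1 in the text)
    Rect inner;   // the block's own pixels, relative to `source` (rect2 in the text)
};

// Grow `block` by `extend` pixels on every side and clip to a width x height image.
ExpandedBlock expandBlock(const Rect& block, int extend, int width, int height) {
    assert(extend >= 0);
    assert(0 <= block.left && block.left < block.right && block.right <= width);
    assert(0 <= block.top && block.top < block.bottom && block.bottom <= height);
    Rect source{std::max(0, block.left - extend),
                std::max(0, block.top - extend),
                std::min(width, block.right + extend),
                std::min(height, block.bottom + extend)};
    // Express the block in coordinates whose origin is the top-left corner of `source`.
    Rect inner{block.left - source.left,
               block.top - source.top,
               block.right - source.left,
               block.bottom - source.top};
    return {source, inner};
}
```

For an interior block, `inner` starts at (`extend`, `extend`). For a block touching the top or left image border, the clipped side has no halo, so `inner` starts at 0 on that axis, which is what the `OffsetRect` calls achieve in the MFC version.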

Finally, the details of thread creation and destruction, resource allocation and release, and thread synchronization and communication are not covered here. I only discuss how the threads cooperate in this scenario. Since a sub-block compute thread is responsible only for processing its own sub-block, something else has to split the image, hand the data to the compute threads, and so on. A flowchart would show this best, but I was too lazy to draw one, so here is a brief description of how the threads coordinate. It is quite simple.

Interface thread A handles user interaction: it accepts user commands and sends a compute message to thread B.

Coordinator thread B receives the message, splits the image into sub-blocks, allocates the data for each sub-block, and creates the sub-block compute threads Ci.

Each sub-block compute thread Ci processes its sub-block and sends a result message (success or failure) to thread B or A.

When interface thread A receives a sub-block completion message, it can display that sub-block's result immediately, or it can do nothing and wait until all sub-blocks are finished before displaying the image. When coordinator thread B receives the completion message for block i, it reclaims the resources allocated to thread Ci and destroys Ci. Once every Ci has finished, B sends an image-processing-complete message to A, and A can carry on with follow-up work.

A separate thread B is used here for sub-block coordination because it keeps the structure clearer. The coordination workload is small, however, so interface thread A could do this work itself, in which case thread B is not needed.
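The coordination described above can be sketched with standard C++ threads. This is a simplified illustration under assumptions, not the original design: a mutex-protected queue (`Mailbox`) stands in for the message passing, a callback (`onBlockDone`) stands in for notifying interface thread A, and all worker threads are joined at the end instead of being destroyed one by one. All names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Completion message sent by a compute thread Ci to the coordinator B.
struct BlockDone { int blockIndex; bool ok; };

// Thread-safe queue standing in for the message passing between threads.
class Mailbox {
public:
    void post(BlockDone msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(msg);
        }
        ready_.notify_one();
    }
    BlockDone wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        BlockDone msg = queue_.front();
        queue_.pop();
        return msg;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<BlockDone> queue_;
};

// Coordinator (the job of thread B): start one worker per block, then collect
// one completion message per block. Returns true only if every block succeeded.
bool runCoordinator(int blockCount,
                    const std::function<bool(int)>& processBlock,
                    const std::function<void(int)>& onBlockDone) {
    assert(blockCount >= 0);
    Mailbox mailbox;
    std::vector<std::thread> workers;
    for (int i = 0; i < blockCount; ++i) {
        workers.emplace_back([&mailbox, &processBlock, i] {
            mailbox.post({i, processBlock(i)});  // Ci reports success or failure
        });
    }
    bool allOk = true;
    for (int received = 0; received < blockCount; ++received) {
        BlockDone msg = mailbox.wait();
        allOk = allOk && msg.ok;
        onBlockDone(msg.blockIndex);  // e.g. let the UI display this block now
    }
    for (auto& w : workers) w.join();  // reclaim the worker threads
    return allOk;
}
```

In a real GUI application `runCoordinator` would itself run on thread B, and `onBlockDone` would post a message to the interface thread rather than touch the display directly. Creating one thread per block matches the description above; a fixed-size thread pool is a common alternative when the block count is large.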

Single-threaded versus multithreaded processing time

Multithreaded processing with n threads is not simply n times as fast as single-threaded processing; that is only the ideal case. Because of the extra work involved (thread overhead, preparing data for each thread, composing the results, synchronization between threads, and the repeated computation in the overlapping borders of adjacent sub-blocks), multithreading cannot reach the ideal. The table below lists the approximate average time for a Gaussian blur of a 2400x1350, 24-bit image divided into 12 sub-blocks, on an i5 4300U notebook (dual-core, four threads) and an i5 6500 desktop (quad-core, four threads). The Gaussian blur is implemented as two simple one-dimensional passes, one along the rows and one along the columns, with a radius of 50. In my tests I also displayed each block's result in real time, which may have made the multithreaded runs somewhat slower.

                                   Luma channel                                RGB channels
CPU                                Single-thread   Multithread   Time saved    Single-thread   Multithread   Time saved
i5 4300U (dual-core, 4 threads)    1100 ms         550 ms        50%           3060 ms         1250 ms       59%
i5 6500 (quad-core, 4 threads)     670 ms          240 ms        64%           1850 ms         560 ms        69.7%

Ideally, four threads would cut the time by up to 75%, but in practice they do not. On the dual-core, four-thread machine, multithreading cuts the luma-channel time by half (50%) and the RGB time by about 59%. On the quad-core, four-thread machine, the reductions are 64% for the luma channel and about 69% for the RGB channels. The speed-up from multithreading is clearly substantial. Intel's Hyper-Threading is surprisingly effective: without it, a dual-core CPU could not cut the time by 50% or more. Of course, the number of physical cores matters even more.
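The percentages quoted here follow from simple arithmetic, sketched below. The function names are hypothetical, and the ideal figure assumes equally loaded threads with zero overhead.

```cpp
#include <cassert>

// Percentage of time saved by the multithreaded run relative to the single-threaded run.
double timeReductionPercent(double singleMs, double multiMs) {
    assert(singleMs > 0 && multiMs >= 0);
    return (1.0 - multiMs / singleMs) * 100.0;
}

// Upper bound with n equally loaded threads and no overhead: 1 - 1/n.
double idealReductionPercent(int threads) {
    assert(threads > 0);
    return (1.0 - 1.0 / threads) * 100.0;
}

// The same comparison expressed as a speed-up factor (how many times faster).
double speedupFactor(double singleMs, double multiMs) {
    assert(singleMs > 0 && multiMs > 0);
    return singleMs / multiMs;
}
```

For example, `idealReductionPercent(4)` is the 75% bound mentioned above, and `timeReductionPercent(1100, 550)` gives the 50% measured for the luma channel on the dual-core machine. A 64% reduction (670 ms down to 240 ms) corresponds to a speed-up factor of about 2.8, against an ideal of 4.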

Another observation: in single-threaded processing, the three-channel RGB time is slightly less than 3 times the luma-channel time, about 2.8 times (the luma path also includes the extra cost of converting between RGB and luma). In multithreaded processing, the RGB time is well below 3 times the luma time, roughly 2.3 times according to the table (1250/550 ≈ 2.27 and 560/240 ≈ 2.33). So RGB processing gains more from multithreading than luma processing does. The reason is that all three channels of a sub-block are processed in the same thread, so no extra threading overhead is added while the fixed per-thread cost is spread over about three times as much work. Threading overhead is therefore a factor that must be taken into account and cannot be ignored.
