Optimizing an implementation of "Combining Sketch and Tone for Pencil Drawing Production"


Combining Sketch and Tone for Pencil Drawing Production is a very good algorithm for converting natural images into pencil drawings. I will not go into the algorithm's principles here; if you are unfamiliar with it, this blog post explains it very clearly:
http://blog.csdn.net/bluecol/article/details/45422763
Below I describe my development process and the optimization methods I used.
The algorithm consists of three steps:
(1) Generate Structure Map
(2) Generate Tone Map
(3) Generate Texture Map
Of these, the most time-consuming is step (3), which involves solving a very large sparse linear system, followed by step (1), which involves 16 convolution passes over the image. Step (2) takes comparatively little time and needs no optimization.

First, let us look at step (3). Following the paper, one constructs a sparse matrix and solves it with conjugate gradient (CG); for a 450*600 image this takes about 600 ms, and because I used an off-the-shelf sparse matrix library directly, optimizing this module would be quite laborious. Is there a way to replace this step altogether? Let us look at what the beta map produced by the CG solve actually looks like. The following two pictures are used for the experiment: the left one is the original image, and the right one is the texture image.

The beta solved by CG looks like this:

Traces of the input texture image are visible in the beta map, so we can guess that beta is essentially composed of the input texture image and some other image. Now consider how the texture map is generated:
Texture map = (Tone map)^beta; that is, the texture map is obtained by attenuating the tone map through a gamma-like transform, so this "other image" must be related to the tone map. The picture on the right is the tone map; comparing beta with the tone map, it is easy to see that the other image used to compose beta is (1 - tone map):

Having found the two images to composite, we can borrow the method the pencil-drawing algorithm itself uses to combine the structure map and the texture map: multiply the two images pixel by pixel. This yields an approximation of the beta map, which we call beta_:

Comparing beta and beta_, the two are very similar, so the approximation is usable. Below are the final pencil drawings produced with each of the two; the difference is very small. There is, however, a difference visible to the naked eye: in the result generated from the original beta the texture is finer, with almost no trace of the input pencil texture left, which has turned into a fine grain-like pattern; in the result generated from beta_, the pencil-stroke texture of the input image is still visible.
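The per-pixel composition described above (beta_ = texture map multiplied by (1 - tone map)) can be sketched as follows; the function and buffer names are my own illustration, and both inputs are assumed to be single-channel float images normalized to [0, 1]:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Approximate beta map: beta_ = texture * (1 - tone), computed per pixel.
// Inputs are row-major float images of equal size, values in [0, 1].
std::vector<float> approxBeta(const std::vector<float>& texture,
                              const std::vector<float>& tone) {
    std::vector<float> beta(texture.size());
    for (std::size_t i = 0; i < texture.size(); ++i)
        beta[i] = texture[i] * (1.0f - tone[i]);
    return beta;
}
```

This replaces the entire CG solve with one multiply and one subtract per pixel, which is why the speed-up is so large.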

With this change we have simplified away the hardest part to optimize: instead of solving a very large sparse linear system, we use only the most basic per-pixel additions, subtractions and multiplications, so this step becomes very fast, and the optimization of step (3) is complete. Of course, if you are after the highest-quality result, this simplification is not appropriate; but if you want a high-speed program that can run in real time, this kind of trade-off is sometimes unavoidable.

Now let us look at the optimization of step (1). As mentioned above, the most time-consuming part is the two groups of convolutions in 8 directions. The most naive convolution implementation looks like this:

// img: output image (width x height); m_conv2dimg: padded input image of
// width m_convwidth; kernel: kersize x kersize convolution kernel
img_y = img;
for (int y = 0; y < height; y++, img_y += width) {
    float* img_x = img_y;
    for (int x = 0; x < width; x++, img_x++) {
        unsigned char* kn_y = kernel;
        conv_y = m_conv2dimg + y * m_convwidth + x;
        float sum = 0;
        for (int yy = 0; yy < kersize; yy++, kn_y += kersize, conv_y += m_convwidth) {
            unsigned char* kn_x = kn_y;
            float* conv_x = conv_y;
            for (int xx = 0; xx < kersize; xx++, kn_x++, conv_x++) {
                sum += *conv_x * *kn_x;
            }
        }
        *img_x = sum;
    }
}

This has four nested for loops, so its efficiency is easy to imagine. Testing with the rose picture above, with a 9*9 kernel, computing the convolutions in all 8 directions takes 180 ms; when the kernel grows to 21*21, the time soars to more than 1 s, which is completely unacceptable.

The key to the optimization lies in two points: reducing the number of inner loops, and removing the multiplications.

First, removing the multiplications. In the implementations of this algorithm that can be found online, whether C++ or MATLAB, the kernels are produced by building a horizontal kernel and then rotating it to obtain the other 7. The biggest problem is that rotating the kernel involves interpolation, which produces values in the kernel that are neither 0 nor 1, so the convolution can only be computed with the original multiply-and-add method. In fact, we can either threshold the rotated kernels, or directly rasterize a straight line in each of the 8 directions to generate 8 kernels whose values are only 0 or 1. The benefit is that we then only need to add up the pixels under the kernel entries equal to 1, eliminating the multiplications entirely.
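One possible way to rasterize such 0/1 line kernels directly is sketched below. The dense-sampling scheme and the names are my own illustration, not the original implementation; a Bresenham-style line would work equally well:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rasterize a kersize x kersize kernel whose non-zero pixels form a straight
// line through the centre at the given angle, with values only 0 or 1.
// Sampling every half pixel along the line is a simple way to avoid gaps.
std::vector<unsigned char> lineKernel(int kersize, double angleDeg) {
    const double kPi = 3.14159265358979323846;
    std::vector<unsigned char> k(kersize * kersize, 0);
    int c = kersize / 2;
    double dx = std::cos(angleDeg * kPi / 180.0);
    double dy = std::sin(angleDeg * kPi / 180.0);
    for (int s = -2 * kersize; s <= 2 * kersize; ++s) {
        int x = c + static_cast<int>(std::lround(0.5 * s * dx));
        int y = c + static_cast<int>(std::lround(0.5 * s * dy));
        if (x >= 0 && x < kersize && y >= 0 && y < kersize)
            k[y * kersize + x] = 1;
    }
    return k;
}
```

Calling this with angles 0, 22.5, 45, ... 157.5 degrees gives the 8 binary kernels.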

Next, reducing the inner loops. Since most pixels in the kernel are 0, we can first pad the image so that the kernel fits at the borders, then use a pointer array to record, for each non-zero pixel of the kernel, the corresponding pointer into the image. The convolution is then just the sum of the values these pointers point to, divided by the number of non-zero entries (this count can be computed in advance, turning the division into a multiplication). As the kernel moves one pixel to the right, every pointer in the array also moves one to the right (+1). This turns the two inner loops into a single loop. With this treatment, for a 9*9 kernel, the speed increases by roughly 6-8x, from 180 ms to a little over 20 ms. After this optimization the code basically becomes the following:

img_y = img;
for (int y = 0; y < height; y++, img_y += width) {
    float* img_x = img_y;
    for (int x = 0; x < width; x++, img_x++) {
        float sum = 0;
        for (int i = 0; i < listlen; i++) {  // listlen is the length of the pointer array
            sum += *(ptmp[i]);
            ptmp[i]++;
        }
        *img_x = sum;
    }
    for (int i = 0; i < listlen; i++) {  // advance all pointers to the next padded row
        ptmp[i] += kersize;
    }
}
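The one-time setup of that pointer array can be sketched as follows: from a 0/1 kernel we precompute the offsets of its non-zero entries relative to the top-left pixel of the kernel window (offsets rather than raw pointers, but the idea is the same). The names here are illustrative:

```cpp
#include <cassert>
#include <vector>

// Precompute, for a 0/1 kernel, the offsets of its non-zero entries relative
// to the top-left pixel of the kernel window in a padded image of width
// convWidth. The convolution at a window starting at pointer p is then just
// the sum of p[offsets[i]] — additions only, no multiplications.
std::vector<int> nonZeroOffsets(const std::vector<unsigned char>& kernel,
                                int kersize, int convWidth) {
    std::vector<int> offsets;
    for (int yy = 0; yy < kersize; ++yy)
        for (int xx = 0; xx < kersize; ++xx)
            if (kernel[yy * kersize + xx] != 0)
                offsets.push_back(yy * convWidth + xx);
    return offsets;
}
```

The length of this list is the listlen in the code above, and its size (the number of non-zero kernel pixels) is also the divisor used for normalization.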

After these optimizations, can we go further? The inner layer still has a for loop, which is still fairly time-consuming; can it be removed as well? The answer is yes.
This is possible mainly because the non-zero elements of each of the 8 directional kernels form a straight line, so we can borrow the idea of the integral image: build a one-dimensional integral (prefix-sum) image along the line's direction, and then use it to compute the sum of pixels over any interval in constant time. This replaces the two-level loop over x < width and i < listlen with two linear traversals.
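The one-dimensional integral idea can be sketched as follows for the horizontal direction (for the other directions the prefix sums run along the line's own direction); the function names are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-dimensional integral image along a line (shown here for a horizontal
// row): prefix[i] holds the sum of row[0..i-1], so the sum over any window
// [l, r) is prefix[r] - prefix[l], in O(1) regardless of the kernel length.
std::vector<float> prefixSums(const std::vector<float>& row) {
    std::vector<float> prefix(row.size() + 1, 0.0f);
    for (std::size_t i = 0; i < row.size(); ++i)
        prefix[i + 1] = prefix[i] + row[i];
    return prefix;
}

float windowSum(const std::vector<float>& prefix, int l, int r) {
    return prefix[r] - prefix[l];
}
```

One pass builds the prefix sums and a second pass reads off all window sums, which is why the cost no longer depends on the kernel size.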
Another optimization method with similar efficiency exploits the same property of the line; take the horizontal kernel as an example:

For example, suppose the kernel length is 5. We compute the sum of the first 5 pixels of each row in the normal way; when the kernel moves one pixel to the right, only pixel 1 moves out and pixel 6 moves in, while pixels 2, 3, 4 and 5 are unchanged. So the sum of the pixels in the new window can be computed as sum' = sum - p_1 + p_6. This method also removes the inner loop. For the horizontal direction the optimized program looks roughly like the code below; as you can see, there are basically only two for loops left. Interestingly, with this optimization, for a 9*9 kernel whose non-zero line is 1 pixel wide, the total time for the 8 directional convolutions is similar to the first optimization above, about 20 ms. Looking at each direction separately, the horizontal, vertical, 45-degree and 135-degree directions are clearly faster, but the 22.5-degree, 67.5-degree and the other diagonal directions require many intermediate quantities and are slightly slower than the previous algorithm. With a larger kernel such as 21*21, however, the acceleration is obvious. The advantage of this method is that the convolution time no longer depends on the kernel size, only on the image size, so it becomes very stable: with a 21*21 kernel, the 8-direction convolution takes less than 1 ms more than with a 9*9 kernel.

for (int y = 0; y < height; y++, img_y += width, conv_y += m_convwidth) {
    float* conv_x = conv_y + half_size;
    float* img_x = img_y;
    // calc the first sum
    for (int i = 0; i < listlen; i++) {
        ptmp[i] = conv_x + offset[i];
    }
    float sum = 0;
    for (int i = 0; i < listlen; i++) {
        sum += *(ptmp[i]);
    }
    img_x[0] = sum;
    startp[0] = conv_x + offset[0];
    endp[0] = conv_x + offset[listlen - 1];
    for (int x = 1; x < width; x++) {
        sum -= *(startp[0]);
        startp[0]++;
        endp[0]++;
        sum += *(endp[0]);
        img_x[x] = sum;
    }
}

So far we have completed the optimization of both modules; for a 640*480 image the program basically runs in real time. But optimization is endless, and if you want to accelerate further, the following methods are worth trying:
1. Compute the 8 directional convolutions with multiple threads.
2. Use SSE (NEON on mobile) to process the per-pixel operations in parallel.
3. In fact, after removing the sparse-linear-system module, the whole pipeline consists only of basic per-pixel additions, subtractions and multiplications plus convolutions; all three kinds of operations can be implemented on the GPU, where they run extremely fast. Readers who need this can try it themselves.
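Point 1 can be sketched with std::thread as follows; convolveDir is a placeholder of this sketch standing in for one directional convolution pass, not a function from the original code:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Run the 8 directional convolutions on separate threads. convolveDir is a
// placeholder for one directional convolution pass over the whole image;
// each direction writes to its own output buffer, so no locking is needed.
void runDirectionsParallel(const std::function<void(int)>& convolveDir) {
    std::vector<std::thread> workers;
    for (int dir = 0; dir < 8; ++dir)
        workers.emplace_back(convolveDir, dir);
    for (std::thread& t : workers)
        t.join();
}
```

Because the 8 directions are completely independent, this parallelization is embarrassingly simple; on a 4-core machine one would expect close to a 4x speed-up for this stage.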
