1. Principle
Affine transformation of images requires interpolation at many points. Common interpolation methods include nearest-neighbor interpolation, bilinear interpolation, bicubic interpolation, and Lanczos interpolation, all of which OpenCV provides. Among them, bilinear interpolation is the most widely used because it strikes a good compromise between interpolation quality and computation speed.
The simplest example makes the idea clearest, so take a small image: a 3*3, 256-level grayscale image. Its pixel matrix is shown below (this original image is called the source image, src):
234 38 22
67 44 12
89 65 63
In this matrix, the element coordinates (x, y) are defined so that x runs from left to right starting at 0, and y runs from top to bottom, also starting at 0. This is the most commonly used coordinate system in image processing.
What if you want to enlarge this image to 4*4? The first step is to sketch out the 4*4 matrix. Each of its pixels is still unknown, waiting to be filled in (this image to be filled is called the destination image, dst):
? ? ? ?
? ? ? ?
? ? ? ?
? ? ? ?
Now we have to fill in this empty matrix. Where do the values come from? From the source image. Start with the top-left pixel of the destination image, at coordinates (0,0). The corresponding source coordinates are given by the inverse mapping:
srcX = dstX * (srcWidth / dstWidth)
srcY = dstY * (srcHeight / dstHeight)
Applying the formula, the corresponding source coordinates are (0 * (3/4), 0 * (3/4)) = (0 * 0.75, 0 * 0.75) = (0,0). Having found the corresponding source coordinates, copy the pixel value 234 at source (0,0) into destination (0,0).
Next, find the source coordinates corresponding to destination coordinates (1,0) by applying the formula:
(1 * 0.75, 0 * 0.75) = (0.75, 0). The result contains a decimal. What now? A computer image is a digital image: the pixel is the smallest unit, pixel coordinates are integers, and fractional coordinates do not exist. One strategy is to round to the nearest integer (another is to simply drop the decimal part). Rounding turns the non-integer coordinates into (1,0), so the complete computation is: (1 * 0.75, 0 * 0.75) = (0.75, 0) ≈ (1,0). Now copy the pixel value 38 at source (1,0) into destination (1,0).
After filling in every pixel in turn, the enlarged image is complete; its pixel matrix is:
234 38 22 22
67 44 12 12
89 65 63 63
89 65 63 63
This enlargement method is the nearest-neighbor interpolation algorithm, the most basic and simplest image scaling algorithm, and also the one with the worst quality: enlarged images show severe blockiness, and shrunken images show severe distortion. The root cause is that the simple nearest-neighbor rule introduces serious error. When destination coordinates are mapped back, the source coordinates are generally floating-point numbers, and rounding simply takes the value of the pixel nearest to that floating-point position, which is rather crude. When the coordinate value is 0.75, it should not simply be treated as 1: being 0.75, it is 0.25 away from 1 and 0.75 away from 0, so the destination pixel value should really be computed from the four real pixels surrounding that virtual point according to some rule. That is how a better scaling result is achieved.
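The nearest-neighbor procedure described above can be sketched as follows. This is a minimal illustration, not the OpenCV implementation; the helper name `nearestResize` and the row-major `std::vector` image layout are assumptions made for the example. It uses the round-to-nearest strategy from the text.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Nearest-neighbor resize of a single-channel image stored row-major in a
// std::vector. Coordinates follow the text: x = column, y = row.
std::vector<int> nearestResize(const std::vector<int>& src, int srcW, int srcH,
                               int dstW, int dstH) {
    std::vector<int> dst(dstW * dstH);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            // Inverse mapping srcX = dstX * (srcWidth / dstWidth), then round.
            int sx = (int)std::lround(x * (double)srcW / dstW);
            int sy = (int)std::lround(y * (double)srcH / dstH);
            if (sx > srcW - 1) sx = srcW - 1;  // clamp rounded coordinates
            if (sy > srcH - 1) sy = srcH - 1;
            dst[y * dstW + x] = src[sy * srcW + sx];
        }
    }
    return dst;
}
```

Applied to the 3*3 example image, this reproduces the 4*4 matrix shown above.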
The bilinear interpolation algorithm is a much better image scaling algorithm. It makes full use of the four real pixel values surrounding the virtual point in the source image to determine one pixel value in the destination image, so its scaling quality is far better than simple nearest-neighbor interpolation.
The bilinear interpolation algorithm is described as follows:
For a destination pixel, suppose the inverse transformation gives floating-point source coordinates (i+u, j+v), where i, j are the integer parts of the floating-point coordinates and u, v are the fractional parts (floating-point values in the interval [0,1)). Then the pixel value f(i+u, j+v) can be determined from the values of the four surrounding source pixels at (i,j), (i+1,j), (i,j+1), and (i+1,j+1), namely:
f(i+u, j+v) = (1-u)(1-v) f(i,j) + (1-u)v f(i,j+1) + u(1-v) f(i+1,j) + uv f(i+1,j+1)
where f(i,j) denotes the pixel value of the source image at (i,j), and so on.
For example, continuing the example above: if the destination pixel coordinates are (1,1), the inverse mapping gives source coordinates (0.75, 0.75). This is really only a conceptual, virtual pixel; no such pixel actually exists in the source image. So the value of destination pixel (1,1) cannot be taken from this virtual pixel, and can only be determined from the four surrounding source pixels: (0,0), (0,1), (1,0), (1,1). Because (0.75, 0.75) is closest to (1,1), that pixel plays the largest role in the decision, which is reflected in the coefficient uv = 0.75 x 0.75 = 0.5625 in the formula; and because (0.75, 0.75) is farthest from (0,0), that pixel plays the smallest role, which the coefficient (1-u)(1-v) = 0.25 x 0.25 = 0.0625 likewise reflects.
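The worked example can be checked numerically. The sketch below evaluates the formula directly; the helper name `bilinearAt` and the row-major `std::vector` layout are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Evaluate f(i+u, j+v) from the four surrounding pixels using the bilinear
// formula from the text. Single-channel, row-major image; x = column, y = row.
double bilinearAt(const std::vector<int>& src, int w, double fx, double fy) {
    int i = (int)std::floor(fx), j = (int)std::floor(fy);
    double u = fx - i, v = fy - j;
    return (1 - u) * (1 - v) * src[j * w + i]            // f(i, j)
         + (1 - u) * v       * src[(j + 1) * w + i]      // f(i, j+1)
         + u * (1 - v)       * src[j * w + (i + 1)]      // f(i+1, j)
         + u * v             * src[(j + 1) * w + (i + 1)]; // f(i+1, j+1)
}
```

For the 3*3 example image this gives f(0.75, 0.75) = 0.0625*234 + 0.1875*67 + 0.1875*38 + 0.5625*44 = 59.0625, so the destination pixel (1,1) would be about 59 rather than the 44 produced by nearest-neighbor rounding.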
2. Calculation method
First perform two linear interpolations in the x direction, then one more interpolation in the y direction.
In image processing, we first use
srcX = dstX * (srcWidth / dstWidth)
srcY = dstY * (srcHeight / dstHeight)
to compute the position of the destination pixel in the source image. The srcX and srcY computed here are generally floating-point numbers, e.g. f(1.2, 3.4), a virtual pixel that does not actually exist. First find the four actual pixels adjacent to it:
(1,3) (2,3)
(1,4) (2,4)
Written in the form f(i+u, j+v), this gives i = 1, j = 3, u = 0.2, v = 0.4.
Interpolating along the x direction gives f(R1) = u * (f(Q21) - f(Q11)) + f(Q11), where Q11 = (i, j) and Q21 = (i+1, j); similarly, f(R2) = u * (f(Q22) - f(Q12)) + f(Q12), where Q12 = (i, j+1) and Q22 = (i+1, j+1).
Then interpolate between f(R1) and f(R2) in the same way along the y direction: f(i+u, j+v) = v * (f(R2) - f(R1)) + f(R1).
Alternatively, collapse this into a single step: f(i+u, j+v) = (1-u)(1-v) f(i,j) + (1-u)v f(i,j+1) + u(1-v) f(i+1,j) + uv f(i+1,j+1).
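The two-step (x first, then y) computation and the one-step formula give the same result, which a short sketch can confirm. The helper names and the sample values (the four neighbors of the virtual point (0.75, 0.75) in the 3*3 example) are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Two-step form: interpolate along x at rows j and j+1, then along y.
// q11 = f(i,j), q21 = f(i+1,j), q12 = f(i,j+1), q22 = f(i+1,j+1).
double bilinearTwoStep(double q11, double q21, double q12, double q22,
                       double u, double v) {
    double r1 = u * (q21 - q11) + q11;  // f(R1): x interpolation at row j
    double r2 = u * (q22 - q12) + q12;  // f(R2): x interpolation at row j+1
    return v * (r2 - r1) + r1;          // y interpolation between R1 and R2
}

// One-step form: the collapsed formula from the text.
double bilinearOneStep(double q11, double q21, double q12, double q22,
                       double u, double v) {
    return (1 - u) * (1 - v) * q11 + u * (1 - v) * q21
         + (1 - u) * v * q12 + u * v * q22;
}
```

With q11 = 234, q21 = 38, q12 = 67, q22 = 44 and u = v = 0.75, both forms give 59.0625, matching the worked example.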
3. Acceleration and optimization strategies
An implementation that simply follows the interpolation algorithm above can only barely do the job; neither its speed nor its quality is ideal. A few tricks apply in a concrete code implementation. Drawing on the OpenCV source code and online blog posts, two points are summarized here:
Aligning the geometric centers of the source and destination images;
Converting floating-point operations to integer operations.
3.1 Aligning the geometric centers of the source and destination images
Method: when computing the virtual floating-point coordinates in the source image, the naive mapping is:
srcX = dstX * (srcWidth / dstWidth)
srcY = dstY * (srcHeight / dstHeight)
With center alignment (as in OpenCV):
srcX = (dstX + 0.5) * (srcWidth / dstWidth) - 0.5
srcY = (dstY + 0.5) * (srcHeight / dstHeight) - 0.5
Principle:
The blog post "bilinear interpolation algorithm and points needing attention" explains: "If you choose the top-left corner as the origin (0,0), the rightmost and bottom rows of pixels do not actually participate in the calculation, and the grayscale value computed for each destination pixel is biased toward the top-left of the source image." I was somewhat doubtful of this at first.
Rearranging the formula: srcX = dstX * (srcWidth / dstWidth) + 0.5 * (srcWidth / dstWidth - 1).
This is equivalent to adding a correction term 0.5 * (srcWidth / dstWidth - 1) to the original floating-point coordinate. Its sign can be positive or negative, and the ratio srcWidth / dstWidth reflects whether the current interpolation enlarges or shrinks the image. What is its effect? Consider an example: suppose the source image is 3*3 with center point (1,1), and the destination image is 9*9 with center point (4,4). During the interpolation mapping we want to use the source image's pixel information as evenly as possible; the most intuitive requirement is that (4,4) map to (1,1). Computing directly, srcX = 4 * 3 / 9 = 1.3333 != 1; that is, the pixels used during interpolation are shifted toward the lower right of the image rather than being drawn evenly from the whole image. Now consider center alignment: srcX = (4 + 0.5) * 3 / 9 - 0.5 = 1, which meets our requirement exactly.
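The two mappings from this section can be compared directly in code. This is a small sketch; the function names `naiveMap` and `centerAlignedMap` are invented for the example.

```cpp
#include <cassert>
#include <cmath>

// Naive inverse mapping: srcX = dstX * (srcWidth / dstWidth).
double naiveMap(int dstX, int srcW, int dstW) {
    return dstX * (double)srcW / dstW;
}

// Center-aligned inverse mapping, as used by OpenCV:
// srcX = (dstX + 0.5) * (srcWidth / dstWidth) - 0.5.
double centerAlignedMap(int dstX, int srcW, int dstW) {
    return (dstX + 0.5) * (double)srcW / dstW - 0.5;
}
```

For the 3*3 to 9*9 example, `naiveMap(4, 3, 9)` gives 1.3333 while `centerAlignedMap(4, 3, 9)` gives exactly 1.0, so the destination center (4,4) maps onto the source center (1,1) only under center alignment.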
3.2 Converting floating-point operations to integer operations
Reference: the blog post "Optimization of the bilinear interpolation algorithm in image processing".
Computing directly is slow: the calculated srcX and srcY are floating-point numbers, many multiplications follow, and the image data volume is large, so the speed is not ideal. The solution is: floating-point arithmetic → integer arithmetic → bitwise shift operations (<<, >>).
The main objects to be scaled up are the floating-point weights u and v; OpenCV chooses a scaling factor of 2048. How is an appropriate factor chosen? Three considerations apply. First, precision: if the number is too small, the computed result may carry a large error. Second, the number cannot be too large either, or intermediate products will exceed the range the integer type can represent. Third, speed: if the factor were 12, the final result would have to be divided by 12 * 12 = 144; but with 16, the final divisor is 16 * 16 = 256, a number we can divide by with a right shift, and a right shift is much faster than ordinary division. A factor of 2048 = 2^11 can be applied with a left shift of 11 bits.
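The fixed-point trick can be shown on a single pixel. This sketch scales the weights by 2048, accumulates in integers, and shifts right by 22 bits to divide by 2048 * 2048; the function name `fixedPointBilinear` is an assumption for illustration.

```cpp
#include <cassert>

// Fixed-point bilinear interpolation of one pixel.
// q11 = f(i,j), q21 = f(i+1,j), q12 = f(i,j+1), q22 = f(i+1,j+1); values in 0..255.
int fixedPointBilinear(int q11, int q21, int q12, int q22, double u, double v) {
    int wu0 = (int)((1.0 - u) * 2048);  // weight of the left column, scaled by 2^11
    int wu1 = 2048 - wu0;               // weight of the right column
    int wv0 = (int)((1.0 - v) * 2048);  // weight of the top row
    int wv1 = 2048 - wv0;               // weight of the bottom row
    // Each product is at most 255 * 2048 * 2048 < 2^31, so a 32-bit int suffices.
    return (q11 * wu0 * wv0 + q21 * wu1 * wv0 +
            q12 * wu0 * wv1 + q22 * wu1 * wv1) >> 22;  // divide by 2048 * 2048
}
```

For the worked example (q11 = 234, q21 = 38, q12 = 67, q22 = 44, u = v = 0.75), the integer version yields 59, i.e. the floating-point result 59.0625 truncated to an integer pixel value.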
4. Code
#include <opencv2/opencv.hpp>
#include <algorithm>

// Assumed context: matSrc is the loaded source image; matDst1 and matDst2 are
// destination images of the desired size and same type; scale_x = srcWidth / dstWidth
// and scale_y = srcHeight / dstHeight.
uchar* dataDst = matDst1.data;
int stepDst = matDst1.step;
uchar* dataSrc = matSrc.data;
int stepSrc = matSrc.step;
int iWidthSrc = matSrc.cols;
int iHeightSrc = matSrc.rows;
int iChannels = matSrc.channels();

for (int j = 0; j < matDst1.rows; ++j)
{
    // Center-aligned inverse mapping (section 3.1), split into integer part sy
    // and fractional part fy.
    float fy = (float)((j + 0.5) * scale_y - 0.5);
    int sy = cvFloor(fy);
    fy -= sy;
    sy = std::min(sy, iHeightSrc - 2);
    sy = std::max(0, sy);

    // Fixed-point y weights, scaled by 2048 = 2^11 (section 3.2).
    short cbufy[2];
    cbufy[0] = cv::saturate_cast<short>((1.f - fy) * 2048);
    cbufy[1] = 2048 - cbufy[0];

    for (int i = 0; i < matDst1.cols; ++i)
    {
        float fx = (float)((i + 0.5) * scale_x - 0.5);
        int sx = cvFloor(fx);
        fx -= sx;
        // Clamp to the image border.
        if (sx < 0) {
            fx = 0, sx = 0;
        }
        if (sx >= iWidthSrc - 1) {
            fx = 0, sx = iWidthSrc - 2;
        }

        // Fixed-point x weights.
        short cbufx[2];
        cbufx[0] = cv::saturate_cast<short>((1.f - fx) * 2048);
        cbufx[1] = 2048 - cbufx[0];

        for (int k = 0; k < iChannels; ++k)
        {
            // Weighted sum of the four neighbours; >> 22 divides by 2048 * 2048.
            *(dataDst + j*stepDst + iChannels*i + k) =
                (*(dataSrc + sy*stepSrc + iChannels*sx + k) * cbufx[0] * cbufy[0] +
                 *(dataSrc + (sy+1)*stepSrc + iChannels*sx + k) * cbufx[0] * cbufy[1] +
                 *(dataSrc + sy*stepSrc + iChannels*(sx+1) + k) * cbufx[1] * cbufy[0] +
                 *(dataSrc + (sy+1)*stepSrc + iChannels*(sx+1) + k) * cbufx[1] * cbufy[1]) >> 22;
        }
    }
}
cv::imwrite("linear_1.jpg", matDst1);

// Compare against OpenCV's built-in bilinear resize.
cv::resize(matSrc, matDst2, matDst1.size(), 0, 0, cv::INTER_LINEAR);
cv::imwrite("linear_2.jpg", matDst2);
References:
Implementation process of the five interpolation algorithms of the resize function in OpenCV
The basic principle of nearest-neighbor interpolation and bilinear interpolation in image scaling
opencv -- bilinear interpolation (Bilinear interpolation)