Anyone who has done image programming on Windows probably knows the GDI API function StretchBlt, which corresponds to the StretchDraw method of the TCanvas class in the VCL. It makes scaling an image easy. The problem is that it uses the fastest, simplest, and worst-quality method: nearest neighbor. Although that is adequate in most cases, it is not suitable when higher quality is required.
Not long ago I wrote a small tool to manage the photos I take with my digital camera. One of its plug-ins provides a zoom function; the current version uses StretchDraw, and the results are sometimes unsatisfactory. I had long wanted to add two better methods: linear interpolation and cubic spline interpolation. After some investigation, I found that the cubic spline method is computationally too expensive to be practical, so I decided to implement only the linear interpolation version.
From the basic theory of digital image processing, we know that a geometric transformation of an image is a coordinate transformation from the source image to the target image. The naive approach is to map the coordinates of each pixel of the source image to new coordinates in the target image, but this leads to a problem: the resulting target coordinates are usually not integers. Moreover, when enlarging, some target pixels receive no mapped source pixel at all. These are the drawbacks of the so-called "forward mapping" method, which is why "reverse mapping" is generally used instead.
However, reverse mapping has its own problem: the coordinates mapped back into the source image are not integers either. Here a "resampling filter" is needed. The term sounds very technical, but only because it is borrowed from conventional usage in electronic signal processing (in most cases its role is analogous to a band-pass filter); the idea itself is not complicated. It is simply a rule for determining the color at a non-integer coordinate. The three methods mentioned above (nearest neighbor, linear interpolation, and cubic spline) are all resampling filters.
"Nearest neighbor" means rounding the non-integer coordinate and taking the color of the pixel at the nearest integer coordinates. Linear interpolation instead estimates the color from the colors of the nearest surrounding points (four points for a flat image, hence two-dimensional, or bilinear, interpolation). In most cases its accuracy is higher than nearest neighbor, and the result looks much better; the difference is most obvious when enlarging, where jagged edges are far less visible than with nearest neighbor. It does introduce one problem, though: the image can look soft. In filter terminology, its stop-band attenuation is good, but there is pass-band loss, and the roll-off of its frequency response is not steep. As for the cubic spline method, it is a bit more involved and I will not cover it here; you can consult a professional text on digital image processing, such as the references for this article.
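For comparison, nearest-neighbor resampling really is just coordinate rounding. The sketch below is standard C++, not the VCL code from this article; the function name and the buffer layout are illustrative only:

```cpp
#include <cmath>
#include <vector>

// Nearest-neighbor resample of a grayscale image stored row-major.
// Each target pixel is reverse-mapped into the source, and the value of
// the nearest source pixel is copied: fast, but blocky when enlarging.
// Assumes dw and dh are both greater than 1.
std::vector<unsigned char> nearest_resize(const std::vector<unsigned char> &src,
                                          int sw, int sh, int dw, int dh)
{
    std::vector<unsigned char> dst(dw * dh);
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x)
        {
            // Reverse mapping, then rounding to the nearest integer coordinate.
            int u = (int)std::lround(x * (sw - 1) / (double)(dw - 1));
            int v = (int)std::lround(y * (sh - 1) / (double)(dh - 1));
            dst[y * dw + x] = src[v * sw + u];
        }
    return dst;
}
```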
Now let's discuss the coordinate transformation algorithm. A simple spatial transformation can be expressed with a transformation matrix:
[x', y', w'] = [u, v, w] * T
Here x' and y' are the coordinates in the target image, u and v are the coordinates in the source image, and w and w' are the homogeneous coordinates, usually set to 1; T is a 3x3 transformation matrix.
Although this representation looks abstract, it conveniently expresses different transformations such as translation, rotation, and scaling. For scaling, it is equivalent to:
                          | Su  0   0 |
[x', y', 1] = [u, v, 1] * | 0   Sv  0 |
                          | 0   0   1 |
Su and Sv are the scaling factors along the X and Y axes respectively: a value greater than 1 enlarges the image, and a value greater than 0 but less than 1 shrinks it.
Does the matrix make you dizzy? In fact, expanding the matrix multiplication above simply gives:
x' = u * Su
y' = v * Sv
That simple. ^_^
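If you want to verify the expansion mechanically, this small standalone sketch (names are illustrative, not from the article's code) multiplies the row vector [u, v, 1] by the scaling matrix and confirms it reduces to x' = u * Su and y' = v * Sv:

```cpp
// Apply the 3x3 scaling matrix to a homogeneous point [u, v, 1].
// The row-vector-times-matrix product should reduce to x' = u * Su,
// y' = v * Sv, with the homogeneous coordinate staying 1.
void scale_point(double u, double v, double Su, double Sv,
                 double &xOut, double &yOut)
{
    double T[3][3] = { { Su, 0.0, 0.0 },
                       { 0.0, Sv, 0.0 },
                       { 0.0, 0.0, 1.0 } };
    double p[3] = { u, v, 1.0 };
    double r[3] = { 0.0, 0.0, 0.0 };
    for (int j = 0; j < 3; ++j)        // row vector times matrix
        for (int i = 0; i < 3; ++i)
            r[j] += p[i] * T[i][j];
    xOut = r[0];
    yOut = r[1];
}
```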
With these three pieces of preparation, we can start writing code. The idea is simple: traverse every pixel coordinate of the target image with two nested loops, and use the transformation formula above to obtain the corresponding source coordinate (note: because reverse mapping is used, the formulas become u = x'/Su and v = y'/Sv). Since the source coordinate is not an integer, two-dimensional linear interpolation is needed:
P = n * b * Pa + n * (1 - b) * Pb + (1 - n) * b * Pc + (1 - n) * (1 - b) * Pd
Here n is the distance along the Y axis from v (the Y coordinate of the mapped point in the source image, generally not an integer) to the nearest row below it, i.e. 1 minus the fractional part of v; b is the analogous quantity on the X axis, i.e. 1 minus the fractional part of u. Pa through Pd are the colors of the four source pixels closest to (u, v) (upper-left, upper-right, lower-left, and lower-right, available through the Pixels property of TCanvas). P is the interpolated color at (u, v), which becomes the approximate color of the target pixel (x', y').
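As a sanity check of the formula, here is a direct, unoptimized implementation for a single channel in plain standard C++ (the function name and the use of doubles are mine, not the article's):

```cpp
#include <cmath>

// Straightforward (unoptimized) bilinear interpolation at a non-integer
// source coordinate (u, v). pa..pd are the four surrounding pixel values:
// upper-left, upper-right, lower-left, lower-right.
double bilinear(double u, double v,
                double pa, double pb, double pc, double pd)
{
    double b = 1.0 - (u - std::floor(u));  // weight toward the left column
    double n = 1.0 - (v - std::floor(v));  // weight toward the upper row
    return n * b * pa + n * (1.0 - b) * pb
         + (1.0 - n) * b * pc + (1.0 - n) * (1.0 - b) * pd;
}
```

Note that at an exact integer coordinate the weights collapse to 1 and the result is simply the upper-left pixel, as expected.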
I will not show a direct implementation of this, because it is far too inefficient: complex floating-point operations on the RGB of every target pixel. Optimization is required. For a VCL application, one simple optimization is to use the ScanLine property of TBitmap for row-by-row processing, avoiding the pixel-level overhead of the Pixels property; this alone greatly improves performance and is basic knowledge for image processing with the VCL. But it does not always suffice: image rotation, for example, requires further tricks.
In any case, floating-point operations cost much more than integer operations, so they must be optimized away. As seen above, floating-point numbers enter through the transformation, whose parameters Su and Sv are generally not integers. But they can be expressed as fractions:
Su = (double)dw / sw; Sv = (double)dh / sh
where dw and dh are the width and height of the target image and sw and sh are the width and height of the source image (since all four are integers, a type cast is needed to obtain a floating-point result).
Substituting these expressions for Su and Sv into the transformation and interpolation formulas above yields a new interpolation formula.
Because:
b = 1 - (x * sw % dw) / (double)dw; n = 1 - (y * sh % dh) / (double)dh
define:
B = dw - x * sw % dw; N = dh - y * sh % dh
so that:
b = B / (double)dw; n = N / (double)dh
Replacing the floating-point b and n with the integers B and N transforms the interpolation formula into:
P = (B * N * (Pa - Pb - Pc + Pd) + dw * N * Pb + dh * B * Pc + (dw * dh - dh * B - dw * N) * Pd) / (double)(dw * dh)
Here the result P is still a floating-point number, which is rounded to give the final value. To eliminate floating point completely, the rounding can be folded in like this:
P = (B * N * ... * Pd + dw * dh / 2) / (dw * dh)
Now P is the rounded integer value, and every calculation is an integer operation.
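The equivalence of the integer and floating-point forms is easy to check for a single channel. This standalone sketch (names are mine; plain standard C++ rather than the VCL routine) implements both side by side:

```cpp
// All-integer interpolation for one color channel, as derived above.
// B and N are the integer weights (0..dw and 0..dh respectively).
int interp_int(int B, int N, int dw, int dh, int pa, int pb, int pc, int pd)
{
    return (B * N * (pa - pb - pc + pd) + dw * N * pb + dh * B * pc
            + (dw * dh - dh * B - dw * N) * pd
            + dw * dh / 2) / (dw * dh);  // + dw*dh/2 rounds instead of truncating
}

// The original floating-point form, for comparison.
double interp_float(int B, int N, int dw, int dh, int pa, int pb, int pc, int pd)
{
    double b = B / (double)dw, n = N / (double)dh;
    return n * b * pa + n * (1 - b) * pb
         + (1 - n) * b * pc + (1 - n) * (1 - b) * pd;
}
```

For example, with dw = dh = 100, B = 30, N = 70 and channel values 10, 200, 50, 120, the floating-point form gives 129.8 and the integer form gives the correctly rounded 130.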
The code after simple optimization is as follows:
int __fastcall TResizeDlg::stretch_linear(Graphics::TBitmap *aDest, Graphics::TBitmap *aSrc)
{
    int sw = aSrc->Width - 1, sh = aSrc->Height - 1, dw = aDest->Width - 1, dh = aDest->Height - 1;
    int B, N, x, y;
    int nPixelSize = GetPixelSize(aDest->PixelFormat);
    BYTE *pLinePrev, *pLineNext;
    BYTE *pDest;
    BYTE *pA, *pB, *pC, *pD;
    for (int i = 0; i <= dh; ++i)
    {
        pDest = (BYTE *)aDest->ScanLine[i];
        y = i * sh / dh;
        N = dh - i * sh % dh;
        pLinePrev = (BYTE *)aSrc->ScanLine[y++];
        pLineNext = (N == dh) ? pLinePrev : (BYTE *)aSrc->ScanLine[y];
        for (int j = 0; j <= dw; ++j)
        {
            x = j * sw / dw * nPixelSize;
            B = dw - j * sw % dw;
            pA = pLinePrev + x;
            pB = pA + nPixelSize;
            pC = pLineNext + x;
            pD = pC + nPixelSize;
            if (B == dw)
            {
                pB = pA;
                pD = pC;
            }
            for (int k = 0; k < nPixelSize; ++k)
                *pDest++ = (BYTE)(int)(
                    (B * N * (*pA++ - *pB - *pC + *pD) + dw * N * *pB++
                    + dh * B * *pC++ + (dw * dh - dh * B - dw * N) * *pD++
                    + dw * dh / 2) / (dw * dh)
                    );
        }
    }
    return 0;
}
It is fairly concise, I think. Because widths and heights are counted from 0, one is subtracted from each. GetPixelSize determines the number of bytes per pixel from the PixelFormat property; this code supports only 24-bit and 32-bit color. (15-bit and 16-bit colors would have to be unpacked channel by channel, because computing on the packed values produces unexpected carries between channels and corrupts the colors, which is awkward to handle; 8-bit and lower indexed colors require palette lookups and re-indexing, which is also troublesome, so they are not supported either, although 8-bit grayscale images do work.) In addition, a little code is included to prevent out-of-bounds access at the image edges.
By comparison, on a PIII-733 machine, with target images up to 1024x768, there is basically no perceptible slowdown relative to StretchDraw (with the floating-point version the slowdown is quite noticeable). The results are also quite satisfactory: whether shrinking or enlarging, the image quality is clearly better than with StretchDraw.
However, because of the integer arithmetic, one issue demands attention: overflow. The denominator in the formula is dw * dh, and each channel of the result must fit in a byte, i.e. 8 binary bits; a signed 32-bit integer can hold at most 31 bits, so dw * dh must not exceed 23 bits. In other words, depending on the aspect ratio, the target resolution cannot exceed roughly 4096x2048. This limit could be extended by using unsigned arithmetic (which gains one bit) or by reducing the computation precision; interested readers can try it themselves.
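That bound is easy to sanity-check in code. The helper below (illustrative, assuming a 32-bit int) widens to 64 bits and tests whether the dominant term of the numerator, about dw * dh * 255, still fits:

```cpp
#include <cstdint>

// The numerator of the integer interpolation formula is bounded by
// roughly dw * dh * 255 (plus the smaller rounding term dw * dh / 2).
// For the all-integer code to be safe, that value must fit in a
// signed 32-bit integer, i.e. dw * dh must stay under about 2^23.
bool fits_in_int32(int dw, int dh)
{
    int64_t worst = (int64_t)dw * dh * 255;  // widen before multiplying
    return worst <= INT32_MAX;
}
```

At exactly 4096x2048 (dw * dh = 2^23) the worst case is 2,139,095,040, which still fits; doubling either dimension overflows.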
Of course, this code is still far from fully optimized, and many issues remain unexplored, such as anti-aliasing. Interested readers can consult the relevant books and investigate on their own. (Source: dipai image)