Delphi Image Processing-Gaussian blur

Source: Internet
Author: User

Reading Tips:

Delphi Image ProcessingThe series focuses on efficiency. The general code is Pascal, and the core code is BaSm.

C ++ Image ProcessingThe series focuses on code clarity and readability, and all uses C ++ code.

Make sure that the two items are consistent and can be compared with each other.

The code in this article must include the imagedata. Pas unit in "Delphi Image Processing-data type and public process.

 

Note: The image Gaussian fuzzy processing code is modified most often. Although this modification does not change the algorithm, the processing process has been modified. This alone can increase the processing speed by more than 40% on the original basis. At the same time, the SSE floating-point operation is used to replace the fixed-point operation of the original assembly. In order to facilitate the comparison, and do not want to destroy the original code, the original code is slightly modified and retained. The modified code is appended to the article.

In my article "Delphi Image Processing-image convolution", I once introduced the use of common image convolution processes to perform Gaussian blur on the image. The processing effect is not bad, it feels okay to process small images, but it is still too slow to process large images, on my P4 2.8 GB and 1 GB memory machines, perform Gaussian fuzzy q = 3, r = 5 on 10 million pixel images, excluding image loading and pre-data conversion, the processing speed has never been significantly improved after several modifications. The main reason is the general convolution process: the convolution template obtained with r = 5 is 11x11 pixels, and each pixel has four components (32-bit argb ), each pixel must be subject to 11*11*4 = 484 multiplication, 484 addition, and 4 Division. Finally, it must be subject to determining whether four components are super-bounded, it's hard to think about it! If you do not use BaSm to Process Code with fixed points, the processing speed is even harder to imagine.

I searched for image Gaussian Blur optimization algorithms multiple times on the Internet. Many algorithms and processing methods, including code optimization, were not as good as my Gaussian fuzzy processing process. I was very disappointed. When searching for other materials the day before yesterday, a foreign website found that the Gaussian Blur Processing Method for images seemed to be different from conventional algorithms, but there was no detailed information (because they do not understand foreign languages, I seldom access websites outside China, but I still read some formulas and pseudo-code.) After repeated efforts, I can summarize the processing process as follows:

1. Calculate a size + 1 long Gaussian distribution weight data using the given Q and Length Size:

// Calculate the initial data for (I =-radius; I <= radius; I ++) {x = I/Q; weights [I + radius] = exp (-x * X/2)} // sum = 0for (I =-radius; I <= radius; I ++) {sum + = weights [I + radius]} // Data normalization, that is, the sum of the data after normalization is equal to 1for (I =-radius; I <= radius; I ++) {weights [I + radius]/= sum}

2. Use weights to perform vertical fuzzy operations on the original image, that is, taking the pixel (x, y) as the center, right (X, Y-radius) and (X, Y + radius) after the pixels are multiplied by the values of weights, the new pixels are obtained and written to the corresponding points on a temporary image (because the data is normalized, product and Division operations are not required );

3. Use weights to perform horizontal Fuzzy Operations on temporary images, that is, taking the pixel (x, y) as the center, to (x-radius, Y) and (x + radius, Y) after the pixels are multiplied by weights, the new pixels are obtained and written to the corresponding points on the target image.

The process ends.

Since the above processing process only performs a "Ten" operation on each pixel of the image, the operation on each pixel point is greatly reduced, and the greater the fuzzy length, the more reduced. As mentioned above, the Q = 3 and r = 5 Fuzzy Operations only require 11*2*4 = 88 multiplication and 88 addition operations.

I still use BaSm to calculate the number of points according to the above process. The improved Gaussian fuzzy Process Code is as follows:

 

procedure CrossBlur(var Dest: TImageData; const Source: TImageData; Weights: Pointer; Size: Integer);var  height, srcStride: Integer;  _weights: Pointer;  dstOffset, srcOffset: Integer;  reds, greens, blues: Integer;asm    push    esi    push    edi    push    ebx    mov     _Weights, ecx    mov     ecx, [edx].TImageData.Stride    mov     srcStride, ecx    call    _SetCopyRegs    mov     height, edx    mov     dstOffset, ebx    push    esi    push    edi    push    edx    push    ecx    push    eax    // blur col    add     ecx, Size           // width = Source.Width    dec     ecx    mov     edi, _weights       // edi = weights@@cyLoop:    push    ecx@@cxLoop:    push    ecx    push    esi    push    edi    xor     ebx, ebx    mov     reds, ebx    mov     greens, ebx    mov     blues, ebx    mov     ecx, Size@@cblurLoop:    movzx   eax, [esi].TARGBQuad.Blue    movzx   edx, [esi].TARGBQuad.Green    imul    eax, [edi]    imul    edx, [edi]    add     blues, eax    add     greens, edx    movzx   eax, [esi].TARGBQuad.Red    movzx   edx, [esi].TARGBQuad.Alpha    imul    eax, [edi]    imul    edx, [edi]    add     reds, eax    add     ebx, edx    add     edi, 4    add     esi, srcStride    loop    @@cblurLoop    pop     edi    pop     esi    mov     eax, blues    mov     edx, greens    mov     ecx, reds    shr     eax, 16    shr     edx, 16    shr     ecx, 16    shr     ebx, 16    mov     [esi].TARGBQuad.Blue, al    mov     [esi].TARGBQuad.Green, dl    mov     [esi].TARGBQuad.Red, cl    mov     [esi].TARGBQuad.Alpha, bl    add     esi, 4    pop     ecx    loop    @@cxLoop    pop     ecx    dec     height    jnz     @@cyLoop    pop     srcOffset    pop     ecx    pop     height    pop     edi    pop     esi    // blur row@@ryLoop:    push    ecx@@rxLoop:    push    ecx    push    esi    push    edi    xor     ebx, ebx    mov     reds, ebx    mov     greens, ebx    mov     blues, ebx    mov     ecx, Size    mov     edi, _weights@@rblurLoop:    movzx   eax, [esi].TARGBQuad.Blue    movzx   edx, [esi].TARGBQuad.Green    imul    eax, [edi]    imul    edx, [edi]    add     blues, eax    add     greens, edx    movzx   eax, [esi].TARGBQuad.Red    movzx   edx, [esi].TARGBQuad.Alpha    imul    eax, [edi]    imul    edx, [edi]    add     reds, eax    add     ebx, edx    add     edi, 4    add     esi, 4    loop    @@rblurLoop    pop     edi    pop     esi    mov     eax, blues    mov     edx, greens    mov     ecx, reds    shr     eax, 16    shr     edx, 16    shr     ecx, 16    shr     ebx, 16    mov     [edi].TARGBQuad.Blue, al    mov     [edi].TARGBQuad.Green, dl    mov     [edi].TARGBQuad.Red, cl    mov     [edi].TARGBQuad.Alpha, bl    add     esi, 4    add     edi, 4    pop     ecx    loop    @@rxLoop    add     esi, srcOffset    add     edi, dstOffset    pop     ecx    dec     height    jnz     @@ryLoop    pop     ebx    pop     edi    pop     esiend;procedure ImageGaussiabBlur(var Data: TImageData; Q: double; Radius: Integer);var  src: TImageData;  fweights: array of Single;  weights: array of Integer;  i, size: Integer;  fx: Double;begin  if Radius <= 0 then  begin    if Abs(Q) < 1.0 then Radius := 1    else Radius := Round(Abs(Q)) + 2;  end;  size := Radius shl 1 + 1;  SetLength(fweights, size);  for i := 1 to Radius do  begin    fx := i / Q;    fweights[Radius + i] := exp(-fx * fx / 2);    fweights[Radius - i] := fweights[Radius + i];  end;  fweights[Radius] := 1.0;  fx := 0.0;  for i := 0 to size - 1 do    fx := fx + fweights[i];  SetLength(weights, size);  for i := 0 to size - 1 do    weights[i] := Round(fweights[i] / fx * 65536.0);  SetLength(fweights, 0);  src := _GetExpandData(Data, Radius);  CrossBlur(Data, src, weights, size);  FreeImageData(src);end;

The improved Gaussian fuzzy processing process is used on my machine to perform Gaussian fuzzy q = 3, r = 5 on 10 million pixel images, excluding image loading and pre-data conversion, the processing speed was significantly improved with a time consumption of 1390 Ms. I process the image pixels in 32-bit argb color. If I change the color to 24-bit RGB, the time consumption can be reduced. However, RGB colors cannot process 32-bit pixel images such as PNG.

What is the result of Gaussian fuzzy processing using ten words instead of template convolution? Please refer to the following simple example code and processing:

Sample Code:

procedure TForm1.Button3Click(Sender: TObject);var  bmp: TGpBitmap;  g: TGpGraphics;  data: TImageData;begin  bmp := TGpBitmap.Create('..\media\56-3.jpg');  g := TGpGraphics.Create(Canvas.Handle);  g.DrawImage(bmp, 0, 0);  data := LockGpBitmap(bmp);  ImageGaussiabBlur(Data, 3, 6);  UnlockGpBitmap(bmp, data);  g.DrawImage(bmp, data.Width, 0);  g.Free;  bmp.Free;end;

Process the source image:

Comparison between processing effect and Photoshop Gaussian fuzzy processing:

On the top left is the Photoshop radius 3.0 Gaussian blur, and on the top right is the process q = 3.0, r = 6 Gaussian blur.

The lower left is the 5.0 Gaussian blur of Photoshop radius, and the lower right is the process q = 5.0, r = 9 Gaussian blur.

How about it? The effect is good!

Unfortunately, I failed to find the method for automatically calculating the Blur radius according to Q, so two parameters Q and radius are provided in the processing process.

The following is the SSE code after this modification. The principle and algorithm are the same as above, but the processing method is somewhat different: Because the upper and lower sides of the Gaussian fuzzy matrix are symmetric, the radius point is the center, add the upper and lower symmetric rows (during column processing) or the left and right symmetric columns (during row processing) and multiply them with the Gaussian distribution weight data. In this way, except for the central row (column, only the previous 50% processing is required.

procedure CrossBlur(var Dest: TImageData; const Source: TImageData; Weights: Pointer; Radius: Integer);var  height, srcStride: Integer;  dstOffset, srcOffset: Integer;asm    push      esi    push      edi    push      ebx    push      ecx    mov       ecx, [edx].TImageData.Stride    mov       srcStride, ecx    call      _SetCopyRegs    mov       height, edx    mov       srcOffset, eax    mov       dstOffset, ebx    pop       ebx    pxor      xmm7, xmm7    push      esi           // pst = Source.Scan0    push      edi    push      edx    push      ecx    // blur col    mov       eax, srcStride    mov       edx, eax    shr       edx, 2        // width = Source.Width    mov       edi, Radius    shl       edi, 1    imul      edi, eax    add       edi, esi      // psb = pst + Radius * 2 * Source.Stride@@cyLoop:    push      edx@@cxLoop:    push      esi    push      edi    push      ebx    mov       ecx, Radius    pxor      xmm0, xmm0    // sum = 0@@cblurLoop:    movd      xmm1, [esi]   // for (i = 0; i < Radius; i ++)    movd      xmm2, [edi]   // {    punpcklbw xmm1, xmm7    punpcklbw xmm2, xmm7    paddw     xmm1, xmm2    //   ps = pst + psb    punpcklwd xmm1, xmm7    cvtdq2ps  xmm1, xmm1    //   pfs (flaot * 4) = ps (int * 4)    mulps     xmm1, [ebx]   //   pfs *= Weights[i]    addps     xmm0, xmm1    //   sum += pfs    add       ebx, 16    add       esi, eax      //   pst += Source.Stride    sub       edi, eax      //   psb -= Source.Stride    loop      @@cblurLoop   // }    movd      xmm1, [esi]    punpcklbw xmm1, xmm7    punpcklwd xmm1, xmm7    cvtdq2ps  xmm1, xmm1    // pfs (flaot * 4) = pst (int * 4)    mulps     xmm1, [ebx]   // pfs *= Weights[Radius]    addps     xmm0, xmm1    // sum += pfs    pop       ebx    pop       edi    pop       esi    cvtps2dq  xmm0, xmm0    // ps (int * 4) = sum (flaot * 4)    packssdw  xmm0, xmm7    packuswb  xmm0, xmm7    movd      [esi], xmm0   // pst (byte * 4) = ps (int * 4) pask    add       esi, 4    add       edi, 4    dec       edx    jnz       @@cxLoop    pop       edx    dec       height    jnz       @@cyLoop    pop       edx    pop       height    pop       edi           // pd = Dest.Scan0    pop       esi           // psl = pst    mov       eax, Radius    shl       eax, 1+2    add       eax, esi      // psr = psl + Radius * 2    // blur row@@ryLoop:    push      edx           // width = Dest.Width@@rxLoop:    push      esi    push      ebx    push      eax    mov       ecx, Radius    pxor      xmm0, xmm0    // sum = 0@@rblurLoop:    movd      xmm1, [esi]   // for (i = 0; i < Radius; i ++)    movd      xmm2, [eax]   // {    punpcklbw xmm1, xmm7    punpcklbw xmm2, xmm7    paddw     xmm1, xmm2    //   ps = psl + psr    punpcklwd xmm1, xmm7    cvtdq2ps  xmm1, xmm1    //   pfs (flaot * 4) = ps (int * 4)    mulps     xmm1, [ebx]   //   pfs *= Weights[i]    addps     xmm0, xmm1    //   sum += pfs    add       ebx, 16    add       esi, 4        //   psl ++    sub       eax, 4        //   psr --    loop      @@rblurLoop   // }    movd      xmm1, [esi]    punpcklbw xmm1, xmm7    punpcklwd xmm1, xmm7    cvtdq2ps  xmm1, xmm1    // pfs (flaot * 4) = psl (int * 4)    mulps     xmm1, [ebx]   // pfs *= Weights[Radius]    addps     xmm0, xmm1    // sum += pfs    cvtps2dq  xmm0, xmm0    // ps (int * 4) = sum (flaot * 4)    packssdw  xmm0, xmm7    packuswb  xmm0, xmm7    movd      [edi], xmm0   // pd (byte * 4) = ps (int * 4) pask    pop       eax    pop       ebx    pop       esi    add       eax, 4    add       esi, 4    add       edi, 4    dec       edx    jnz       @@rxLoop    add       eax, srcOffset    add       esi, srcOffset    add       edi, dstOffset    pop       edx    dec       height    jnz       @@ryLoop    pop       ebx    pop       edi    pop       esiend;// --> st x// <-- st e**x = 2**(x*log2(e))function _Expon: Extended;asm    fldl2e              // y = x*log2e    fmul    fld     st(0)       // i = round(y)    frndint    fsub    st(1), st   // f = y - i    fxch    st(1)       // z = 2**f    f2xm1    fld1    fadd    fscale              // result = z * 2**i    fstp    st(1)end;function GetWeights(var Buffer, Weights: Pointer; Q: Single; Radius: Integer): Integer;const  _fcd1: Single = 0.1;  _fc1: Single = 1.0;  _fc2: Single = 2.0;  _fc250: Single = 250.0;  _fc255: Single = 255.0;var  R: Integer;  v, QQ2: double;asm    mov     R, ecx    mov     ecx, eax    fld     Q    fabs    fcom    _fcd1    fstsw   ax    sahf    jae     @@1    fld     _fcd1    fstp    st(1)               // if (Q < 0.1) Q = 0.1    jmp     @@2@@1:    fcom    _fc250    fstsw   ax    sahf    jbe     @@2    fld     _fc250    fstp    st(1)               // if (Q > 250) Q = 250@@2:    fst     Q    fmul    Q    fmul    _fc2    fstp    QQ2                 // QQ2 = 2 * Q * Q    fwait    mov     eax, R    test    eax, eax    jg      @@10    push    eax                 // if (radius <= 0)    fld1                        // {    fadd    Q                   //   radius = Abs(Q) + 1    fistp   [esp].Integer    fwait    pop     eax@@testRadius:                   //   while (TRUE)    mov     R, eax              //   {    fldz                        //     sum = 0@@testLoop:                     //     for (R = radius; R > 0; R ++)    fild    R                   //     {    fld     st(0)    fmulp   st(1), st    fdiv    QQ2    fchs    call    _Expon              //       tmp = Exp(-(R * R) / (2.0 * Q * Q));    cmp     R, eax    jne     @@3    fst     v                   //       if (R == radius) v = tmp@@3:    faddp   st(1), st(0)        //       sum += tmp    dec     R    jnz     @@testLoop          //     }    fmul    _fc2                //     sum *= 2    fadd    _fc1                //     sum += 1    fdivr   v    fmul    _fc255    fistp   R    cmp     R, 0    je      @@4                 //     if ((INT)(v / sum * 255 + 0.5) = 0) break    inc     eax                 //     radius ++    jmp     @@testRadius        //   }@@4:    dec     eax    jnz     @@5    inc     eax@@5:    mov     R, eax              // }@@10:    inc     eax    shl     eax, 4    add     eax, 12    push    edx    push    ecx    mov     edx, eax    mov     eax, GHND    call    GlobalAllocPtr    pop     ecx    pop     edx    test    eax, eax    jz      @@Exit    mov     [ecx], eax          // buffer = GlobalAllocPtr(GHND, (Radius + 1) * 16 + 12)    add     eax, 12    and     eax, -16    mov     [edx], eax          // weights = ((char* )buffer + 12) & 0xfffffff0    mov     ecx, R              // ecx = radius    mov     edx, eax            // edx = weights    fldz                        // for (i = radius, sum = 0; i > 0; i --)@@clacLoop:                     // {    fild    R    fld     st(0)    fmulp   st(1), st    fdiv    QQ2    fchs    call    _Expon    fstp    [edx].Double        //   weights[i] = Expon(-(i * i) / (2 * Q * Q))    fadd    [edx].Double        //   sum += weights[i]    add     edx, 16    dec     R    jnz     @@clacLoop          // }    fmul    _fc2                // sum *= 2    fld1    fstp    [edx].Double        // weights[radius] = 1    fadd    [edx].Double        // sum += weights[radius]    push    ecx    inc     ecx@@divLoop:                      // for (i = 0; i <= Radius; i ++)    fld     st(0)               //   weights[i] = Round(weights[i] / sum)    fdivr   [eax].Double    fst     [eax].Single    fst     [eax+4].Single    fst     [eax+8].Single    fstp    [eax+12].Single    add     eax, 16    loop    @@divLoop    ffree   st(0)    fwait    pop     eax                 // return Radius@@Exit:end;procedure ImageGaussiabBlur(var Data: TImageData; Q: Single; Radius: Integer);var  Buffer, Weights: Pointer;  src: TImageData;begin  Radius := GetWeights(Buffer, Weights, Q, Radius);  if Radius = 0 then Exit;  if Data.AlphaFlag then    ArgbConvertPArgb(Data);  src := _GetExpandData(Data, Radius);  CrossBlur(Data, src, Weights, Radius);  FreeImageData(src);  GlobalFreePtr(Buffer);  if Data.AlphaFlag then    PArgbConvertArgb(Data);end;

For details about the use of GDI + units and descriptions in the "Delphi image processing" series, see the article 《GDI + for VCL basics-GDI + and VCL.

Due to limited levels, errors are inevitable. Correction and guidance are welcome. Email Address:Maozefa@hotmail.com

Here, you can access "Delphi Image Processing-Article Index".

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.