Optimizing GetPixel for a 300-fold speedup

Source: Internet
Author: User

Optimizing the inefficiency of dc.GetPixel(j, i)

Suppose we want to scan an image of 350*350 = 122,500 pixels, which is really a very small image. The original code takes 3.284 seconds; after the first optimization it takes only 0.13 seconds, and after a second optimization only 0.081 seconds.
We live in the "airplane" era of Core i5 and i7 processors, but have any comrades considered what it feels like in the "tractor" era of an ARM device with a clock measured in MHz and 32 MB of memory?

Answer:
When dc.GetPixel(j, i) is used to scan the image point by point, the scan takes 3.284 seconds, which is unacceptable.
Is there a good way to speed it up?
The answer is yes: operate on memory directly.
First, the contents of the DC have to be converted into a memory block.
Fortunately, there is a function for exactly this: CreateDIBSection.

HBITMAP m_hdibitmap1 = CreateDIBSection(m_hdibdc1, (BITMAPINFO*)&m_hdr, DIB_RGB_COLORS, (void**)&m_bitmap1, NULL, 0);
For more information, see MSDN.

BitBlt(m_hdibdc1, 0, 0, m_nWidth, m_nHeight, m_memdc, 0, 0, SRCCOPY); // This is the critical step: it copies the image from m_memdc into our memory block m_bitmap1 (the DIB section must already be selected into m_hdibdc1).
// These two or three setup steps are not counted in the timings because they are bulk memory copies. You can use DWORD dws = GetTickCount(); to confirm that the conversion to memory takes only a few milliseconds.

Now to our specific code segment:

for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip pixels that match the transparent color.
        while (j <= m_nWidth - 1 && m_memdc.GetPixel(j, i) == colortransparent)
        {
            j++;
        }
        ileftx = j;
        // Advance to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && m_memdc.GetPixel(j, i) != colortransparent)
        {
            ++j;
        }

        // Record the run as a rectangle one pixel high.
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;

        prect++;
        // Check whether the rectangle count has reached the allocated capacity.
        if (prgndata->rdh.nCount >= maxnum)
        {

        }
    } while (j < m_nWidth - 1);
}
// This code takes more than 3 seconds.
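
The run-scanning logic of the loop above can be tried out without any GDI calls. The following is only a minimal sketch under stated assumptions: `Rect` is a hypothetical stand-in for the Win32 RECT, `appendRowRuns` is a name invented for illustration, and a row is represented as a vector of 32-bit pixel values. Unlike the original loop, this sketch does not emit a rectangle when a row ends in transparent pixels (an empty run).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Win32 RECT structure.
struct Rect { int left, top, right, bottom; };

// Scans one row and appends a rectangle one pixel high for every run of
// consecutive pixels that differ from the transparent value.
void appendRowRuns(const std::vector<std::uint32_t>& row, int y,
                   std::uint32_t transparent, std::vector<Rect>& out)
{
    const int width = static_cast<int>(row.size());
    int x = 0;
    while (x < width) {
        // Skip transparent pixels.
        while (x < width && row[static_cast<std::size_t>(x)] == transparent) ++x;
        const int runStart = x;
        // Extend over the non-transparent pixels.
        while (x < width && row[static_cast<std::size_t>(x)] != transparent) ++x;
        // Emit only non-empty runs.
        if (x > runStart) out.push_back(Rect{runStart, y, x, y + 1});
    }
}
```

Calling `appendRowRuns` once per row and collecting the rectangles gives the same kind of list that the original code writes through `prect`.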

To simplify the code, the equality test is written as a function.
bool getcoloryes(BYTE* byte, int iWidth, int iHeight, int x, int y, BYTE r, BYTE g, BYTE b)
{   // Tests whether the pixel at (x, y) in the memory block has the given color.
    int npixelsize = 4;   // 4 bytes per pixel
    BYTE btb2, btg2, btr2;

    // The row index (iHeight - y - 1) addresses a bitmap stored bottom-up.
    btb2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize];
    btg2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize + 1];
    btr2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize + 2];

    if (btb2 == b && btg2 == g && btr2 == r)
        return true;
    else
        return false;
}

Corresponding assembly instructions:
.text:00013668 cdzqsplash_dlg_getcoloryes        ; CODE XREF: cdzqsplash_dlg_setrgnsplash+368p
.text:00013668                                   ; cdzqsplash_dlg_setrgnsplash+3CCp
.text:00013668                                   ; DATA XREF: ...
.text:00013668
.text:00013668 arg_0  = 0       // stack parameter 1 (x)
.text:00013668 arg_4  = 4       // stack parameter 2 (y)
.text:00013668 arg_8  = 8       // stack parameter 3 (r)
.text:00013668 arg_c  = 0xC     // stack parameter 4 (g)
.text:00013668 arg_10 = 0x10    // stack parameter 5 (b)
.text:00013668
01 .text:00013668   STMFD   SP!, {R4,LR}           // push R4 and LR onto the stack
02 .text:0001366c   LDR     R0, [SP,#8+arg_4]      // read stack parameter 2 (y)
03 .text:00013670   LDR     LR, [SP,#8+arg_0]      // read stack parameter 1 (x)
04 .text:00013674   LDRB    R4, [SP,#8+arg_10]     // read stack parameter 5 (b)
05 .text:00013678   SUB     R3, R3, R0             // R3 = iHeight - y
06 .text:0001367c   SUB     R3, R3, #1             // subtract 1
07 .text:00013680   MLA     R2, R3, R2, LR         // R2 = R3 * iWidth + x
08 .text:00013684   LDRB    R3, [R1,R2,LSL#2]!     // load the B byte; R1 now points at the pixel
09 .text:00013688   CMP     R3, R4
10 .text:0001368c   BNE     loc_136b0
11 .text:00013690   LDRB    R2, [R1,#1]
12 .text:00013694   LDRB    R3, [SP,#8+arg_c]
13 .text:00013698   CMP     R2, R3
14 .text:0001369c   LDREQB  R2, [R1,#2]
15 .text:000136a0   LDREQB  R3, [SP,#8+arg_8]
16 .text:000136a4   CMPEQ   R2, R3
17 .text:000136a8   MOVEQ   R0, #1
18 .text:000136ac   LDMEQFD SP!, {R4,PC}
19 .text:000136b0
20 .text:000136b0 loc_136b0                        ; CODE XREF: cdzqsplash_dlg_getcoloryes+24j
21 .text:000136b0   MOV     R0, #0
22 .text:000136b4   LDMFD   SP!, {R4,PC}
.text:000136b4 ; end of function cdzqsplash_dlg_getcoloryes

We can see that CMP is executed up to three times and the instruction sequence is still quite long. Even so, the result is much better than calling GetPixel directly. The author puts the gain at nearly 300 times; the timings quoted in this article (3.284 seconds down to 0.13 seconds) work out to roughly 25 times. Don't underestimate a factor like 300: if a large image-processing job needs one day with the optimized program, would the unoptimized code take 300 days? A little scary...
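
The offset arithmetic in getcoloryes can be exercised on an ordinary byte buffer. This is a minimal sketch, assuming a 32-bit bottom-up bitmap with bytes in B, G, R, unused order, as the function above implies; `pixelOffset` and `pixelMatches` are hypothetical names introduced here, not part of the original program.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of pixel (x, y) in a 32-bit bitmap whose rows are stored
// bottom-up: row y of the image lives at stored row (height - y - 1).
std::size_t pixelOffset(int width, int height, int x, int y)
{
    const std::size_t bytesPerPixel = 4;
    const std::size_t storedRow = static_cast<std::size_t>(height - y - 1);
    return (storedRow * static_cast<std::size_t>(width)
            + static_cast<std::size_t>(x)) * bytesPerPixel;
}

// Compares the B, G and R bytes of pixel (x, y) one channel at a time.
bool pixelMatches(const std::vector<std::uint8_t>& bits, int width, int height,
                  int x, int y, std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::size_t off = pixelOffset(width, height, x, y);
    return bits[off] == b && bits[off + 1] == g && bits[off + 2] == r;
}
```

For a 2*2 image, pixel (0, 0) sits in the second stored row, so its offset is 8 bytes into the buffer.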

for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip pixels that match the transparent color.
        while (j <= m_nWidth - 1 && getcoloryes(m_bitmap1, m_nWidth, m_nHeight, j, i, btr1, btg1, btb1))
        {
            j++;
        }
        ileftx = j;
        // Advance to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && !getcoloryes(m_bitmap1, m_nWidth, m_nHeight, j, i, btr1, btg1, btb1))
        {
            ++j;
        }

        // Record the run as a rectangle one pixel high.
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;

        prect++;
        // Check whether the rectangle count has reached the allocated capacity.
        if (prgndata->rdh.nCount >= maxnum)
        {

        }
    } while (j < m_nWidth - 1);
}
// This code takes 0.13 seconds.

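Is there room for further optimization? Looking back, the mask color of this image happens to be distinguishable by its B component alone, so a mismatch branches out after a single CMP. But what if the difference were only in R? Then all three comparisons would run for every pixel, which would be really sad...
The data in memory is laid out as B, G, R plus one unused byte: 32 bits, which corresponds exactly to one int value. That makes another optimization possible:
construct a value of type int.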

BYTE prgb[4];           // declaration not shown in the original; assumed to be a 4-byte buffer
unsigned int* irgb;     // unsigned int
prgb[0] = btb1;
prgb[1] = btg1;
prgb[2] = btr1;
prgb[3] = 0;
irgb = (unsigned int*)prgb;   // cast to an int pointer

bool getcoloryesint(BYTE* byte, int iWidth, int iHeight, int x, int y, unsigned int* irgb)
{   // Tests whether the pixel at (x, y) in the memory block has the given color.
    int npixelsize = 4;
    unsigned int* ibtrgb;   // unsigned int

    ibtrgb = (unsigned int*)(byte + (iHeight - y - 1) * iWidth * npixelsize + x * npixelsize);
    if (ibtrgb[0] == irgb[0])
        return true;
    else
        return false;
}
Corresponding assembly instructions:
.text:00013668 cdzqsplash_dlg_getcoloryesint     ; DATA XREF: .pdata:00039338o
.text:00013668
.text:00013668 arg_0 = 0
.text:00013668 arg_4 = 4
.text:00013668 arg_8 = 8
.text:00013668
01 .text:00013668   STMFD   SP!, {R4,LR}
02 .text:0001366c   LDR     R0, [SP,#8+arg_4]
03 .text:00013670   LDR     LR, [SP,#8+arg_0]
04 .text:00013674   LDR     R4, [SP,#8+arg_8]
05 .text:00013678   SUB     R3, R3, R0
06 .text:0001367c   SUB     R3, R3, #1
07 .text:00013680   MLA     R2, R3, R2, LR
08 .text:00013684   LDR     R0, [R4]
09 .text:00013688   LDR     R3, [R1,R2,LSL#2]
10 .text:0001368c   CMP     R3, R0
11 .text:00013690   MOVEQ   R0, #1
12 .text:00013694   MOVNE   R0, #0
13 .text:00013698   LDMFD   SP!, {R4,PC}
.text:00013698 ; end of function cdzqsplash_dlg_getcoloryesint

From the assembly above we can see that the function is reduced to 13 instructions and performs only one comparison.
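
The single-comparison idea can also be sketched in portable C++. This is not the author's code: it swaps the pointer cast for `std::memcpy`, which avoids alignment and aliasing concerns, and `packBgr0` and `pixelEquals` are hypothetical names. As in the original, the match only succeeds when the fourth byte of the pixel in the buffer is also 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Builds the 32-bit key from bytes in memory order B, G, R, 0.
std::uint32_t packBgr0(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::uint8_t bytes[4] = {b, g, r, 0};
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Loads the four bytes of the pixel at pixelIndex and compares them with
// the key in a single 32-bit comparison.
bool pixelEquals(const std::uint8_t* bits, std::size_t pixelIndex,
                 std::uint32_t key)
{
    std::uint32_t value = 0;
    std::memcpy(&value, bits + pixelIndex * 4, sizeof value);
    return value == key;
}
```

Because the key and the pixel are both read from bytes in the same memory order, the comparison gives the same answer regardless of the byte order of the machine.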
BYTE prgb[4];           // declaration not shown in the original; assumed to be a 4-byte buffer
unsigned int* irgb;     // unsigned int
prgb[0] = btb1;
prgb[1] = btg1;
prgb[2] = btr1;
prgb[3] = 0;
irgb = (unsigned int*)prgb;   // cast to an int pointer

for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip pixels that match the transparent color.
        while (j <= m_nWidth - 1 && getcoloryesint(m_bitmap1, m_nWidth, m_nHeight, j, i, irgb))
        {
            j++;
        }
        ileftx = j;
        // Advance to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && !getcoloryesint(m_bitmap1, m_nWidth, m_nHeight, j, i, irgb))
        {
            ++j;
        }

        // Record the run as a rectangle one pixel high.
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;

        prect++;
        // Check whether the rectangle count has reached the allocated capacity.
        if (prgndata->rdh.nCount >= maxnum)
        {

        }
    } while (j < m_nWidth - 1);
}

The final result is really good: the time is only 0.08 seconds.

Learning to read the assembly is a good way to optimize code.

Try cutting the transparent area out of a picture to see the effect...

 

[Figure: source image]

[Figure: final result]

In addition, the CSDN editor is weak. It does not display code segments well, so the code just had to be pasted in as it is... I have no interest in writing any more text...
