Optimizing the inefficiency of dc.GetPixel(j, i)
Suppose we want to scan a 350 * 350 image, that is 122,500 pixels, which is actually a very small image. The original code takes 3.284 seconds, only 0.13 seconds after the first optimization, and only 0.081 seconds after the second.
While we are living in the Core i5/i7 "airplane era", have you, comrades, ever thought about how it feels on an ARM device with a clock speed measured in MHz and 32 MB of memory (the "tractor era")?
Answer:
When dc.GetPixel(j, i) is used to scan the image pixel by pixel, the scan takes 3.284 seconds, which is unacceptable.
Is there a good way to speed it up?
The answer is yes: operate on memory directly.
First, the contents of the DC have to be converted into a memory block.
Fortunately, CreateDIBSection provides such a memory block.
HBITMAP m_hdibitmap1 = CreateDIBSection(m_hdibdc1, (BITMAPINFO *)&m_hdr, DIB_RGB_COLORS, (void **)&m_bitmap1, NULL, 0);
For more information, see MSDN.
BitBlt(m_hdibdc1, 0, 0, m_nWidth, m_nHeight, m_memdc, 0, 0, SRCCOPY); // This is the critical step: it transfers the image from m_memdc into our memory block m_bitmap1.
// Of course, these two or three steps are not counted in the timings, because they are just large memory copies. You can use DWORD dws = GetTickCount(); to check that the conversion to memory takes only a few milliseconds.
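The before/after measurement mentioned above can be sketched portably. This is a minimal sketch, assuming std::chrono::steady_clock as a stand-in for GetTickCount(); the helper name elapsed_ms is hypothetical and does not appear in the original code.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall time
// in milliseconds. std::chrono::steady_clock stands in for GetTickCount().
template <typename Work>
long long elapsed_ms(Work work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Wrapping the conversion step in a lambda, for example elapsed_ms([&] { /* CreateDIBSection and BitBlt calls */ }), would give the number of milliseconds that step takes.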
Now to our specific code segment:
for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip the run of transparent pixels.
        while (j <= m_nWidth - 1 && m_memdc.GetPixel(j, i) == colortransparent)
        {
            j++;
        }
        ileftx = j;
        // Walk to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && m_memdc.GetPixel(j, i) != colortransparent)
        {
            ++j;
        }
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;
        prect++;
        // Check whether the rectangle count has reached the allocated memory.
        if (prgndata->rdh.nCount >= maxnum)
        {
        }
    } while (j < m_nWidth - 1);
}
// This code takes more than 3 seconds.
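The scanning logic of this loop does not depend on how a pixel is read, which is what makes the later optimizations possible. Below is a minimal portable sketch of the same row-by-row run detection. The Span struct, find_spans and the is_transparent callback are hypothetical names used for illustration; the callback stands in for the GetPixel comparison or for any faster memory test.

```cpp
#include <vector>

// One-pixel-high run of non-transparent pixels: columns [left, right) on row `top`.
struct Span { int left, top, right, bottom; };

// Scans every row and records each run of non-transparent pixels.
// `is_transparent(x, y)` is a hypothetical callback supplied by the caller.
template <typename IsTransparent>
std::vector<Span> find_spans(int width, int height, IsTransparent is_transparent)
{
    std::vector<Span> spans;
    for (int y = 0; y < height; ++y) {
        int x = 0;
        while (x < width) {
            // Skip the transparent run.
            while (x < width && is_transparent(x, y)) ++x;
            const int left = x;
            // Walk to the end of the non-transparent run.
            while (x < width && !is_transparent(x, y)) ++x;
            // Unlike the loop in the post, an empty run at the end of a row is not recorded.
            if (x > left) spans.push_back({left, y, x, y + 1});
        }
    }
    return spans;
}
```

Because the pixel test is a parameter, the same scan can be timed with a slow test and a fast one without touching the loop itself.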
To simplify the code, the equality test is factored out into a function.
bool getcoloryes(BYTE *byte, int iWidth, int iHeight, int x, int y, BYTE r, BYTE g, BYTE b)
{   // Checks whether the pixel at (x, y) in the memory block has the given color.
    int npixelsize = 4;
    BYTE btb2, btg2, btr2;
    // Rows are stored bottom-up, so row y is at index (iHeight - y - 1).
    btb2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize];
    btg2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize + 1];
    btr2 = byte[(iHeight - y - 1) * iWidth * npixelsize + x * npixelsize + 2];
    if (btb2 == b && btg2 == g && btr2 == r)
        return true;
    else
        return false;
}
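The index arithmetic in this function can be checked in isolation. The following is a minimal portable sketch, assuming a 32-bit bottom-up bitmap with bytes stored as B, G, R, unused, which is what the (iHeight - y - 1) index and the + 1 / + 2 offsets above imply. The names pixel_offset and pixel_matches are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset of pixel (x, y) in a 32-bit bottom-up bitmap: the top row of
// the image is stored last, so row y lives at row index (height - y - 1).
inline std::size_t pixel_offset(int width, int height, int x, int y)
{
    const std::size_t bytes_per_pixel = 4;
    return (static_cast<std::size_t>(height - y - 1) * width + x) * bytes_per_pixel;
}

// Compares one pixel against (r, g, b); bytes are stored as B, G, R, unused.
inline bool pixel_matches(const std::uint8_t* bits, int width, int height,
                          int x, int y,
                          std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    const std::uint8_t* p = bits + pixel_offset(width, height, x, y);
    return p[0] == b && p[1] == g && p[2] == r;
}
```

For a 2 * 2 image, pixel (1, 0) in the top row ends up at byte offset 12, the last pixel in the buffer, while pixel (0, 1) in the bottom row is at offset 0.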
// The corresponding assembly instructions:
.text:00013668 cdzqsplash_dlg_getcoloryes          ; CODE XREF: cdzqsplash_dlg_setrgnsplash+368p
.text:00013668                                     ; cdzqsplash_dlg_setrgnsplash+3CCp
.text:00013668                                     ; DATA XREF: ...
.text:00013668
.text:00013668 arg_0  = 0       // stack parameter 1
.text:00013668 arg_4  = 4       // stack parameter 2
.text:00013668 arg_8  = 8       // stack parameter 3
.text:00013668 arg_C  = 0xC     // stack parameter 4
.text:00013668 arg_10 = 0x10    // stack parameter 5
.text:00013668
01 .text:00013668     STMFD   SP!, {R4,LR}            // push R4 and LR onto the stack
02 .text:0001366C     LDR     R0, [SP,#8+arg_4]       // read parameter 2
03 .text:00013670     LDR     LR, [SP,#8+arg_0]       // read parameter 1
04 .text:00013674     LDRB    R4, [SP,#8+arg_10]      // read parameter 5
05 .text:00013678     SUB     R3, R3, R0              // subtraction
06 .text:0001367C     SUB     R3, R3, #1              // subtract 1
07 .text:00013680     MLA     R2, R3, R2, LR
08 .text:00013684     LDRB    R3, [R1,R2,LSL#2]!
09 .text:00013688     CMP     R3, R4
10 .text:0001368C     BNE     loc_136B0
11 .text:00013690     LDRB    R2, [R1,#1]
12 .text:00013694     LDRB    R3, [SP,#8+arg_C]
13 .text:00013698     CMP     R2, R3
14 .text:0001369C     LDREQB  R2, [R1,#2]
15 .text:000136A0     LDREQB  R3, [SP,#8+arg_8]
16 .text:000136A4     CMPEQ   R2, R3
17 .text:000136A8     MOVEQ   R0, #1
18 .text:000136AC     LDMEQFD SP!, {R4,PC}
19 .text:000136B0
20 .text:000136B0 loc_136B0                           ; CODE XREF: cdzqsplash_dlg_getcoloryes+24j
21 .text:000136B0     MOV     R0, #0
22 .text:000136B4     LDMFD   SP!, {R4,PC}
.text:000136B4 ; End of function cdzqsplash_dlg_getcoloryes
// We can see that CMP is executed up to three times and that the instruction sequence is still quite long. Even so, the result is much better than calling GetPixel directly: about 25 times faster, going by the timings above (3.284 s versus 0.13 s). Don't underestimate that factor. If a large image-processing job takes the optimized program one day to run, the unoptimized code would need about 25 days. A little scary...
for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip the run of transparent pixels.
        while (j <= m_nWidth - 1 && getcoloryes(m_bitmap1, m_nWidth, m_nHeight, j, i, btr1, btg1, btb1))
        {
            j++;
        }
        ileftx = j;
        // Walk to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && !getcoloryes(m_bitmap1, m_nWidth, m_nHeight, j, i, btr1, btg1, btb1))
        {
            ++j;
        }
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;
        prect++;
        // Check whether the rectangle count has reached the allocated memory.
        if (prgndata->rdh.nCount >= maxnum)
        {
        }
    } while (j < m_nWidth - 1);
}
// This code takes 0.13 seconds.
Is there any room for further optimization? Looking back: because the mask color of this image happens to differ in B, a single CMP is enough to take the branch. But what if the difference were in R? Then all three comparisons would have to run, which is a really sad thing...
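To make the cost difference concrete, here is a minimal sketch that counts how many byte comparisons a B, then G, then R short-circuit test performs before it can decide. The helper comparisons_needed is hypothetical and exists only to illustrate the reasoning above.

```cpp
#include <cstdint>

// Returns how many byte comparisons a B-then-G-then-R short-circuit test
// performs before it can decide. Hypothetical helper, for illustration only.
inline int comparisons_needed(const std::uint8_t pixel[3], const std::uint8_t key[3])
{
    int count = 0;
    for (int channel = 0; channel < 3; ++channel) {   // 0 = B, 1 = G, 2 = R
        ++count;
        if (pixel[channel] != key[channel]) break;     // early exit on the first mismatch
    }
    return count;
}
```

A pixel that differs from the key in B is rejected after one comparison, while a pixel that differs only in R, or that matches completely, needs all three.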
The data in memory is laid out as B, G, R, padding: 32 bits per pixel, which is exactly the size of one int. So another optimization is possible.
That is, build an int out of the color.
unsigned int *irgb; // unsigned int
prgb[0] = btb1;
prgb[1] = btg1;
prgb[2] = btr1;
prgb[3] = 0;
irgb = (unsigned int *)prgb; // forced cast
bool getcoloryesint(BYTE *byte, int iWidth, int iHeight, int x, int y, unsigned int *irgb)
{   // Checks whether the pixel at (x, y) in the memory block has the given color.
    int npixelsize = 4;
    unsigned int *ibtrgb; // unsigned int
    ibtrgb = (unsigned int *)(byte + (iHeight - y - 1) * iWidth * npixelsize + x * npixelsize);
    if (ibtrgb[0] == irgb[0])
        return true;
    else
        return false;
}
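The pointer cast above relies on the byte buffers being suitably aligned for a 32-bit load, which may not hold for every buffer on ARM. A minimal portable sketch of the same one-comparison idea, assuming std::memcpy is used instead of the cast and assuming the fourth byte of every pixel is 0 as it is in the key, could look like this. The names pack_bgrx, load_pixel and pixel_matches_packed are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Packs B, G, R and a zero padding byte, in memory order, into one 32-bit key.
inline std::uint32_t pack_bgrx(std::uint8_t b, std::uint8_t g, std::uint8_t r)
{
    const std::uint8_t bytes[4] = {b, g, r, 0};
    std::uint32_t key;
    std::memcpy(&key, bytes, sizeof key);   // no alignment or aliasing assumptions
    return key;
}

// Loads pixel (x, y) of a 32-bit bottom-up bitmap as one 32-bit value.
inline std::uint32_t load_pixel(const std::uint8_t* bits, int width, int height,
                                int x, int y)
{
    const std::size_t offset =
        (static_cast<std::size_t>(height - y - 1) * width + x) * 4;
    std::uint32_t value;
    std::memcpy(&value, bits + offset, sizeof value);
    return value;
}

// One comparison instead of three. It only matches when the fourth byte of
// the pixel equals the fourth byte of the key (0 here).
inline bool pixel_matches_packed(const std::uint8_t* bits, int width, int height,
                                 int x, int y, std::uint32_t key)
{
    return load_pixel(bits, width, height, x, y) == key;
}
```

Packing the key once outside the loop keeps the per-pixel work down to one load and one compare. Compilers typically turn a fixed-size memcpy like this into a plain load where that is safe, though that depends on the toolchain.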
The corresponding assembly instructions:
.text:00013668 cdzqsplash_dlg_getcoloryesint       ; DATA XREF: .pdata:00039338o
.text:00013668
.text:00013668 arg_0  = 0
.text:00013668 arg_4  = 4
.text:00013668 arg_8  = 8
.text:00013668
01 .text:00013668     STMFD   SP!, {R4,LR}
02 .text:0001366C     LDR     R0, [SP,#8+arg_4]
03 .text:00013670     LDR     LR, [SP,#8+arg_0]
04 .text:00013674     LDR     R4, [SP,#8+arg_8]
05 .text:00013678     SUB     R3, R3, R0
06 .text:0001367C     SUB     R3, R3, #1
07 .text:00013680     MLA     R2, R3, R2, LR
08 .text:00013684     LDR     R0, [R4]
09 .text:00013688     LDR     R3, [R1,R2,LSL#2]
10 .text:0001368C     CMP     R3, R0
11 .text:00013690     MOVEQ   R0, #1
12 .text:00013694     MOVNE   R0, #0
13 .text:00013698     LDMFD   SP!, {R4,PC}
.text:00013698 ; End of function cdzqsplash_dlg_getcoloryesint
From the assembly above, we can see that the function is reduced to 13 instructions and that only one comparison is performed.
unsigned int *irgb; // unsigned int
prgb[0] = btb1;
prgb[1] = btg1;
prgb[2] = btr1;
prgb[3] = 0;
irgb = (unsigned int *)prgb; // forced cast
for (i = 0; i < m_nHeight; i++)
{
    j = 0;
    do
    {
        // Skip the run of transparent pixels.
        while (j <= m_nWidth - 1 && getcoloryesint(m_bitmap1, m_nWidth, m_nHeight, j, i, irgb))
        {
            j++;
        }
        ileftx = j;
        // Walk to the end of the run of non-transparent pixels.
        while (j <= m_nWidth - 1 && !getcoloryesint(m_bitmap1, m_nWidth, m_nHeight, j, i, irgb))
        {
            ++j;
        }
        prect->left = m_ileftx + ileftx;
        prect->right = m_ileftx + j;
        prect->top = m_itopy + i;
        prect->bottom = m_itopy + i + 1;
        prgndata->rdh.nCount++;
        prect++;
        // Check whether the rectangle count has reached the allocated memory.
        if (prgndata->rdh.nCount >= maxnum)
        {
        }
    } while (j < m_nWidth - 1);
}
The final result is really good: the time is down to only 0.08 seconds.
Learning to read the assembly is a good way to optimize code.
Try hollowing out a picture to see the effect...
[Image: source image]
[Image: final result]
In addition, the CSDN editor is weak. It does not display code segments properly, so I had to paste them in as they are... It leaves me with no interest in writing any text...