2012 byoy conmajia@gmail.com
Preface:
This article tests the per-pixel bitmap operations (GetPixel/SetPixel) in GDI+; it does not look for the fastest way to process bitmaps. If raw speed is what you need, use technologies other than GDI+, such as parallel computing, MMX/SSE instructions, or CUDA.
This is an old technique:
GetPixel and SetPixel are used all the time with the Bitmap class, but both methods are relatively slow, so LockBits/UnlockBits are usually used to lock the bitmap in memory and speed things up.
The standard reference on MSDN is as follows:
MSDN example: lock the bitmap in memory, then copy
    private void LockUnlockBitsExample(PaintEventArgs e)
    {
        // Create a new bitmap.
        Bitmap bmp = new Bitmap(@"C:\fakephoto.jpg");

        // Lock the bitmap's bits.
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        System.Drawing.Imaging.BitmapData bmpData =
            bmp.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite,
            bmp.PixelFormat);

        // Get the address of the first line.
        IntPtr ptr = bmpData.Scan0;

        // Declare an array to hold the bytes of the bitmap.
        int bytes = Math.Abs(bmpData.Stride) * bmp.Height;
        byte[] rgbValues = new byte[bytes];

        // Copy the RGB values into the array.
        System.Runtime.InteropServices.Marshal.Copy(ptr, rgbValues, 0, bytes);

        // Set every third value to 255. A 24bpp bitmap will look red.
        for (int counter = 2; counter < rgbValues.Length; counter += 3)
            rgbValues[counter] = 255;

        // Copy the RGB values back to the bitmap.
        System.Runtime.InteropServices.Marshal.Copy(rgbValues, 0, ptr, bytes);

        // Unlock the bits.
        bmp.UnlockBits(bmpData);

        // Draw the modified image.
        e.Graphics.DrawImage(bmp, 0, 150);
    }
The pointer method is faster, so the actual code you see is generally similar to this:
    public unsafe Color GetPixel(int x, int y)
    {
        if (this.bmpData.PixelFormat == PixelFormat.Format32bppArgb)
        {
            // 4 bytes per pixel, stored in memory as B, G, R, A.
            // The arithmetic is done on byte*, because C# does not allow it on void*.
            byte* numPtr = (byte*)this.bmpData.Scan0 + (y * this.bmpData.Stride) + (x * 4);
            return Color.FromArgb(numPtr[3], numPtr[2], numPtr[1], numPtr[0]);
        }
        if (this.bmpData.PixelFormat == PixelFormat.Format24bppRgb)
        {
            // 3 bytes per pixel, stored in memory as B, G, R.
            byte* numPtr2 = (byte*)this.bmpData.Scan0 + (y * this.bmpData.Stride) + (x * 3);
            return Color.FromArgb(numPtr2[2], numPtr2[1], numPtr2[0]);
        }
        return Color.Empty;
    }
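The address arithmetic in the pointer version can be tried out without locking a real bitmap. The following is only a sketch and not part of FastBitmap: PixelAddressing, StrideFor24bpp and ReadPixel24 are hypothetical names, and the buffer is assumed to hold top-down 24bpp data with rows padded to a multiple of 4 bytes, such as the array filled by Marshal.Copy in the MSDN example when the stride is positive.

```csharp
using System;

// Hypothetical helper for illustration; not part of the article's FastBitmap class.
public static class PixelAddressing
{
    // Bytes per row of a 24bpp image: 3 bytes per pixel, rounded up to a multiple of 4.
    public static int StrideFor24bpp(int width)
    {
        return (width * 3 + 3) / 4 * 4;
    }

    // Reads pixel (x, y) from a 24bpp buffer whose bytes are stored as B, G, R.
    // Returns the color packed as 0xRRGGBB.
    public static int ReadPixel24(byte[] buffer, int stride, int x, int y)
    {
        // Same arithmetic as Scan0 + y * Stride + x * 3 in the pointer version.
        int offset = y * stride + x * 3;
        byte b = buffer[offset];
        byte g = buffer[offset + 1];
        byte r = buffer[offset + 2];
        return (r << 16) | (g << 8) | b;
    }
}
```

Because each row is padded, the row start must be computed from the stride rather than from width * 3; for a 3-pixel-wide image the two differ (12 bytes versus 9).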
Having some time on my hands, I wondered: how much faster is it, really?
To find out, I lightly adapted the BitmapEx class I had used before (I no longer remember whether it came from a face-recognition project or some other code), renamed it FastBitmap, wrote a test program, and collected a series of test cases. (Click the frame on the left to open an image file. The program does not report exceptions.)
The configuration of the testing machine is as follows:
The test cases are as follows:
To rule out any influence from the file format, all test images are 24bpp BMP files. (Thanks to technological progress, memory is now dirt cheap; otherwise single files of this size would really bother me.)
The test is split into GetPixel and SetPixel runs so that reads and writes are measured separately. The test code (using GetPixel as an example) is very simple. It traverses every pixel in the bitmap, as shown below:
    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            tmp = bmp.GetPixel(x, y);
        }
    }
Here bmp is a Bitmap in one run and a FastBitmap in the other.
To keep the comparison focused, the traversal deliberately uses no parallel computing and runs on a single CPU core, even though visiting an image pixel by pixel is very time-consuming. So please be careful if you use this program on extremely large images (on the order of 10000 × 10000 or more).
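A measurement like this can be wrapped in a small timing helper. The sketch below is an assumption about how such a test might be timed, not the code of the test program: TraversalTimer and TimeTraversal are hypothetical names, and the per-pixel work is passed in as a delegate so the same loop can be reused for Bitmap and FastBitmap.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; the article's test program is not reproduced here.
public static class TraversalTimer
{
    // Calls visit(x, y) once for every pixel of a w x h image on the calling
    // thread and returns the elapsed wall-clock time in milliseconds.
    public static long TimeTraversal(int w, int h, Action<int, int> visit)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                visit(x, y);
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call would look like TraversalTimer.TimeTraversal(w, h, (x, y) => { tmp = bmp.GetPixel(x, y); }). The delegate call adds a small fixed cost per pixel, so it slightly understates the relative gap between a slow and a fast GetPixel.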
Finally, a test record is obtained:
According to the test results, running time drops by 90% to 95% on average, that is, performance improves by a factor of 10 to 20.
Although this result is not blazingly fast, I think it is close to the limit of GDI+. (Beyond that, gains have to come from faster hardware, or you can try technologies such as parallel computing, native C++, direct use of MMX/SSE instructions, and CUDA.)
I don't know how widely Bitmap is still used now that technology has moved on. I just think Bitmap can be a good choice when you want a balance between development efficiency and performance. (It is certainly easier than racking your brains over Win32 ASM.)
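The two ways of stating the gain are related by factor = 1 / (1 - reduction). A minimal sketch of that arithmetic, with Speedup and FromTimeReduction as hypothetical names:

```csharp
using System;

// Hypothetical helper that only illustrates the arithmetic above.
public static class Speedup
{
    // Converts a fractional reduction in running time (0.90 means 90% less time)
    // into a speedup factor: factor = 1 / (1 - reduction).
    public static double FromTimeReduction(double reduction)
    {
        return 1.0 / (1.0 - reduction);
    }
}
```

With this formula a 90% reduction corresponds to roughly 10 times faster and a 95% reduction to roughly 20 times faster.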
Test Program: Click to download
Postscript
Some friends pointed out:
GDI+'s LockBits reads the image data into memory only temporarily, which does not suit professional image-processing software. In professional software, an image loaded in memory has a fixed format, so algorithms access the data in memory directly. Functions like GetPixel have no place in professional image processing; they are only convenient for handling small amounts of data, such as picking a color from the screen or from a DC.
To gain speed, an image-processing algorithm is first written in a general-purpose language and its core is optimized there. If that is still not fast enough, consider optimizing further in assembly. The simpler the algorithm, the more assembly optimization can gain. Take the simplest one, color inversion: for a 3000 × 4000 × 24bpp image, 20 ms is enough in assembly, less than a general-purpose language needs. For complex algorithms, however, the gain from assembly is much less clear.
Although this article does not aim for speed and only "tests" it, I tried optimizing the code anyway, using the color inversion operation as the test.
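For reference, color inversion on 24bpp data comes down to replacing every channel byte v with 255 - v. The sketch below is not the optimized code measured here, which the text does not show. Inverter and Invert24 are hypothetical names, and the image is assumed to have already been copied into a byte array, as in the MSDN example.

```csharp
using System;

// Hypothetical sketch; the article's optimized code is not shown in the text.
public static class Inverter
{
    // Inverts every color channel of a 24bpp image held in a byte buffer.
    // Padding bytes at the end of each row are left untouched.
    public static void Invert24(byte[] buffer, int width, int height, int stride)
    {
        int rowBytes = width * 3; // bytes that actually carry pixel data
        for (int y = 0; y < height; y++)
        {
            int rowStart = y * stride;
            for (int i = 0; i < rowBytes; i++)
            {
                buffer[rowStart + i] = (byte)(255 - buffer[rowStart + i]);
            }
        }
    }
}
```

Working row by row keeps the loop correct when the stride is larger than width * 3.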
With test case #7 (4096x4096 @ 24bpp), the run took 299 ms, as shown below:
Then I improved the algorithm again, which made it about 17% faster.
This result is much faster than the usual way of doing it, but still a little slower (laugh) than the figure quoted for a general-purpose language. Compared with the "20 ms" quoted for assembly (for which I have no experimental data), the gap is far larger.
Optimizing the code turned out to be fun, so I kept going. By restructuring the calls, improving the algorithm, and using multithreaded parallel computing, I finally got the time down to the 50 ms range.
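One common way to spread this kind of work over several cores is to hand each image row to a worker, since rows do not overlap in memory. The sketch below assumes Parallel.For from the Task Parallel Library (.NET 4 and later). The text does not say which threading mechanism was actually used, and ParallelInverter is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch; the threading approach used in the article is not shown.
public static class ParallelInverter
{
    // Inverts a 24bpp image held in a byte buffer, processing rows in parallel.
    // Each row occupies a disjoint byte range, so no locking is needed.
    public static void Invert24(byte[] buffer, int width, int height, int stride)
    {
        int rowBytes = width * 3; // padding bytes after this point are not touched
        Parallel.For(0, height, y =>
        {
            int rowStart = y * stride;
            for (int i = 0; i < rowBytes; i++)
            {
                buffer[rowStart + i] = (byte)(255 - buffer[rowStart + i]);
            }
        });
    }
}
```

Splitting by row rather than by pixel keeps the number of delegate calls small compared with the amount of work done in each one.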
It is still based on the LockBits/UnlockBits methods of the Bitmap class.
Language: C#, with C# pointers
Test machine: i3 380M @ 2.53 GHz, 2.92 GB DDR3-1333, Windows 7 32-bit
Speed: about 50 ms
Comparison of test results from readers
Below are test results kindly provided by some enthusiastic readers, for your reference.
The test inverts the colors of a 4096x4096 24bpp bitmap.
Test 1:
Imagewizard (author: laviewpbt)
Implementation: assembly + VB.NET
Configuration: i3 380M @ 2.53 GHz, 2.92 GB DDR3-1333, Windows 7 32-bit
Time: 25 ms
Test 2:
Temporary test (author: Lan zhengpeng)
Implementation: VC++ .NET calling SSE instructions
Configuration: i7 860 @ 2.93 GHz, 12 GB PC1333 memory, Windows 7 64-bit
Time: 12 ~ 19 ms
Test 3:
Gebimage (author: xiaotie)
Implementation: image library rewritten entirely in C#, with unsafe pointers
Configuration: better than Test 1
Time: 33 ms
Test 4:
This article (author: byoy)
Implementation: GDI+, unsafe pointers
Configuration: same as Test 1
Time: 46 ms
(End)