http://www.cnblogs.com/xzbrillia/archive/2012/07/22/2603638.html
After testing dozens of images, the conclusion is that C# using the TPL (Task Parallel Library) is 2 to 10 times faster than the C++ AMP version.
Build: Release, VS2012 RC.
By the way, you need a DX11 graphics card. Without one, AMP falls back to software emulation, which is dozens of times slower than the GPU.
The tests show that at around 10 million pixels the two are almost the same.
I am not sure whether this is down to my computer or to my graphics card. How can this be solved?
It is strange. I will try it on my work computer on Monday.
By the way, the measured speed is slower than what I got earlier with WPF. The main difference is that the memory is locked in a different way; compare with the WPF speed test.
I. Code
1. C# TPL
// Requires: System.Drawing, System.Drawing.Imaging,
// System.Threading.Tasks, System.Collections.Concurrent
private static unsafe Image GrayByParallelForEach(Image image)
{
    var bmp = (Bitmap)image;

    int height = bmp.Height;
    int width = bmp.Width;

    // Lock the whole bitmap in memory as 32-bit ARGB and get a raw pointer to it.
    var data = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var startPtr = (PixelColor*)data.Scan0.ToPointer();
    ParallelForEach(startPtr, width, height);
    bmp.UnlockBits(data);

    return bmp;
}

private static unsafe void ParallelForEach(PixelColor* startPtr, int width, int height)
{
    // Split the rows into ranges; each range [Item1, Item2) is processed by one task.
    Parallel.ForEach(Partitioner.Create(0, height), (h) =>
    {
        var ptr = startPtr + h.Item1 * width;

        for (int y = h.Item1; y < h.Item2; y++)
        {
            for (int x = 0; x < width; x++)
            {
                var c = *ptr;
                var gray = ((c.Red * 38 + c.Green * 75 + c.Blue * 15) >> 7);
                (*ptr).Green = (*ptr).Red = (*ptr).Blue = (byte)gray;

                ptr++;
            }
        }
    });
}
[StructLayout(LayoutKind.Sequential)]
public struct PixelColor
{
    public byte Blue;
    public byte Green;
    public byte Red;
    public byte Alpha;
}
This mainly relies on Microsoft's TPL parallel library and pointer operations, plus a cast of the locked pixel buffer pointer to a pointer to the color struct.
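The weights 38, 75 and 15 followed by a shift of 7 bits in the listing above appear to be a fixed-point approximation of the commonly quoted luma weights 0.299, 0.587 and 0.114, scaled by 128. The following is a minimal C++ sketch of that arithmetic, not the author's code; the function names gray_fixed, gray_float and max_abs_error are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Fixed-point luma: the weights 38, 75, 15 sum to 128 (= 1 << 7),
// so shifting right by 7 divides the weighted sum by 128.
constexpr std::uint8_t gray_fixed(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint8_t>((r * 38 + g * 75 + b * 15) >> 7);
}

// Floating-point reference using the weights 0.299 / 0.587 / 0.114
// (assumed here to be what the integer weights approximate).
inline double gray_float(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

// Largest absolute difference between the two versions on a coarse RGB grid.
inline double max_abs_error()
{
    double worst = 0.0;
    for (int r = 0; r <= 255; r += 17)
        for (int g = 0; g <= 255; g += 17)
            for (int b = 0; b <= 255; b += 17) {
                const auto rr = static_cast<std::uint8_t>(r);
                const auto gg = static_cast<std::uint8_t>(g);
                const auto bb = static_cast<std::uint8_t>(b);
                const double diff = std::fabs(gray_fixed(rr, gg, bb) - gray_float(rr, gg, bb));
                if (diff > worst) worst = diff;
            }
    return worst;
}

static_assert(38 + 75 + 15 == 128, "weights are scaled by 128");
static_assert(gray_fixed(255, 255, 255) == 255, "white stays white");
static_assert(gray_fixed(0, 0, 0) == 0, "black stays black");
```

Because the weights sum to exactly 128, the result can never exceed 255, so the cast back to a byte does not overflow. The integer version truncates, so it can differ from the floating-point reference by a little under two gray levels.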
2. C++ 11 AMP code
#include <amp.h>
using namespace concurrency;

extern "C" __declspec(dllexport) void __stdcall gray_image(unsigned int* image, int height, int width)
{
    concurrency::extent<2> image_extent(height, width);

    /* Texture of four 8-bit integers */
    array_view<unsigned int, 2> image_av(image_extent, image);

    parallel_for_each(image_av.extent,
        [=](index<2> idx) restrict(amp)
    {
        // Unpack the four 8-bit channels from one 32-bit ARGB value.
        unsigned int color = image_av[idx];
        unsigned int a = (color >> 24) & 0xFF;
        unsigned int r = (color >> 16) & 0xFF;
        unsigned int g = (color >> 8) & 0xFF;
        unsigned int b = (color) & 0xFF;

        auto gray = ((r * 38 + g * 75 + b * 15) >> 7);

        // Repack, keeping alpha and writing gray into R, G and B.
        image_av[idx] = a << 24 | gray << 16 | gray << 8 | gray;
    });

    // Copy data from GPU to CPU
    image_av.synchronize();
}
It seems that byte types cannot be used inside AMP code, so each pixel has to be read as an int and its channels extracted by shifting and masking.
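The unpack, convert and repack steps from the AMP kernel can be tried on the CPU in plain C++. The sketch below assumes the 0xAARRGGBB layout implied by the shifts in the kernel; the names Channels, unpack_argb, pack_argb and gray_pixel are hypothetical helpers and are not part of the original project.

```cpp
#include <cassert>
#include <cstdint>

// Four channels widened to 32-bit integers, mirroring the kernel's use of
// unsigned int instead of byte-sized types.
struct Channels { std::uint32_t a, r, g, b; };

// Split one 32-bit pixel (assumed layout 0xAARRGGBB) into its channels.
inline Channels unpack_argb(std::uint32_t color)
{
    return Channels{ (color >> 24) & 0xFF, (color >> 16) & 0xFF,
                     (color >> 8) & 0xFF, color & 0xFF };
}

// Combine four channel values (each expected to be in 0..255) into one pixel.
inline std::uint32_t pack_argb(std::uint32_t a, std::uint32_t r,
                               std::uint32_t g, std::uint32_t b)
{
    return (a << 24) | (r << 16) | (g << 8) | b;
}

// Grayscale conversion of a single packed pixel; alpha is preserved.
inline std::uint32_t gray_pixel(std::uint32_t color)
{
    const Channels c = unpack_argb(color);
    const std::uint32_t gray = (c.r * 38 + c.g * 75 + c.b * 15) >> 7;
    return pack_argb(c.a, gray, gray, gray);
}
```

On a little-endian machine, a struct with the byte order Blue, Green, Red, Alpha occupies the same four bytes as one such 32-bit value, which is why the same buffer can be viewed either way.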
For comparison, I also tested plain C++ code, and it was fast. So the slowdown should not come from the call into the DLL, but from the performance of AMP itself or of the graphics card.
int size = width * height;
PixelColor* ptr = (PixelColor*)image;

for (int i = 0; i < size; i++)
{
    auto c = *ptr;
    auto gray = ((c.Red * 38 + c.Green * 75 + c.Blue * 15) >> 7);
    (*ptr).Green = (*ptr).Red = (*ptr).Blue = (byte)gray;

    ptr++;
}
II. Interface
III. Source code
http://files.cnblogs.com/xzbrillia/AMP_ImageGray.rar