AMP learning 2: converting an image to grayscale (is C# faster than C++?)

Source: Internet
Author: User

http://www.cnblogs.com/xzbrillia/archive/2012/07/22/2603638.html

After testing dozens of images, I found that the C# version using the TPL (Task Parallel Library) is 2-10 times faster than the C++ AMP version.

Environment: Release build, VS2012 RC.

By the way, you need a DX11 video card. Without one, AMP falls back to software emulation, which is dozens of times slower than the GPU.

 

The tests show that at around 10 million pixels the results are almost the same.

 

Is this a problem with my computer, or with the video card? How can it be solved?

Strange. I will try again on the company computer on Monday.

 

By the way, the speeds measured here are slower than what I measured with WPF in the past. The main difference is the way the bitmap memory is locked.

See the earlier post: testing the speed of WPF.

I. Code

1. C# TPL

 

 

using System.Collections.Concurrent;    // Partitioner
using System.Drawing;                   // Bitmap, Image, Rectangle
using System.Drawing.Imaging;           // ImageLockMode, PixelFormat
using System.Runtime.InteropServices;   // StructLayout
using System.Threading.Tasks;           // Parallel

private static unsafe Image GrayByParallelForEach(Image image)
{
    var bmp = (Bitmap)image;

    int height = bmp.Height;
    int width = bmp.Width;

    // Lock the pixel buffer so it can be accessed through a raw pointer.
    var data = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var startPtr = (PixelColor*)data.Scan0.ToPointer();
    ParallelForEach(startPtr, width, height);
    bmp.UnlockBits(data);

    return bmp;
}

private static unsafe void ParallelForEach(PixelColor* startPtr, int width, int height)
{
    // Each partition h is a range of rows: [h.Item1, h.Item2).
    Parallel.ForEach(Partitioner.Create(0, height), (h) =>
    {
        var ptr = startPtr + h.Item1 * width;

        for (int y = h.Item1; y < h.Item2; y++)
        {
            for (int x = 0; x < width; x++)
            {
                var c = *ptr;
                var gray = ((c.Red * 38 + c.Green * 75 + c.Blue * 15) >> 7);
                (*ptr).Green = (*ptr).Red = (*ptr).Blue = (byte)gray;

                ptr++;
            }
        }
    });
}

// Matches the in-memory byte order of a Format32bppArgb pixel (B, G, R, A).
[StructLayout(LayoutKind.Sequential)]
public struct PixelColor
{
    public byte Blue;
    public byte Green;
    public byte Red;
    public byte Alpha;
}

 

It mainly relies on Microsoft's TPL parallel library and pointer operations, plus a cast of the locked buffer pointer to a pointer to the color struct.
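The row partitioning can be illustrated in standard C++. Partitioner.Create(0, height) hands each worker a range of rows (Item1 inclusive, Item2 exclusive), and each worker starts its pointer at startptr + Item1 * width. The sketch below is only an illustration under assumptions: make_row_ranges, gray_rows and gray_by_ranges are hypothetical names, the equal-sized chunking is not necessarily what the .NET runtime chooses, and the ranges are processed one after another here instead of on separate threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Same byte order as the C# PixelColor struct (B, G, R, A in memory).
struct PixelColor {
    std::uint8_t Blue, Green, Red, Alpha;
};

// Hypothetical helper: split rows [0, height) into at most `parts`
// contiguous half-open ranges [first, second).
std::vector<std::pair<int, int>> make_row_ranges(int height, int parts)
{
    assert(height >= 0 && parts > 0);
    std::vector<std::pair<int, int>> ranges;
    int chunk = (height + parts - 1) / parts;  // ceiling division
    for (int begin = 0; begin < height; begin += chunk) {
        ranges.emplace_back(begin, std::min(begin + chunk, height));
    }
    return ranges;
}

// Grayscale the rows of one range. No pixel outside the range is touched,
// which is why the ranges could safely run on different threads.
void gray_rows(PixelColor* start, int width, std::pair<int, int> rows)
{
    PixelColor* ptr = start + static_cast<std::ptrdiff_t>(rows.first) * width;
    for (int y = rows.first; y < rows.second; ++y) {
        for (int x = 0; x < width; ++x) {
            int gray = (ptr->Red * 38 + ptr->Green * 75 + ptr->Blue * 15) >> 7;
            ptr->Red = ptr->Green = ptr->Blue = static_cast<std::uint8_t>(gray);
            ++ptr;
        }
    }
}

// Sequential stand-in for Parallel.ForEach over the ranges.
void gray_by_ranges(PixelColor* start, int width, int height, int parts)
{
    for (const auto& rows : make_row_ranges(height, parts)) {
        gray_rows(start, width, rows);
    }
}
```

Because every range owns its rows exclusively, no locking is needed; that is the property the TPL version depends on when it runs the ranges concurrently.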

 

2. C++11 AMP code

 
#include <amp.h>
using namespace concurrency;

extern "C" __declspec(dllexport) void __stdcall gray_image(unsigned int* image, int height, int width)
{
    concurrency::extent<2> image_extent(height, width);

    /* Each unsigned int packs four 8-bit channels (A, R, G, B) */
    array_view<unsigned int, 2> image_av(image_extent, image);

    parallel_for_each(image_av.extent,
        [=](index<2> idx) restrict(amp)
    {
        unsigned int color = image_av[idx];
        unsigned int a = (color >> 24) & 0xff;
        unsigned int r = (color >> 16) & 0xff;
        unsigned int g = (color >> 8) & 0xff;
        unsigned int b = (color) & 0xff;

        auto gray = ((r * 38 + g * 75 + b * 15) >> 7);

        image_av[idx] = a << 24 | gray << 16 | gray << 8 | gray;
    });

    // Copy data from GPU to CPU
    image_av.synchronize();
}

 

It seems that byte types cannot be used inside an AMP kernel, so each pixel has to be handled as an int and split into channels with shifts and masks.
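The shift-and-mask arithmetic can be checked on the CPU with a small standard C++ function. This is a sketch only: gray_packed is a hypothetical name, and it assumes the 0xAARRGGBB layout used by the kernel above. The weights 38, 75 and 15 add up to 128, so the shift by 7 divides the weighted sum by 128; they approximate the usual luma weights 0.299, 0.587 and 0.114.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: grayscale one packed 0xAARRGGBB pixel using only
// 32-bit integer arithmetic, the way the AMP kernel has to.
inline std::uint32_t gray_packed(std::uint32_t color)
{
    std::uint32_t a = (color >> 24) & 0xffu;
    std::uint32_t r = (color >> 16) & 0xffu;
    std::uint32_t g = (color >> 8) & 0xffu;
    std::uint32_t b = color & 0xffu;

    // 38 + 75 + 15 = 128, so >> 7 divides the weighted sum by 128.
    std::uint32_t gray = (r * 38u + g * 75u + b * 15u) >> 7;

    // The maximum is (255 * 128) >> 7 = 255, so gray always fits in 8 bits.
    assert((gray & ~0xffu) == 0);

    // Alpha is kept; the gray value is written to all three color channels.
    return (a << 24) | (gray << 16) | (gray << 8) | gray;
}
```

For example, an opaque white pixel 0xFFFFFFFF stays 0xFFFFFFFF, and pure red with alpha 0x80 (0x80FF0000) becomes 0x804B4B4B, since (255 * 38) >> 7 = 75 = 0x4B.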

 

For comparison, I also timed plain C++ code called the same way, and it was fast. So the slowdown should not come from the call itself, but from the performance of AMP or of the graphics card.

 

int size = width * height;
PixelColor* ptr = (PixelColor*)image;

for (int i = 0; i < size; i++)
{
    auto c = *ptr;
    auto gray = ((c.Red * 38 + c.Green * 75 + c.Blue * 15) >> 7);
    (*ptr).Green = (*ptr).Red = (*ptr).Blue = (unsigned char)gray;

    ptr++;
}
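The original post does not show how the timings were taken. As a rough sketch of how the plain loop could be timed, assuming std::chrono::steady_clock and two hypothetical helpers (time_ms and gray_plain) that are not from the original code:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: run f() once and return the elapsed wall-clock
// time in milliseconds.
template <typename F>
double time_ms(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Plain single-threaded loop over packed 0xAARRGGBB pixels.
void gray_plain(std::uint32_t* image, int width, int height)
{
    assert(width >= 0 && height >= 0);
    std::size_t size = static_cast<std::size_t>(width) * height;
    for (std::size_t i = 0; i < size; ++i) {
        std::uint32_t c = image[i];
        std::uint32_t r = (c >> 16) & 0xffu;
        std::uint32_t g = (c >> 8) & 0xffu;
        std::uint32_t b = c & 0xffu;
        std::uint32_t gray = (r * 38u + g * 75u + b * 15u) >> 7;
        image[i] = (c & 0xff000000u) | (gray << 16) | (gray << 8) | gray;
    }
}
```

A call such as time_ms([&] { gray_plain(pixels, width, height); }) returns the time of a single run. For a fair comparison with AMP, it is worth running each version several times and discarding the first run, since the first AMP call may include one-time setup that is not part of the steady-state cost.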

 

 
 
II. Interface

 

III. Source code

http://files.cnblogs.com/xzbrillia/AMP_ImageGray.rar

