CUDA __global__ function parameter analysis

Source: Internet
Author: User
A question came up in the forum: how are the parameters passed to a __global__ function delivered to every thread? The following analysis was made.

The discussion thread: http://topic.csdn.net/u/20090210/22/2d9ac353-9606-4fa3-9dee-9d41d7fb2b40.html

 

C/C++ code
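__global__ static void hellocuda(char *result, int num)
{
    __shared__ int i;                      // loop index kept in shared memory
    i = 0;
    char p_hellocuda[] = "Hello CUDA!";    // 12 bytes including the terminating NUL
    for (i = 0; i < num; i++) {
        result[i] = p_hellocuda[i];        // copy one byte to global memory
    }
}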
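The post does not show how the kernel is launched. As a rough sketch only, assuming a single-thread launch (the __shared__ loop index would race with more threads) and hypothetical names run_hellocuda, host_result and device_result, a host-side driver might look like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the post. The loop bound `i < num` is an assumption based on
// the PTX, which compares the loop counter against the `num` parameter.
__global__ static void hellocuda(char *result, int num)
{
    __shared__ int i;
    char p_hellocuda[] = "Hello CUDA!";
    for (i = 0; i < num; i++) {
        result[i] = p_hellocuda[i];
    }
}

// Hypothetical helper (not in the post): launches the kernel with one thread
// and copies `num` bytes back. Returns true when every CUDA call succeeded.
static bool run_hellocuda(char *host_result, int num)
{
    char *device_result = nullptr;
    if (cudaMalloc((void **)&device_result, num) != cudaSuccess) {
        return false;
    }
    // Both arguments (the pointer and the int) are passed by value at launch.
    hellocuda<<<1, 1>>>(device_result, num);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host_result, device_result, num,
                                 cudaMemcpyDeviceToHost);
    cudaFree(device_result);
    return err == cudaSuccess;
}

int main()
{
    char host_result[12] = {0};   // 11 characters plus the terminating NUL
    if (!run_hellocuda(host_result, 12)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("%s\n", host_result);
    return 0;
}
```

Note that num must not exceed 12 here, since the kernel reads that many bytes from its 12-byte local array.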

 
 
PTX code
.const .align 1 .b8 __constant432[12] = {0x48,0x65,0x6c,0x6c,0x6f,0x20,0x43,0x55,0x44,0x41,0x21,0x0};

.entry _Z9hellocudaPci
{
    .reg .u16 %rh;
    .reg .u32 %r;
    .reg .pred %p;
    .param .u32 __cudaparm__Z9hellocudaPci_result;
    .param .s32 __cudaparm__Z9hellocudaPci_num;
    .local .align 4 .b8 __cuda___cuda_p_hellocuda_168[12];
    .shared .s32 i;
    .loc 14 15 0
$LBB1__Z9hellocudaPci:
    mov.u32 %r1, __constant432; //
    mov.u32 %r2, __cuda___cuda_p_hellocuda_168; //
    ld.const.u32 %r3, [%r1+0]; // id:17 not_variable+0x0
    st.local.u32 [%r2+0], %r3; // id:18 __cuda___cuda_p_hellocuda_168+0x0
    ld.const.u32 %r4, [%r1+4]; // id:17 not_variable+0x0
    st.local.u32 [%r2+4], %r4; // id:18 __cuda___cuda_p_hellocuda_168+0x0
    ld.const.u32 %r5, [%r1+8]; // id:17 not_variable+0x0
    st.local.u32 [%r2+8], %r5; // id:18 __cuda___cuda_p_hellocuda_168+0x0
    .loc 14 20 0
    mov.s32 %r6, 0; //
    ld.param.s32 %r7, [__cudaparm__Z9hellocudaPci_num]; // id:16 __cudaparm__Z9hellocudaPci_num+0x0
    mov.u32 %r8, 0; //
    setp.le.s32 %p1, %r7, %r8; //
    @%p1 bra $Lt_0_9; //
    mov.s32 %r9, %r7; //
    mov.u32 %r10, __cuda___cuda_p_hellocuda_168; //
    mov.u32 %r11, __cuda___cuda_p_hellocuda_168; //
    add.u32 %r12, %r7, %r11; //
    ld.param.u32 %r13, [__cudaparm__Z9hellocudaPci_result]; // id:19 __cudaparm__Z9hellocudaPci_result+0x0
    mov.s32 %r14, %r9; //
$Lt_0_7:
    // Loop body line 20, nesting depth: 1, estimated iterations: unknown
    .loc 14 21 0
    ld.local.s8 %rh1, [%r10+0]; // id:20 __cuda___cuda_p_hellocuda_168+0x0
    st.global.s8 [%r13+0], %rh1; // id:21
    add.u32 %r13, %r13, 1; //
    add.u32 %r10, %r10, 1; //
    setp.ne.s32 %p2, %r10, %r12; //
    @%p2 bra $Lt_0_7; //
    st.shared.s32 [i], %r7; // id:22 i+0x0
    bra.uni $Lt_0_5; //
$Lt_0_9:
    st.shared.s32 [i], %r6; // id:22 i+0x0
$Lt_0_5:
    .loc 14 23 0
    exit; //
$LDWend__Z9hellocudaPci:
} // _Z9hellocudaPci

 
 
Cubin code

architecture {sm_10}
abiversion {1}
modname {cubin}
consts {
    name = __constant432
    segname = const
    segnum = 0
    offset = 0
    bytes = 12
    mem {
        0x6c6c6548 0x5543206f 0x00214144
    }
}
code {
    name = _Z9hellocudaPci
    lmem = 12
    smem = 28 // note the amount of shared memory
    reg = 3
    bar = 0
    bincode {
        0x10000001 0x2400c780 0xd0000001 0x60c00780
        0x10000201 0x2400c780 0xd0000801 0x60c00780
        0x10000401 0x2400c780 0x307ccbfd 0x6c20c7c8
        0xd0001001 0x60c00780 0x10014003 0x00000280
        0x1000f801 0x0403c780 0x1000c805 0x0423c780
        0x00000005 0xc0000780 0xd4000009 0x40200780
        0x20018001 0x00000003 0xd00e0209 0xa0200780
        0x3000cbfd 0x6c2147c8 0x20018205 0x00000003
        0x1000a003 0x00000280 0x1000ca01 0x0423c780
        0x00000c01 0xe4200780 0x30000003 0x00000780
        0x00000c01 0xe43f0781
    }
}
 
 

The listings above are the PTX and the cubin generated for a kernel that uses both shared memory and constant memory.

 

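Judging from the cubin, the parameters may indeed come in through global memory, but they must then be distributed to each thread's own .param variables, because every thread is free to modify the parameters it was passed. The smem = 28 entry points the same way: the kernel declares only one 4-byte __shared__ int, so the remaining bytes are presumably the two 4-byte arguments plus space reserved for built-in variables. This is consistent with the documented behavior that sm_1x devices pass kernel arguments through shared memory.
From this point of view, the general path is probably: parameter in global memory --> constant memory or shared memory --> broadcast to every thread --> each thread holds its own copy of the parameter in a register.
That should be roughly how the process works.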
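The claim that every thread can modify the parameters it receives can be checked with a small experiment. The following is only a sketch; the kernel modify_param, the helper run_modify_param and the buffer names are hypothetical and do not come from the original discussion:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread changes its own copy of `value`
// and records the result, so the copies can be compared afterwards.
__global__ void modify_param(int *out, int value)
{
    value += threadIdx.x;          // affects only this thread's copy
    out[threadIdx.x] = value;
}

// Hypothetical helper: launches one block of `threads` threads and copies
// the per-thread results back. Returns true when every CUDA call succeeded.
static bool run_modify_param(int *host_out, int threads, int value)
{
    int *device_out = nullptr;
    size_t bytes = threads * sizeof(int);
    if (cudaMalloc((void **)&device_out, bytes) != cudaSuccess) {
        return false;
    }
    modify_param<<<1, threads>>>(device_out, value);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host_out, device_out, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(device_out);
    return err == cudaSuccess;
}

int main()
{
    int host_out[4];
    if (!run_modify_param(host_out, 4, 100)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int t = 0; t < 4; ++t) {
        printf("thread %d: value = %d\n", t, host_out[t]);
    }
    return 0;
}
```

Because each thread works on a private copy, thread t should report 100 + t: a modification made by one thread is never seen by another, whichever memory space the arguments were originally delivered through.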
