A kernel function is the program that each GPU thread executes. It must be declared with the __global__ function type qualifier and must return void. The form is as follows:
__global__ void kernel (param list) {}
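The kernel function is called from the host side, and the execution configuration must be given at the call site. (Devices of compute capability 3.5 and later can also launch kernels from device code through dynamic parallelism.) The invocation form is as follows:
kernel<<<Dg, Db, Ns, S>>>(param list);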
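As a minimal sketch of the definition and invocation forms above, the following defines a kernel and launches it with only the first two configuration parameters. The names add_vectors and add_on_gpu, the block size of 256, and the error handling (returning an empty vector) are assumptions made for illustration, not part of the original text.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: each thread adds one pair of elements.
__global__ void add_vectors(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the partially filled last block
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and copies the result back. Returns an empty vector on any error.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    const int n = static_cast<int>(a.size());
    if (n == 0 || b.size() != a.size()) return {};
    const size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    std::vector<float> out(n);
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;                         // threads per block (assumed)
        const int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        add_vectors<<<blocks, threads>>>(d_a, d_b, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    if (!ok) out.clear();
    return out;
}
```

Calling add_on_gpu({1, 2, 3}, {10, 20, 30}) from host code should return the element-wise sums. Plain integers in the launch configuration are converted to dim3 values whose remaining components are 1.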
The <<<>>> operator holds the execution configuration of the kernel function. It tells the compiler how the kernel is to be launched at run time: how many threads execute the kernel and how those threads are organized.
The full execution configuration of the <<<>>> operator is <<<Dg, Db, Ns, S>>>:
- The parameter Dg defines the dimensions and size of the entire grid, that is, how many blocks the grid has. It is of type dim3. dim3 Dg(Dg.x, Dg.y, 1) indicates that each row of the grid has Dg.x blocks and each column has Dg.y blocks, with the third dimension fixed at 1 (a kernel launch has exactly one grid, and on devices of compute capability 1.x that grid is at most two-dimensional; compute capability 2.0 and later also accept Dg.z). Such a grid has Dg.x*Dg.y blocks. On compute capability 1.x and 2.x the maximum value of Dg.x and Dg.y is 65535; from compute capability 3.0 the limit on Dg.x rises to 2^31-1.
- The parameter Db defines the dimensions and size of a block, that is, how many threads a block has. It is of type dim3. dim3 Db(Db.x, Db.y, Db.z) indicates that each row of the block has Db.x threads, each column has Db.y threads, and the height is Db.z. A block has Db.x*Db.y*Db.z threads. On hardware of compute capability 1.x the maximum value of Db.x and Db.y is 512, the maximum of Db.z is 64, and the product must not exceed 512. On compute capability 2.0 and later the limits are 1024 for Db.x and Db.y, 64 for Db.z, and 1024 for the product. The figures 768 (compute capability 1.0 and 1.1) and 1024 (1.2 and 1.3) are the maximum number of resident threads per multiprocessor, not per block.
- The parameter Ns is optional and of type size_t. It sets the number of bytes of shared memory that is dynamically allocated for each block, in addition to the statically allocated shared memory. It is 0 or omitted when dynamic allocation is not required.
- The parameter S is optional and of type cudaStream_t, with a default value of 0. It specifies the stream in which the kernel function runs.
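The four parameters can be seen together in the following sketch, which reverses each block-sized tile of an array through dynamically allocated shared memory on a non-default stream. The names reverse_tiles and reverse_tiles_on_gpu are hypothetical, and the helper assumes the array length is a multiple of the tile size and that the tile size does not exceed the device's per-block thread limit.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: reverses each block-sized tile in place using dynamic shared memory.
__global__ void reverse_tiles(int* data) {
    extern __shared__ int tile[];        // size is supplied by Ns at launch
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;  // first element of this block's tile
    tile[t] = data[base + t];
    __syncthreads();                     // wait until the whole tile is loaded
    data[base + t] = tile[blockDim.x - 1 - t];
}

// Hypothetical host helper. Assumes data.size() is a multiple of tile_size.
// Returns an empty vector on invalid input or on any CUDA error.
std::vector<int> reverse_tiles_on_gpu(std::vector<int> data, int tile_size) {
    if (tile_size <= 0 || data.empty() ||
        data.size() % static_cast<size_t>(tile_size) != 0) return {};
    const size_t bytes = data.size() * sizeof(int);
    int* d_data = nullptr;
    cudaStream_t stream = nullptr;
    bool ok = cudaStreamCreate(&stream) == cudaSuccess &&
              cudaMalloc(&d_data, bytes) == cudaSuccess &&
              cudaMemcpyAsync(d_data, data.data(), bytes,
                              cudaMemcpyHostToDevice, stream) == cudaSuccess;
    if (ok) {
        dim3 Dg(static_cast<unsigned>(data.size() / tile_size), 1, 1);  // blocks in the grid
        dim3 Db(static_cast<unsigned>(tile_size), 1, 1);                // threads per block
        size_t Ns = tile_size * sizeof(int);                            // dynamic shared bytes
        reverse_tiles<<<Dg, Db, Ns, stream>>>(d_data);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpyAsync(data.data(), d_data, bytes,
                             cudaMemcpyDeviceToHost, stream) == cudaSuccess &&
             cudaStreamSynchronize(stream) == cudaSuccess;
    }
    cudaFree(d_data);
    if (stream) cudaStreamDestroy(stream);
    if (!ok) data.clear();
    return data;
}
```

With a tile size of 4, the input 1 2 3 4 5 6 7 8 should come back as 4 3 2 1 8 7 6 5. Operations issued to the same stream run in order, so the kernel sees the uploaded data and the download sees the kernel's result.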
Schematic of the CUDA kernel launch parameters: kernel<<<Dg, Db, Ns, S>>>(param list)
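Because the limits on Dg and Db depend on the compute capability of the device, it may be safer to query them at run time than to hard-code them. The sketch below reads them with cudaGetDeviceProperties. The helper names block_fits_device and print_launch_limits are hypothetical, and device 0 is assumed by default.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: true if the block shape is within the limits that the
// given device reports. Returns false if the device cannot be queried.
bool block_fits_device(dim3 block, int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    const unsigned long long threads =
        static_cast<unsigned long long>(block.x) * block.y * block.z;
    return block.x >= 1 && block.y >= 1 && block.z >= 1 &&
           block.x <= static_cast<unsigned>(prop.maxThreadsDim[0]) &&
           block.y <= static_cast<unsigned>(prop.maxThreadsDim[1]) &&
           block.z <= static_cast<unsigned>(prop.maxThreadsDim[2]) &&
           threads <= static_cast<unsigned long long>(prop.maxThreadsPerBlock);
}

// Hypothetical helper: prints the launch-related limits of the given device.
void print_launch_limits(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        std::printf("could not query device %d\n", device);
        return;
    }
    std::printf("compute capability:      %d.%d\n", prop.major, prop.minor);
    std::printf("max threads per block:   %d\n", prop.maxThreadsPerBlock);
    std::printf("max block dims (Db):     %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("max grid dims (Dg):      %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    std::printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
}
```

A check such as block_fits_device(dim3(16, 16, 1)) before a launch catches an oversized block early, instead of leaving it to surface as a launch error from cudaGetLastError.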