CUDA basics (1): operational procedures and kernel concepts
CUDA is a parallel computing platform released by NVIDIA. GPUs are no longer limited to processing graphics and images: they contain a large number of computing units and can execute tasks that are computationally heavy but can be processed in parallel.
A CUDA program involves five steps:
1. The CPU allocates memory on the GPU: cudaMalloc;
2. The CPU copies data to the GPU: cudaMemcpy;
3. The CPU launches the kernel on the GPU. The kernel is a program you write yourself, and it is executed by every thread;
4. The CPU copies the results back from the GPU: cudaMemcpy;
5. The CPU releases the GPU memory: cudaFree.
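The five steps can be sketched with a small vector addition. This is a minimal illustration, not a reference implementation: the kernel `vector_add`, the helper `vector_add_demo`, the input values and the block size of 256 are all assumptions made for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard: the last block may have spare threads
}

// Hypothetical helper: runs the five steps for n > 0 elements and
// returns true if every call succeeded and every result is correct.
bool vector_add_demo(int n) {
    std::vector<float> h_a(n, 1.0f), h_b(n, 2.0f), h_c(n, 0.0f);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    // Step 1: allocate memory on the GPU.
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_c, bytes) == cudaSuccess;

    // Step 2: copy the inputs from the CPU to the GPU.
    ok = ok &&
         cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
         cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 3: launch the kernel with enough blocks to cover n elements.
    if (ok) {
        const int block_size = 256;
        const int grid_size = (n + block_size - 1) / block_size;
        vector_add<<<grid_size, block_size>>>(d_a, d_b, d_c, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Step 4: copy the result back; this cudaMemcpy waits for the kernel to finish.
    ok = ok &&
         cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Step 5: release the GPU memory (cudaFree on a null pointer does nothing).
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    for (int i = 0; ok && i < n; ++i) ok = (h_c[i] == 3.0f);
    return ok;
}
```

Calling `vector_add_demo(1000)` from `main` and checking the return value is enough to try it on a machine with a CUDA-capable GPU; the file needs to be compiled with nvcc.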
The key is step 3. Whether a proper kernel can be written determines whether the problem is solved correctly and whether it is solved efficiently.
CUDA organizes threads in a hierarchy and introduces the concepts of grid and block. A block is composed of threads, and a grid is composed of blocks. Generally, blocksize refers to the number of threads in a block, and gridsize refers to the number of blocks in a grid.
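To make blocksize and gridsize concrete, here is a rough sketch of how a thread can work out its global position from the built-in variables `blockIdx`, `blockDim` and `threadIdx`, and how the host can pick a gridsize that covers n elements. The names `record_block`, `grid_size_for` and `block_ids` are hypothetical and exist only for this example; error handling is kept to a minimum.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread records the index of the block it belongs to.
__global__ void record_block(int* block_of, int n) {
    // Global index = block index * threads per block + index inside the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) block_of[i] = blockIdx.x;
}

// Gridsize needed so that gridsize * blocksize >= n (ceiling division).
int grid_size_for(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: returns, for each of n elements, the block that handled it.
// Entries stay at -1 if the GPU allocation fails.
std::vector<int> block_ids(int n, int block_size) {
    std::vector<int> h(n, -1);
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return h;
    record_block<<<grid_size_for(n, block_size), block_size>>>(d, n);
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

With n = 1000 and a blocksize of 256, the gridsize is 4, so 1024 threads are launched and the last 24 threads of the final block do nothing because of the `i < n` guard.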
A kernel launch has the following form: Kernel<<<Dg, Db, Ns, S>>>(param1, param2, ...)
- Dg: the size of the grid, that is, the number of blocks in the grid, of type dim3. On compute capability 2.x a grid can contain at most 65535*65535*65535 blocks, so the maximum value of Dg.x, Dg.y and Dg.z is 65535 (on compute capability 3.0 and later, Dg.x can be as large as 2^31-1);
- Db: the size of a block, that is, the number of threads in the block, of type dim3. A block can contain at most 1024 threads (compute capability 2.x and later); the maximum value of Db.x and Db.y is 1024, and the maximum value of Db.z is 64;
(For example, the size of a block can be 1024*1*1, 256*2*2, 1*1024*1, 2*8*64 or 4*4*64, since the product of the three dimensions must not exceed 1024.)
- Ns: optional, defaults to 0. If the kernel uses dynamically allocated shared memory, its size per block must be specified here, in bytes;
- S: optional, defaults to 0. It indicates the stream in which the kernel runs.
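The four launch parameters can be seen together in a per-block sum that uses dynamic shared memory and a non-default stream. This is only a sketch under stated assumptions: `block_sum` and `sum_on_gpu` are hypothetical names, the block size is fixed at 256 (a power of two, which the reduction relies on), device 0 is used, and most error checks are omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block sums its slice of `in` into out[blockIdx.x].
__global__ void block_sum(const float* in, float* out, int n) {
    extern __shared__ float cache[];   // size in bytes comes from Ns at launch
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = cache[0];
}

// Hypothetical helper: sums a non-empty vector; returns -1.0f if the
// chosen block size is not supported by device 0.
float sum_on_gpu(const std::vector<float>& h_in) {
    const int n = static_cast<int>(h_in.size());
    dim3 Db(256, 1, 1);                        // blocksize: 256 threads
    dim3 Dg((n + Db.x - 1) / Db.x, 1, 1);      // gridsize: enough blocks for n
    size_t Ns = Db.x * sizeof(float);          // dynamic shared memory per block

    // Check the block size against the limit reported by the device.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1.0f;
    if (static_cast<int>(Db.x) > prop.maxThreadsPerBlock) return -1.0f;

    cudaStream_t S;                            // stream the kernel runs in
    cudaStreamCreate(&S);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, Dg.x * sizeof(float));
    std::vector<float> partial(Dg.x, 0.0f);

    // Work issued to the same stream executes in order.
    cudaMemcpyAsync(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice, S);
    block_sum<<<Dg, Db, Ns, S>>>(d_in, d_out, n);
    cudaMemcpyAsync(partial.data(), d_out, Dg.x * sizeof(float), cudaMemcpyDeviceToHost, S);
    cudaStreamSynchronize(S);                  // wait until the stream is done

    cudaFree(d_in);
    cudaFree(d_out);
    cudaStreamDestroy(S);

    float total = 0.0f;
    for (float p : partial) total += p;        // add the per-block results on the CPU
    return total;
}
```

Leaving out Ns and S, as in `Kernel<<<Dg, Db>>>(...)`, is equivalent to passing 0 for both: no dynamic shared memory and the default stream. The limits in `prop.maxThreadsDim` and `prop.maxGridSize` can be checked in the same way as `prop.maxThreadsPerBlock`.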
See:
About cuda version: http://blog.cuvilib.com/2010/06/09/nvidia-cuda-difference-between-fermi-and-previous-architectures/
Blocksize: http://stackoverflow.com/questions/5062781/cuda-max-threads-in-a-block
Gridsize: http://stackoverflow.com/questions/6048907/maximum-blocks-per-gridcuda