CUDA Basics (1): Operational Procedures and Kernel Concepts

Source: Internet
Author: User


CUDA is a parallel computing platform released by NVIDIA. With CUDA, the GPU is no longer limited to processing graphics and images: its large number of computing units can execute tasks that are computationally heavy but can be processed in parallel.

 

A CUDA program typically involves five steps:

1. The CPU allocates memory on the GPU: cudaMalloc;

2. The CPU copies data to the GPU: cudaMemcpy;

3. The CPU launches the kernel on the GPU; the kernel is a user-written function that runs on each thread;

4. The CPU copies results back from the GPU: cudaMemcpy;

5. The CPU frees the GPU memory: cudaFree.

The key is step 3. Whether a proper kernel can be written determines both whether the problem is solved correctly and whether it is solved efficiently.
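The five steps can be sketched in host code roughly as follows (a minimal vector-addition example; the kernel name, sizes, and omitted error checking are illustrative, not a definitive implementation):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // guard against threads past the end
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);                              // step 1: allocate GPU memory
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // step 2: host -> device
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int blockSize = 256;                                  // threads per block
    int gridSize = (n + blockSize - 1) / blockSize;       // blocks per grid
    vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);    // step 3: launch kernel

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // step 4: device -> host

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);          // step 5: free GPU memory
    printf("c[0] = %f\n", h_c[0]);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```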

 

CUDA organizes threads into a hierarchy and introduces the concepts of grid and block: a block is composed of threads, and a grid is composed of blocks. Generally, the block size refers to the number of threads in a block, and the grid size refers to the number of blocks in a grid.
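Inside a kernel, this hierarchy is visible through built-in variables, from which each thread can derive a unique global index. A sketch (the kernel name is illustrative):

```cuda
// Built-in variables available in a kernel:
//   gridDim   - size of the grid, in blocks
//   blockDim  - size of a block, in threads
//   blockIdx  - index of this thread's block within the grid
//   threadIdx - index of this thread within its block
//
// For a 1D launch, the common pattern for a unique global index:
__global__ void whoAmI(int *out, int n) {
    int globalId = blockIdx.x * blockDim.x + threadIdx.x;
    if (globalId < n)
        out[globalId] = globalId;  // each thread writes its own index
}
```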

 

A kernel launch has the following structure: Kernel<<<Dg, Db, Ns, S>>>(param1, param2, ...)

- Dg: the size of the grid, i.e., the number of blocks the grid contains, of type dim3. A grid can contain a maximum of 65535*65535*65535 blocks; the maximum value of each of Dg.x, Dg.y, and Dg.z is 65535;

- Db: the size of a block, i.e., the number of threads the block contains, of type dim3. A block can contain up to 1024 threads (CUDA 2.x); the maximum value of Db.x and Db.y is 1024, and the maximum value of Db.z is 64;

(For example, a block size can be 1024*1*1, 256*2*2, 1*1024*1, 2*8*64, 4*4*64, etc.)

- Ns: an optional parameter. If the kernel uses dynamically allocated shared memory, its size in bytes must be specified here;

- S: an optional parameter indicating the stream in which the kernel runs.
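Put together, a launch using all four configuration parameters might look like this (a sketch; `myKernel` and the chosen sizes are illustrative assumptions):

```cuda
#include <cuda_runtime.h>

// Illustrative kernel declaration; extern __shared__ gives access to the
// dynamically allocated shared memory whose size is passed as Ns.
__global__ void myKernel(float *data) {
    extern __shared__ float tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = data[i];
    __syncthreads();
    data[i] = tile[threadIdx.x];
}

void launchExample(float *d_data) {
    dim3 Dg(64, 1, 1);                        // grid: 64 blocks
    dim3 Db(256, 1, 1);                       // block: 256 threads (<= 1024)
    size_t Ns = Db.x * sizeof(float);         // dynamic shared memory, in bytes

    cudaStream_t S;
    cudaStreamCreate(&S);                     // the stream the kernel runs in

    myKernel<<<Dg, Db, Ns, S>>>(d_data);

    cudaStreamSynchronize(S);                 // wait for the kernel to finish
    cudaStreamDestroy(S);
}
```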

 

See:

About CUDA versions: http://blog.cuvilib.com/2010/06/09/nvidia-cuda-difference-between-fermi-and-previous-architectures/

Block size: http://stackoverflow.com/questions/5062781/cuda-max-threads-in-a-block

Grid size: http://stackoverflow.com/questions/6048907/maximum-blocks-per-gridcuda

 
