CUDA Computational Model
A CUDA program is computed in two parts: the serial part executes on the host, namely the CPU, while the parallel part executes on the device, namely the GPU.
Compared with traditional C, CUDA adds some extensions, including libraries and keywords.
CUDA code is submitted to the NVCC compiler, which separates it into host code and device code.
The host code is plain C and is handed off to GCC, ICC, or another host compiler;
the device code is compiled into PTX, an intermediate representation comparable to Java bytecode, and before the code runs a just-in-time compiler translates the PTX into the native ISA of the target GPU or co-processor.
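As a minimal sketch of this split (program and kernel names are illustrative), NVCC would send main below to the host compiler, while the kernel goes through PTX and the JIT step:

```cuda
#include <cstdio>

// Device code: compiled by NVCC to PTX, then JIT-compiled to the GPU's ISA.
__global__ void hello_kernel(void) {
    printf("hello from device thread %d\n", threadIdx.x);
}

int main(void) {
    // Host code: plain C/C++, handed off to the host compiler (e.g. GCC).
    hello_kernel<<<1, 4>>>();  // launch 1 block of 4 threads on the device
    cudaDeviceSynchronize();   // wait for the kernel to finish
    return 0;
}
```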
Parallel thread pattern on device
The parallel thread array has a three-level grid-block-thread structure:
Each grid contains a number of blocks, and each block contains a number of threads.
Here we need to mention the concept of SPMD: single program, multiple data, meaning that the same program processes different data. The threads that execute on the device side are of this type: all threads in a grid execute the same program. But each thread needs to fetch its own data from shared storage, which requires a mechanism for locating that data. CUDA's positioning formula is as follows:
i = blockIdx.x * blockDim.x + threadIdx.x
blockIdx identifies the block, blockDim is the size of the block along that dimension, and threadIdx identifies the thread inside the block.
Note the .x suffix: CUDA's thread arrays can be multidimensional, and both blockIdx and threadIdx can have up to 3 dimensions. This is a great convenience for processing images and spatial data.
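The positioning formula is typically the first line of a kernel. As a sketch, a hypothetical element-wise vector addition in which each thread uses the formula to pick out its own element:

```cuda
// SPMD: every thread runs this same code, but computes a different
// global index i and therefore processes a different element.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may hold more threads than elements
        c[i] = a[i] + b[i];
}
```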
The memory model on the device
The memory model on the device is as follows:
Each thread has its own registers and local memory.
All threads in the same block share one copy of shared memory.
In addition, all threads in a grid (including threads in different blocks) share the global memory, constant memory, and texture memory.
Different grids have their own global memory, constant memory, and texture memory.
Each grid thus has storage shared by all of its threads, and each thread has its own registers. The host code is responsible for allocating the grid's shared storage and for transferring data between host and device; the device code interacts only with that storage and its local registers.
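This division of labor can be sketched with the CUDA runtime calls cudaMalloc, cudaMemcpy, and cudaFree; the kernel scale and the helper run_on_device are hypothetical names:

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *d, int n);   // assumed kernel, defined elsewhere

// Host side: allocate device storage, copy input over, launch the
// kernel, copy the result back, and release the device storage.
void run_on_device(float *host_buf, int n) {
    float *dev_buf;
    size_t bytes = n * sizeof(float);

    cudaMalloc(&dev_buf, bytes);                                   // allocate on device
    cudaMemcpy(dev_buf, host_buf, bytes, cudaMemcpyHostToDevice);  // host -> device

    scale<<<(n + 255) / 256, 256>>>(dev_buf, n);                   // device code runs here

    cudaMemcpy(host_buf, dev_buf, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(dev_buf);
}
```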
Function Qualifiers
CUDA's functions are divided into three types: __global__, __device__, and __host__.
Note that each qualifier is written with double underscores on both sides. A __global__ function is the entry point in the C code through which computation is invoked on the device.
A __host__ function is a traditional C function and is also the default function type. The reason this qualifier exists is that __device__ and __host__ can sometimes be combined, telling the compiler that two versions of the function need to be compiled.
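The three qualifiers side by side, as a minimal sketch (function names are illustrative):

```cuda
// __global__: entry point, called from the host, runs on the device.
__global__ void kernel_entry(float *out) { out[threadIdx.x] = 0.0f; }

// __device__: runs on the device, callable only from device code.
__device__ float square(float x) { return x * x; }

// __host__ __device__ combined: the compiler emits two versions,
// one callable from the host and one callable from the device.
__host__ __device__ float twice(float x) { return 2.0f * x; }
```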
CUDA Basic Concepts