CUDA Learning and Summary 1


I. Basic Concepts

1. CUDA

In 2007, NVIDIA introduced the CUDA (Compute Unified Device Architecture) programming model to make full use of the respective strengths of CPUs and GPUs through joint CPU/GPU execution. The need for such co-execution is also reflected in more recent heterogeneous programming models (OpenCL, OpenACC, C++ AMP).

2. Parallel Programming Languages and Models

The most widely used models are MPI (Message Passing Interface), designed for scalable cluster computing, and OpenMP, designed for shared-memory multiprocessor systems. Many HPC (High-Performance Computing) clusters now adopt heterogeneous CPU/GPU nodes, combining MPI with CUDA to realize a multi-machine, multi-GPU model. Languages that currently support CUDA include C, C++, Fortran, Python, and Java [1]. CUDA adopts the SPMD (Single-Program Multiple-Data) style of parallel programming.

3. Data Parallelism and Task Parallelism

Analysis:

Data parallelism performs the same operation on many data elements at once; in vector addition, for instance, every element-wise sum is independent and can be assigned to its own thread.

Task parallelism usually comes from decomposing an application into distinct tasks. For example, a simple application that must perform both a vector addition and a matrix-vector multiplication can treat each operation as one task; if the two tasks can execute independently, task parallelism is obtained.
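
A minimal sketch of the data-parallel case in CUDA C (the kernel name is illustrative): each thread computes one element of the vector sum, and the host would launch it as VecAdd<<<blocks, threads>>>(a, b, c, n).

    // One thread per element: the data-parallel pattern behind vector addition.
    __global__ void VecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
        if (i < n) {                                    // guard the tail block
            c[i] = a[i] + b[i];
        }
    }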

4. CUDA's Extensions to C Function Declarations

Analysis:

(1) __device__ float DeviceFunc(): executes on the device and can only be called from the device.

(2) __global__ void KernelFunc(): executes on the device and can only be called from the host; a __global__ function must return void.

(3) __host__ float HostFunc(): executes on the host and can only be called from the host.

Description

If no CUDA extension keyword is specified in a function declaration, the function defaults to a host function.
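
A minimal sketch showing the three qualifiers side by side (function names are illustrative):

    __device__ float Square(float x) {                // device-side helper
        return x * x;
    }

    __global__ void SquareAll(float* data, int n) {   // kernel: launched by the host
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = Square(data[i]);         // device calling device
    }

    __host__ void FillInput(float* data, int n) {     // ordinary CPU function
        for (int i = 0; i < n; ++i) data[i] = (float)i;
    }

Note that __host__ and __device__ may be combined on one function so that it is compiled for both sides.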

5. Thread, Block, Grid, Warp, SP, SM

Analysis:

(1) Grid, block, thread: in CUDA programming, a grid is divided into multiple blocks, and a block is divided into multiple threads.

(2) SP (Streaming Processor): the most basic processing unit; concrete instructions and tasks are ultimately executed on SPs.

(3) SM (Streaming Multiprocessor): multiple SPs plus other resources (such as shared memory and registers) constitute an SM.

(4) Warp: the GPU's scheduling unit when executing a program. CUDA's warp size is currently 32; threads in the same warp execute the same instruction on different data (see the sketch after this list).
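
A minimal sketch of how this hierarchy maps to index variables inside a kernel (the kernel name is illustrative; device-side printf requires compute capability 2.0 or later):

    #include <cstdio>

    __global__ void WhoAmI() {
        int global_id = blockIdx.x * blockDim.x + threadIdx.x;  // thread within the grid
        int warp_id   = threadIdx.x / warpSize;                 // warp within the block
        int lane_id   = threadIdx.x % warpSize;                 // thread within its warp
        if (lane_id == 0)                                       // one line per warp
            printf("block %d, warp %d starts at global id %d\n",
                   blockIdx.x, warp_id, global_id);
    }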

6. CUDA Kernel Functions

The complete execution configuration of a kernel launch has the form <<<Dg, Db, Ns, S>>>, where:

(1) The parameter Dg defines the dimension and size of the entire grid, that is, how many blocks the grid has.

(2) The parameter Db defines the dimension and size of each block, that is, how many threads a block has.

(3) The parameter Ns is an optional argument giving the amount of shared memory, in bytes, dynamically allocated per block, in addition to any statically allocated shared memory. It is 0 or omitted when no dynamic allocation is needed.

(4) The parameter S is an optional argument of type cudaStream_t, defaulting to 0, that indicates which stream the kernel runs in (see the sketch after this list).
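
A minimal, self-contained sketch exercising all four configuration parameters (kernel and variable names are illustrative; error checking is omitted for brevity):

    #include <cuda_runtime.h>

    __global__ void StagedCopy(const float* in, float* out, int n) {
        extern __shared__ float tile[];   // sized at launch by the Ns parameter
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            tile[threadIdx.x] = in[i];    // each thread uses only its own slot,
            out[i] = tile[threadIdx.x];   // so no __syncthreads() is needed here
        }
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        dim3 Db(256);                      // Db: 256 threads per block
        dim3 Dg((n + Db.x - 1) / Db.x);    // Dg: enough blocks to cover n elements
        size_t Ns = Db.x * sizeof(float);  // Ns: dynamic shared memory per block, in bytes
        StagedCopy<<<Dg, Db, Ns, stream>>>(in, out, n);  // S: the stream to run in

        cudaStreamSynchronize(stream);
        cudaStreamDestroy(stream);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }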

7. CUDA Storage System

(1) Registers: the fastest storage, private to each thread.

(2) Local memory: per-thread storage that physically resides off-chip; register spills are placed here.

(3) Shared memory: on-chip storage shared by all threads of one block.

(4) Global memory: large off-chip storage visible to all threads and to the host.

(5) Constant memory: read-only, cached storage for values read uniformly by all threads (see the sketch after this list).
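
A minimal kernel sketch touching each of these spaces (names are illustrative; any register spills to local memory are decided by the compiler, and the kernel assumes blocks of at most 256 threads):

    __constant__ float kScale = 2.0f;      // constant memory: read-only, cached

    __global__ void ScaleViaTile(const float* x, float* y, int n) {  // x, y: global memory
        __shared__ float tile[256];        // shared memory: per-block, on-chip
        float tmp;                         // register: per-thread, fastest
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            tile[threadIdx.x] = x[i];      // global -> shared
            tmp = kScale * tile[threadIdx.x];
            y[i] = tmp;                    // register -> global
        }
    }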

8. CUDA SDK

9. CUDA Software Stack

Description

The CUDA software stack contains several layers: the device driver, the application programming interface (API) with its runtime, and two higher-level general-purpose math libraries, cuFFT and cuBLAS. The CUDA driver API is the low-level API, while the CUDA runtime API is a high-level API built on top of the driver.
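
As a minimal sketch of the library layer sitting on top of the runtime API, the following uses cuBLAS to compute y = alpha*x + y (SAXPY). Error checking is omitted for brevity; it would be built with nvcc and linked with -lcublas:

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const int n = 4;
        float hx[] = {1, 2, 3, 4}, hy[] = {10, 20, 30, 40};
        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));            // runtime API: device allocation
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t handle;                         // library layer: cuBLAS
        cublasCreate(&handle);
        const float alpha = 2.0f;
        cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);  // y = alpha*x + y
        cublasDestroy(handle);

        cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i) printf("%g ", hy[i]);  // prints: 12 24 36 48
        printf("\n");
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }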

References:

[1] Java bindings for CUDA: http://jcuda.org/
