Introduction to CUDA C Programming: Programming Interface

Last Update:2014-08-06 Source: Internet

Author: User

Tags nvcc

CUDA C provides a simple path for users familiar with the C programming language to write programs that execute on the device (the GPU).

It consists of a minimal set of extensions to the C language and a runtime library.

The core language extensions were introduced in the Programming Model section. They allow programmers to define a kernel as a C function and to use new syntax to specify the grid and block dimensions each time the kernel is launched. A complete description of all extensions can be found in the C Language Extensions section. Any source file that contains these extensions must be compiled with nvcc, as outlined in the section "Compiling with nvcc".
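As a minimal sketch of these extensions (the file name, kernel name, and sizes are illustrative assumptions, not taken from the original text), the program below defines a kernel with the __global__ qualifier and launches it with the <<<grid, block>>> execution configuration. Because of that syntax, it can only be compiled with nvcc.

```cuda
// Build (assumed file name): nvcc -o launch_demo launch_demo.cu
#include <cstdio>
#include <cuda_runtime.h>

// __global__ declares a kernel: launched from the host, executed on the device.
__global__ void showIndex(int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)  // the last block may contain threads past the end of the data
        printf("element %d handled by block %u, thread %u\n", i, blockIdx.x, threadIdx.x);
}

// Ceiling division: number of blocks of threadsPerBlock needed to cover n elements.
int blocks_for(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main()
{
    const int n = 10;
    const int threadsPerBlock = 4;
    dim3 block(threadsPerBlock);
    dim3 grid(blocks_for(n, threadsPerBlock));  // 3 blocks of 4 threads

    // The <<<grid, block>>> execution configuration is the new launch syntax.
    showIndex<<<grid, block>>>(n);

    // Launch errors are reported by cudaGetLastError; the launch itself is asynchronous.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // wait for the kernel (and its printf output)
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The i < n guard matters because 3 blocks of 4 threads launch 12 threads for only 10 elements.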

This section introduces the CUDA C runtime. The runtime provides C functions that execute on the host to allocate and deallocate device memory, transfer data between host memory and device memory, manage systems with multiple devices, and so on. A complete description of the runtime can be found in the CUDA reference manual.
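The runtime tasks listed above can be sketched as follows, assuming at least one device is present and selecting device 0; the CHECK macro and the scale kernel are hypothetical helpers added for illustration. cudaGetDeviceCount and cudaSetDevice cover device management, cudaMalloc and cudaFree allocate and release device memory, and cudaMemcpy copies data in each direction.

```cuda
// Build (assumed file name): nvcc -o runtime_demo runtime_demo.cu
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with the runtime's error string if a call fails.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t e_ = (call);                                             \
        if (e_ != cudaSuccess) {                                             \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
            exit(1);                                                         \
        }                                                                    \
    } while (0)

__global__ void scale(float* x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Host-side check that every element equals the expected value.
bool all_equal(const float* x, int n, float v)
{
    for (int i = 0; i < n; ++i)
        if (x[i] != v) return false;
    return true;
}

int main()
{
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));  // device management
    printf("%d CUDA device(s)\n", count);
    CHECK(cudaSetDevice(0));            // later runtime calls target device 0

    const int n = 256;
    const size_t bytes = n * sizeof(float);
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));                               // allocate device memory
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));  // host -> device
    scale<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));  // device -> host, waits for the kernel
    CHECK(cudaFree(dev));                                         // release device memory

    printf("%s\n", all_equal(host, n, 2.0f) ? "ok" : "mismatch");
    return 0;
}
```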

The runtime is built on top of a lower-level C API, the CUDA driver API, which is also accessible to applications. The driver API provides an additional level of control by exposing lower-level concepts such as CUDA contexts (the analogue of host processes for the device) and CUDA modules (the analogue of dynamically loaded libraries for the device). Most applications do not need this additional level of control and use only the runtime, in which context and module management are implicit; the resulting code is more concise. The driver API is introduced in the Driver API section and fully described in the reference manual.
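For contrast, here is a minimal driver API sketch. It assumes device 0, linking with -lcuda, and three hypothetical helpers: CHECK_CU, version_major, and version_minor. Before it can allocate memory, the program must initialize the API and make a context current. It uses the device's primary context, which in CUDA 7.0 and later is the same context the runtime creates and manages implicitly.

```cuda
// Build (assumed file name): nvcc -o driver_ctx driver_ctx.cu -lcuda
#include <cstdio>
#include <cstdlib>
#include <cuda.h>

// Hypothetical helper: abort with the driver's error string if a call fails.
#define CHECK_CU(call)                                                    \
    do {                                                                  \
        CUresult r_ = (call);                                             \
        if (r_ != CUDA_SUCCESS) {                                         \
            const char* msg_ = nullptr;                                   \
            cuGetErrorString(r_, &msg_);                                  \
            fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
            exit(1);                                                      \
        }                                                                 \
    } while (0)

// Driver versions are encoded as 1000 * major + 10 * minor (12020 means 12.2).
int version_major(int v) { return v / 1000; }
int version_minor(int v) { return (v % 1000) / 10; }

int main()
{
    CHECK_CU(cuInit(0));  // initialize the driver API before using it

    int version = 0;
    CHECK_CU(cuDriverGetVersion(&version));
    printf("driver supports CUDA %d.%d\n", version_major(version), version_minor(version));

    CUdevice dev;
    CHECK_CU(cuDeviceGet(&dev, 0));  // choose device 0 explicitly

    // Explicit context management: the runtime performs these steps implicitly.
    CUcontext ctx;
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));

    CUdeviceptr buf;
    CHECK_CU(cuMemAlloc(&buf, 1024));  // allocated in the current context
    CHECK_CU(cuMemFree(buf));

    CHECK_CU(cuCtxSetCurrent(nullptr));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));  // drop this application's reference
    return 0;
}
```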

3.1 Compiling with nvcc

Kernels can be written using the CUDA instruction set architecture, called PTX, which is described in the PTX reference manual. It is, however, usually more effective to use a high-level programming language such as C. In both cases, kernels must be compiled into binary code by nvcc before they can execute on the device.

nvcc is a compiler driver that simplifies the process of compiling C or PTX code. It provides simple and familiar command-line options and executes them by invoking the collection of tools that implement the different compilation stages. This section gives an overview of the nvcc workflow and command options. A complete description can be found in the nvcc user manual.
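The separate compilation stages become visible in the hypothetical file arch_demo.cu below, whose compiled_for function is compiled once in the host pass and once in the device pass. The nvcc invocations in its header comment use real nvcc options (-ptx, -cubin, -arch, --dryrun), but the file name and the sm_70 target are assumptions. --dryrun is a convenient way to list the tool commands nvcc would run without executing them.

```cuda
// Commands that exercise the nvcc workflow on this (hypothetical) file:
//   nvcc -o arch_demo arch_demo.cu            compile host and device code, then link
//   nvcc -ptx arch_demo.cu                     stop after generating PTX (arch_demo.ptx)
//   nvcc -cubin -arch=sm_70 arch_demo.cu       emit a cubin for one real architecture
//   nvcc --dryrun -o arch_demo arch_demo.cu    print the tool invocations without running them
#include <cstdio>
#include <cuda_runtime.h>

// Compiled twice: once by the host compiler and once for each device target.
// __CUDA_ARCH__ is defined only during device compilation (for example 700 for sm_70).
__host__ __device__ int compiled_for()
{
#ifdef __CUDA_ARCH__
    return __CUDA_ARCH__;
#else
    return 0;  // host compilation pass
#endif
}

__global__ void report()
{
    printf("device pass: __CUDA_ARCH__ = %d\n", compiled_for());
}

int main()
{
    printf("host pass: %d\n", compiled_for());
    report<<<1, 1>>>();
    cudaDeviceSynchronize();  // flush device printf output
    return 0;
}
```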

3.1.1 Compilation Workflow

3.1.1.1 Offline Compilation

Source files compiled with nvcc can contain a mix of host code (code that executes on the host) and device code (code that executes on the device). The basic nvcc workflow separates device code from host code and then:

compiles the device code into an assembly form (PTX code) and/or a binary form (cubin object);
modifies the host code by replacing the <<<...>>> launch syntax with the CUDA C runtime function calls needed to load and launch each compiled kernel from the PTX code and/or cubin object.

The modified host code is output either as C code that is left to be compiled with another tool, or directly as object code, by letting nvcc invoke the host compiler during the last compilation stage.
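The replacement of the <<<...>>> syntax can be pictured with the sketch below, which launches the same hypothetical fill kernel twice: once with the execution configuration syntax and once through the public runtime function cudaLaunchKernel. This is only an approximation of what nvcc does. The code it actually generates relies on internal helper calls rather than on this exact function.

```cuda
// Build (assumed file name): nvcc -o launch_equiv launch_equiv.cu
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(int* out, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Ceiling division: blocks needed to cover n elements.
unsigned grid_for(unsigned n, unsigned block) { return (n + block - 1) / block; }

int main()
{
    const int n = 1000;
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return 1;

    dim3 block(256);
    dim3 grid(grid_for(n, block.x));

    // 1) Execution configuration syntax, as written in CUDA C source.
    fill<<<grid, block>>>(d, 7, n);

    // 2) A comparable launch through an explicit runtime call:
    //    one pointer per kernel parameter, in declaration order.
    int value = 8, count = n;
    void* args[] = { &d, &value, &count };
    cudaLaunchKernel((const void*)fill, grid, block, args, 0 /* shared mem */, 0 /* default stream */);

    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```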

Then the application can:

either link to the compiled host code (this is the most common case),
or ignore the modified host code (if any) and use the CUDA driver API to load and execute the PTX code or cubin object.
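The second option can be sketched with the driver API as below. The sketch assumes a file kernels.ptx or kernels.cubin built from a hypothetical kernels.cu that defines extern "C" __global__ void fill(int* out, int value, int n), for example with nvcc -ptx kernels.cu. The extern "C" keeps the kernel name unmangled so that cuModuleGetFunction can find it. CHECK_CU and has_suffix are hypothetical helpers.

```cuda
// Build (assumed file name): nvcc -o load_module load_module.cu -lcuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda.h>

// Hypothetical helper: abort with the driver's error string if a call fails.
#define CHECK_CU(call)                                                    \
    do {                                                                  \
        CUresult r_ = (call);                                             \
        if (r_ != CUDA_SUCCESS) {                                         \
            const char* msg_ = nullptr;                                   \
            cuGetErrorString(r_, &msg_);                                  \
            fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
            exit(1);                                                      \
        }                                                                 \
    } while (0)

bool has_suffix(const char* s, const char* suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

int main(int argc, char** argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file.ptx|file.cubin>\n", argv[0]); return 1; }

    CHECK_CU(cuInit(0));
    CUdevice dev;
    CHECK_CU(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));

    // cuModuleLoad accepts PTX, cubin, or fatbin files; PTX is JIT-compiled at this point.
    if (has_suffix(argv[1], ".ptx"))
        printf("loading PTX: the driver will JIT-compile it\n");
    CUmodule mod;
    CHECK_CU(cuModuleLoad(&mod, argv[1]));
    CUfunction fn;
    CHECK_CU(cuModuleGetFunction(&fn, mod, "fill"));

    const int n = 1000;
    CUdeviceptr d;
    CHECK_CU(cuMemAlloc(&d, n * sizeof(int)));
    int value = 7, count = n;
    void* args[] = { &d, &value, &count };  // one pointer per kernel parameter
    CHECK_CU(cuLaunchKernel(fn, (n + 255) / 256, 1, 1,  // grid dimensions
                            256, 1, 1,                  // block dimensions
                            0, nullptr, args, nullptr));
    CHECK_CU(cuCtxSynchronize());

    int first = 0;
    CHECK_CU(cuMemcpyDtoH(&first, d, sizeof(int)));
    printf("out[0] = %d\n", first);

    CHECK_CU(cuMemFree(d));
    CHECK_CU(cuModuleUnload(mod));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

Run it as ./load_module kernels.ptx. With a cubin built for a different architecture, cuModuleLoad fails instead of recompiling, which is the situation PTX and just-in-time compilation address.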

3.1.1.2 Just-in-Time Compilation

Any PTX code loaded by an application at runtime is compiled further to binary code by the device driver. This is called just-in-time compilation. Just-in-time compilation increases application load time, but it allows the application to benefit from the compiler improvements that come with each new device driver. It is also the only way for an application to run on devices that did not exist when the application was compiled, as detailed in the Application Compatibility section.

When the device driver just-in-time compiles PTX code for an application, it automatically caches a copy of the generated binary code to avoid repeating the compilation in subsequent invocations of the application. This cache, referred to as the compute cache, is automatically invalidated when the device driver is upgraded, so applications can benefit from the improvements in the just-in-time compiler built into the new device driver.
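Just-in-time compilation can also be triggered from PTX held in memory. In the sketch below, the kernel write42 and the CHECK_CU helper are hypothetical, and the PTX is hand-written against an assumed PTX ISA 7.0 and sm_52 baseline. The program passes the PTX string to cuModuleLoadDataEx, which compiles it for the GPU actually present and fills a buffer with the JIT compiler's informational log. A second run may be served from the compute cache.

```cuda
// Build (assumed file name): nvcc -o jit_demo jit_demo.cu -lcuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda.h>

// Hypothetical helper: abort with the driver's error string if a call fails.
#define CHECK_CU(call)                                                    \
    do {                                                                  \
        CUresult r_ = (call);                                             \
        if (r_ != CUDA_SUCCESS) {                                         \
            const char* msg_ = nullptr;                                   \
            cuGetErrorString(r_, &msg_);                                  \
            fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
            exit(1);                                                      \
        }                                                                 \
    } while (0)

// Hand-written PTX for a kernel that stores 42 through its pointer argument.
static const char kPtx[] =
    ".version 7.0\n"
    ".target sm_52\n"
    ".address_size 64\n"
    ".visible .entry write42(.param .u64 out)\n"
    "{\n"
    "    .reg .b32 %r<2>;\n"
    "    .reg .b64 %rd<3>;\n"
    "    ld.param.u64 %rd1, [out];\n"
    "    cvta.to.global.u64 %rd2, %rd1;\n"
    "    mov.u32 %r1, 42;\n"
    "    st.global.u32 [%rd2], %r1;\n"
    "    ret;\n"
    "}\n";

int main()
{
    CHECK_CU(cuInit(0));
    CUdevice dev;
    CHECK_CU(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));

    // Ask the JIT compiler for its informational log.
    char log[4096] = {0};
    CUjit_option opts[] = { CU_JIT_INFO_LOG_BUFFER, CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES };
    void* vals[] = { log, (void*)(uintptr_t)sizeof(log) };

    CUmodule mod;
    CHECK_CU(cuModuleLoadDataEx(&mod, kPtx, 2, opts, vals));  // JIT compilation happens here
    printf("JIT log: %s\n", log[0] ? log : "(empty)");

    CUfunction fn;
    CHECK_CU(cuModuleGetFunction(&fn, mod, "write42"));
    CUdeviceptr d;
    CHECK_CU(cuMemAlloc(&d, sizeof(int)));
    void* args[] = { &d };
    CHECK_CU(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr));
    CHECK_CU(cuCtxSynchronize());

    int result = 0;
    CHECK_CU(cuMemcpyDtoH(&result, d, sizeof(int)));
    printf("result = %d\n", result);

    CHECK_CU(cuMemFree(d));
    CHECK_CU(cuModuleUnload(mod));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```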

The environment variables that control just-in-time compilation are described in the CUDA Environment Variables section.

3.1.2 Binary Compatibility

To be continued...

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
