Introduction to CUDA C Programming: Programming Interface

Source: Internet
Author: User
Tags: nvcc

CUDA C gives programmers who are familiar with the C programming language a simple way to write code that executes on the device (GPU).

It consists of a minimal set of extensions to the C language and a runtime library.

The core language extensions were introduced in the Programming Model section. They allow programmers to define kernels as C functions and to use new syntax to specify the grid and block dimensions each time a kernel is launched. A complete description of the extensions can be found in the C Language Extensions section. Any source file that contains these extensions must be compiled with nvcc. For an overview of nvcc, see the section "Compile with nvcc".
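As a minimal sketch of these extensions, the example below defines a kernel with `__global__` and launches it with the `<<<grid, block>>>` execution configuration on a two-dimensional grid. The names `write_index` and `launch_write_index`, the 16x16 block size, and the use of `std::vector` on the host are illustrative assumptions and do not come from the original text. The helper assumes `width` and `height` are positive.

```cuda
#include <cuda_runtime.h>
#include <vector>

// __global__ marks a kernel: a function that runs on the device.
// Each thread writes its own flattened global index into out.
__global__ void write_index(int *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)          // guard threads past the edge
        out[y * width + x] = y * width + x;
}

// Hypothetical helper: launches the kernel over width x height elements
// and returns true if every element ends up holding its own index.
bool launch_write_index(int width, int height)
{
    size_t count = (size_t)width * height;
    int *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, count * sizeof(int)) != cudaSuccess)
        return false;

    // Execution configuration: threads per block, then blocks per grid,
    // rounded up so the grid covers the whole width x height range.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    write_index<<<grid, block>>>(d_out, width, height);

    // A bad launch configuration is reported by cudaGetLastError.
    cudaError_t err = cudaGetLastError();
    std::vector<int> h_out(count, -1);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, count * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return false;

    for (size_t i = 0; i < count; ++i)
        if (h_out[i] != (int)i)
            return false;
    return true;
}
```

Calling `launch_write_index(100, 37)` from `main` exercises the bounds check, because neither dimension is a multiple of the block size.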

This section describes the CUDA C runtime. The runtime provides C functions that execute on the host to allocate and deallocate device memory, transfer data between host memory and device memory, and manage systems with multiple devices. A complete description of the runtime can be found in the CUDA reference manual.
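The runtime calls described above can be sketched as follows: select a device, allocate device memory, copy the inputs over, launch a kernel, copy the result back, and free everything. The kernel `vec_add`, the helper `vec_add_on_device`, and the block size of 256 are hypothetical names and values chosen for illustration; only the `cuda*` functions belong to the runtime API.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: adds two n-element vectors on the given device.
// Returns the number of mismatching elements, or -1 on any error.
int vec_add_on_device(int device, int n)
{
    // Device management: check the device exists, then select it.
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess ||
        device < 0 || device >= device_count || n <= 0)
        return -1;
    if (cudaSetDevice(device) != cudaSuccess)
        return -1;

    size_t bytes = (size_t)n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int result = -1;

    if (h_a && h_b && h_c &&
        cudaMalloc((void **)&d_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_b, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_c, bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            h_a[i] = (float)i;
            h_b[i] = 2.0f * (float)i;
        }
        // Host -> device transfers.
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

        // Device -> host transfer; this also waits for the kernel.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (h_c[i] != h_a[i] + h_b[i])
                    ++result;
        }
    }

    // cudaFree and free both accept NULL, so cleanup is unconditional.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(h_a);
    free(h_b);
    free(h_c);
    return result;
}
```

Note that no context or module is created explicitly here; the runtime handles both behind the scenes.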

The runtime is built on top of a lower-level C API, the CUDA driver API, which applications can also access directly. The driver API provides an additional level of control by exposing lower-level concepts such as CUDA contexts, which are the device-side analogue of host processes, and CUDA modules, which are the device-side analogue of dynamically loaded libraries. Most applications do not need this additional level of control, so they use the runtime rather than the driver API. With the runtime, context and module management are implicit, which results in simpler and clearer code. The driver API is introduced in the Driver API section and fully described in the reference manual.
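To make the contrast with the runtime concrete, here is a minimal sketch that does by hand what the runtime does implicitly: it obtains a context, loads a module, looks up a kernel by name, and launches it. The hand-written PTX, the kernel name `set_value`, and the helper `run_set_value` are illustrative assumptions. The PTX version and target (`.version 6.0`, `.target sm_50`) are also assumptions and may need adjusting for a given toolkit and GPU. The sketch retains the device's primary context rather than creating a new one, which is a design choice of this example.

```cuda
#include <cuda.h>    // driver API; link with -lcuda
#include <stddef.h>

// Hand-written PTX for a hypothetical kernel:
//   set_value(int *out, int value) { *out = value; }
static const char *kSetValuePtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry set_value(.param .u64 p_out, .param .u32 p_value)\n"
    "{\n"
    "    .reg .b32 %r<2>;\n"
    "    .reg .b64 %rd<3>;\n"
    "    ld.param.u64 %rd1, [p_out];\n"
    "    ld.param.u32 %r1, [p_value];\n"
    "    cvta.to.global.u64 %rd2, %rd1;\n"
    "    st.global.u32 [%rd2], %r1;\n"
    "    ret;\n"
    "}\n";

// Hypothetical helper: runs set_value on device 0 and returns the value
// read back from device memory, or -1 if any driver call fails.
int run_set_value(int value)
{
    CUdevice dev;
    CUcontext ctx = NULL;
    CUmodule mod = NULL;
    CUfunction fn = NULL;
    CUdeviceptr d_out = 0;
    int result = -1;

    if (cuInit(0) != CUDA_SUCCESS) return -1;
    if (cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return -1;
    // Explicit context management: retain and bind the primary context.
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return -1;

    if (cuCtxSetCurrent(ctx) == CUDA_SUCCESS &&
        // Explicit module management: load PTX, then look up the kernel.
        cuModuleLoadData(&mod, kSetValuePtx) == CUDA_SUCCESS &&
        cuModuleGetFunction(&fn, mod, "set_value") == CUDA_SUCCESS &&
        cuMemAlloc(&d_out, sizeof(int)) == CUDA_SUCCESS) {
        void *args[] = { &d_out, &value };
        int host_out = -1;
        // One block of one thread, no shared memory, default stream.
        if (cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, args, NULL)
                == CUDA_SUCCESS &&
            cuMemcpyDtoH(&host_out, d_out, sizeof(int)) == CUDA_SUCCESS)
            result = host_out;
    }

    if (d_out) cuMemFree(d_out);
    if (mod) cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return result;
}
```

Compared with a runtime-API launch, every step that the `<<<...>>>` syntax hides appears here as an explicit call, which is the extra control (and extra code) the paragraph above refers to.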

3.1 Compile with nvcc

Kernels can be written using the CUDA instruction set architecture, called PTX, which is described in the PTX reference manual. However, it is usually more effective to use a high-level programming language such as C. In both cases, kernels must be compiled into binary code by nvcc before they can execute on the device.

nvcc is a compiler driver that simplifies the process of compiling C or PTX code: it provides simple and familiar command line options and executes them by invoking the set of tools that implement the different compilation stages. This section gives an overview of the nvcc workflow and command line options. A complete description can be found in the nvcc user manual.

3.1.1 Compilation Workflow

3.1.1.1 Offline Compilation

Source files compiled with nvcc can contain a mix of host code (code that executes on the host) and device code (code that executes on the device). The basic nvcc workflow consists of separating the device code from the host code and then:

  1. Compiling the device code into an assembly form (PTX code) or a binary form (cubin object).
  2. Modifying the host code by replacing the <<<...>>> syntax with the CUDA C runtime function calls needed to load and launch each compiled kernel from the PTX code or cubin object.

The modified host code is output either as C code that is left to be compiled with another tool, or directly as object code, in which case nvcc invokes the host compiler during the last compilation stage.

The application can then:

  1. Link against the compiled host code (the most common case), or
  2. Ignore the modified host code (if any) and use the CUDA driver API to load and execute the PTX code or cubin object.

3.1.1.2 Just-in-Time Compilation

Any PTX code loaded by an application at runtime is compiled further into binary code by the device driver. This is called just-in-time compilation. Just-in-time compilation increases application load time, but it lets the application benefit from the compiler improvements that ship with each new device driver. It is also the only way for an application to run on devices that did not exist when the application was compiled, as detailed in the Application Compatibility section.

When the device driver just-in-time compiles PTX code for an application, it automatically caches a copy of the generated binary code so that the compilation is not repeated in later invocations of the application. This cache, called the compute cache, is automatically invalidated when the device driver is upgraded, so applications can benefit from the improved just-in-time compiler built into the new driver.
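As a rough illustration of just-in-time compilation, the sketch below hands PTX text to the driver at runtime with `cuModuleLoadDataEx` and asks for compiler diagnostics in an error log buffer. The helper `jit_load_ptx`, the two PTX strings, and the PTX version and target are hypothetical and chosen only for this example. The helper assumes `log_size` is greater than zero. Because of the compute cache, a repeated run of the same program may reuse a cached binary instead of compiling again.

```cuda
#include <cuda.h>    // driver API; link with -lcuda
#include <stddef.h>

// A trivially valid kernel (assumed PTX version and target).
static const char *kValidPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    ret;\n"
    "}\n";

// Same header, but the body contains an instruction that does not exist.
static const char *kBrokenPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry noop()\n"
    "{\n"
    "    not_an_instruction;\n"
    "}\n";

// Hypothetical helper: asks the driver to JIT-compile ptx for device 0.
// Compiler diagnostics, if any, are written to error_log.
CUresult jit_load_ptx(const char *ptx, char *error_log, size_t log_size)
{
    CUdevice dev;
    CUcontext ctx = NULL;
    CUmodule mod = NULL;
    CUresult rc;

    if ((rc = cuInit(0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuDeviceGet(&dev, 0)) != CUDA_SUCCESS) return rc;
    if ((rc = cuDevicePrimaryCtxRetain(&ctx, dev)) != CUDA_SUCCESS) return rc;

    rc = cuCtxSetCurrent(ctx);
    if (rc == CUDA_SUCCESS) {
        // JIT options are passed as parallel arrays of keys and values.
        CUjit_option options[] = {
            CU_JIT_ERROR_LOG_BUFFER,
            CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES
        };
        void *values[] = { error_log, (void *)log_size };
        error_log[0] = '\0';

        // The PTX is compiled to binary code for the current device here.
        rc = cuModuleLoadDataEx(&mod, ptx, 2, options, values);
        if (rc == CUDA_SUCCESS)
            cuModuleUnload(mod);
    }

    cuDevicePrimaryCtxRelease(dev);
    return rc;
}
```

Loading `kValidPtx` is expected to succeed, while `kBrokenPtx` should be rejected by the just-in-time compiler with a non-success result code.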

The environment variables that control just-in-time compilation are described in the CUDA Environment Variables section.

3.1.2 Binary Compatibility

To be continued...
