CUDA C provides a simple path for users familiar with the C programming language to write programs that execute on the device (GPU).
It consists of a minimal set of extensions to the C language and a runtime library.
The core language extensions were introduced in the Programming Model section. They allow programmers to define a kernel as a C function and to use new syntax to specify the grid and block dimensions each time the function is called. A complete description of all extensions can be found in the C Language Extensions section. Any source file that contains some of these extensions must be compiled with nvcc; for an overview of nvcc, see section 3.1.
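To illustrate the two extensions mentioned above (the `__global__` kernel qualifier and the `<<<grid, block>>>` execution configuration), here is a minimal sketch that counts how many threads a launch actually runs. The names `g_threadCount`, `countThreads`, and `countLaunchedThreads` are hypothetical, and the sketch assumes a CUDA-capable device and a build with nvcc.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical device-side counter used only for this illustration.
__device__ unsigned int g_threadCount;

// Kernel: declared with __global__, executed on the device by every launched thread.
__global__ void countThreads() {
    atomicAdd(&g_threadCount, 1u);   // each thread adds one to the shared counter
}

// Hypothetical helper: launches countThreads with the execution configuration
// <<<grid, block>>> and returns how many threads actually ran.
unsigned int countLaunchedThreads(dim3 grid, dim3 block) {
    unsigned int zero = 0, result = 0;
    cudaMemcpyToSymbol(g_threadCount, &zero, sizeof(zero));      // reset counter
    countThreads<<<grid, block>>>();                              // grid and block dimensions
    cudaDeviceSynchronize();                                      // wait for the kernel
    cudaMemcpyFromSymbol(&result, g_threadCount, sizeof(result)); // read counter back
    return result;
}
```

For example, `countLaunchedThreads(dim3(4, 2), dim3(8, 8))` launches 4 * 2 blocks of 8 * 8 threads, so it should return 512.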
This section describes the CUDA C runtime. The runtime provides C functions that execute on the host to allocate and deallocate device memory, transfer data between host memory and device memory, manage systems with multiple devices, and so on. A complete description of the runtime can be found in the CUDA reference manual.
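A minimal sketch of the host-side runtime calls listed above (`cudaSetDevice`, `cudaMalloc`, `cudaMemcpy`, `cudaFree`), assuming a simple vector addition as the workload. The helper name `vecAddOnDevice` and the block size of 256 are illustrative choices, not requirements.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void VecAdd(const float* A, const float* B, float* C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) C[i] = A[i] + B[i];   // guard: the last block may be partially filled
}

// Hypothetical helper: computes h_C = h_A + h_B on device `dev` using only
// runtime calls (no explicit context or module handling is needed).
cudaError_t vecAddOnDevice(int dev, const float* h_A, const float* h_B,
                           float* h_C, int N) {
    cudaError_t err = cudaSetDevice(dev);      // select one of several devices
    if (err != cudaSuccess) return err;

    size_t size = N * sizeof(float);
    float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;
    // Allocate device memory (per-call error checks omitted for brevity).
    cudaMalloc(&d_A, size);
    cudaMalloc(&d_B, size);
    cudaMalloc(&d_C, size);

    // Transfer inputs from host memory to device memory.
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

    err = cudaGetLastError();                  // last error from the calls above
    if (err == cudaSuccess)                    // blocking copy waits for the kernel
        err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    // Deallocate device memory.
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    return err;
}
```

A caller would typically query `cudaGetDeviceCount` first and pass a valid device index as `dev`.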
The runtime is built on top of a lower-level C API, the CUDA driver API, which applications can also access. The driver API provides an additional level of control by exposing lower-level concepts such as CUDA contexts, the device analogue of host processes, and CUDA modules, the device analogue of dynamically loaded libraries. Most applications do not need this additional level of control and use the runtime instead: context and module management are implicit, which results in more concise code. The driver API is introduced in the Driver API section and fully described in the reference manual.
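To make the contrast concrete, the sketch below does through the driver API what the runtime does implicitly: it initializes the driver, picks a device, and makes a context current before querying device memory. It assumes the device's primary context (`cuDevicePrimaryCtxRetain`) rather than a separately created context, a build linked against the driver library (`-lcuda`), and a hypothetical helper name `queryMemInfo`.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda.h>   // CUDA driver API; link with -lcuda

// Hypothetical helper: queries free/total memory of device `ordinal`
// through the driver API, managing the context explicitly.
CUresult queryMemInfo(int ordinal, size_t* freeBytes, size_t* totalBytes) {
    CUdevice dev;
    CUcontext ctx;
    CUresult res = cuInit(0);                    // must precede other driver API calls
    if (res != CUDA_SUCCESS) return res;
    res = cuDeviceGet(&dev, ordinal);            // device handle
    if (res != CUDA_SUCCESS) return res;
    res = cuDevicePrimaryCtxRetain(&ctx, dev);   // obtain a context for the device
    if (res != CUDA_SUCCESS) return res;
    res = cuCtxSetCurrent(ctx);                  // bind it to this host thread
    if (res == CUDA_SUCCESS)
        res = cuMemGetInfo(freeBytes, totalBytes);  // requires a current context
    cuCtxSetCurrent(nullptr);                    // unbind before releasing
    cuDevicePrimaryCtxRelease(dev);
    return res;
}
```

With the runtime, the same query is a single `cudaMemGetInfo(&freeBytes, &totalBytes)` call, because context creation and binding happen implicitly.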
3.1 Compilation with nvcc
Kernels can be written using the CUDA instruction set architecture, called PTX, which is described in the PTX reference manual. It is, however, usually more effective to use a high-level programming language such as C. In both cases, kernels must be compiled into binary code by nvcc to execute on the device.
nvcc is a compiler driver that simplifies the process of compiling C or PTX code: it provides simple and familiar command-line options and executes them by invoking the collection of tools that implement the different compilation stages. This section gives an overview of the nvcc workflow and command-line options. A complete description can be found in the nvcc user manual.
3.1.1 Compilation Workflow
3.1.1.1 Offline Compilation
Source files compiled with nvcc can contain a mix of host code (code that executes on the host) and device code (code that executes on the device). The basic nvcc workflow consists of separating device code from host code and then:
- Compiling the device code into an assembly form (PTX code) and/or a binary form (cubin object).
- Modifying the host code by replacing the <<<...>>> execution configuration syntax with the CUDA C runtime function calls needed to load and launch each compiled kernel from the PTX code and/or cubin object.
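The separation of host and device code can be observed from inside a single source file: nvcc defines `__CUDA_ARCH__` only while compiling device code, set to the virtual architecture being targeted (for example 750 for compute_75). The sketch below assumes a hypothetical file name `arch_demo.cu` and hypothetical helper names; the commented nvcc invocations show how the PTX and cubin forms described above can be produced on their own.

```cuda
// Assumed file name: arch_demo.cu. Example nvcc invocations:
//   nvcc -ptx arch_demo.cu                -> arch_demo.ptx   (device code as PTX)
//   nvcc -cubin -arch=sm_80 arch_demo.cu  -> arch_demo.cubin (device code as sm_80 binary)
//   nvcc -o arch_demo arch_demo.cu        -> executable; host code goes to the host compiler
#include <cassert>
#include <cuda_runtime.h>

// Device code: __CUDA_ARCH__ exists only in the device compilation pass,
// so the host pass sees an empty body here.
__global__ void reportArch(int* out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

// Host code: the launch below is the <<<...>>> syntax that nvcc rewrites
// into runtime calls before handing this function to the host compiler.
int compiledArchForDevice0() {
    int* d_out = nullptr;
    int arch = 0;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));      // 0 means "kernel did not run"
    reportArch<<<1, 1>>>(d_out);
    cudaMemcpy(&arch, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return arch;
}
```

On a device of compute capability X.Y the returned value should be at most X*100 + Y*10; it is lower when the code that runs was compiled for an older architecture.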
The modified host code is output either as C code that is left to be compiled by another tool, or directly as object code, by letting nvcc invoke the host compiler during the last compilation stage.
Applications can then:
- Link to the compiled host code (this is the most common case), or
- Ignore the modified host code (if any) and use the CUDA driver API to load and execute the PTX code or cubin object.
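The second path (bypassing the generated host code and loading device code through the driver API) can be sketched as follows. The PTX is hand-written and embedded as a string for a hypothetical kernel `fill42`, an assumption made only to keep the example self-contained; a real application would more typically load a `.ptx` or `.cubin` file produced by nvcc, for example with `cuModuleLoad`. Because the module is PTX, loading it also goes through the just-in-time compilation described in the next section.

```cuda
#include <cassert>
#include <cuda.h>   // CUDA driver API; link with -lcuda

// Hand-written PTX for a hypothetical kernel `fill42` that stores 42 to out[0].
static const char kFill42Ptx[] =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    "\n"
    ".visible .entry fill42(.param .u64 fill42_param_0)\n"
    "{\n"
    "    .reg .b32 %r<2>;\n"
    "    .reg .b64 %rd<3>;\n"
    "    ld.param.u64 %rd1, [fill42_param_0];\n"
    "    cvta.to.global.u64 %rd2, %rd1;\n"
    "    mov.u32 %r1, 42;\n"
    "    st.global.u32 [%rd2], %r1;\n"
    "    ret;\n"
    "}\n";

// Hypothetical helper: loads the PTX module, launches fill42 with one thread,
// and returns the value it wrote, or -1 on any driver API error.
int runFill42() {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CUdeviceptr dOut;
    int result = -1;

    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) return -1;
    if (cuDevicePrimaryCtxRetain(&ctx, dev) != CUDA_SUCCESS) return -1;
    cuCtxSetCurrent(ctx);

    // The driver compiles the PTX to binary code for this device here.
    if (cuModuleLoadData(&mod, kFill42Ptx) == CUDA_SUCCESS) {
        if (cuModuleGetFunction(&fn, mod, "fill42") == CUDA_SUCCESS &&
            cuMemAlloc(&dOut, sizeof(int)) == CUDA_SUCCESS) {
            void* args[] = { &dOut };   // one pointer per kernel parameter
            if (cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, nullptr, args, nullptr) == CUDA_SUCCESS &&
                cuCtxSynchronize() == CUDA_SUCCESS) {
                int host = 0;
                if (cuMemcpyDtoH(&host, dOut, sizeof(int)) == CUDA_SUCCESS) result = host;
            }
            cuMemFree(dOut);
        }
        cuModuleUnload(mod);
    }
    cuDevicePrimaryCtxRelease(dev);
    return result;
}
```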
3.1.1.2 Just-in-Time Compilation
Any PTX code loaded by an application at runtime is compiled further to binary code by the device driver. This is called just-in-time compilation. Just-in-time compilation increases application load time, but allows the application to benefit from the compiler improvements that come with each new device driver. It is also the only way for applications to run on devices that did not exist at the time the application was compiled, as detailed in the Application Compatibility section.
When the device driver just-in-time compiles PTX code for an application, it automatically caches a copy of the generated binary code to avoid repeating the compilation in subsequent invocations of the application. The cache, referred to as the compute cache, is automatically invalidated when the device driver is upgraded, so that applications can benefit from the improvements in the new just-in-time compiler built into the device driver.
The environment variables available to control just-in-time compilation are described in the CUDA Environment Variables section.
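One way to see which code a kernel ends up running is `cudaFuncGetAttributes`, whose `ptxVersion` and `binaryVersion` fields report the PTX virtual architecture and the binary architecture (both as major*10+minor) of the code used on the current device. The sketch below assumes a hypothetical empty kernel `noop` and helper `kernelVersions`; its comments list the JIT-related environment variables and an example nvcc build that embeds PTX only, so that the kernel has to be just-in-time compiled.

```cuda
// Environment variables affecting just-in-time compilation, for example:
//   CUDA_FORCE_PTX_JIT=1 ./app   ignore embedded binaries, JIT-compile embedded PTX
//   CUDA_CACHE_DISABLE=1 ./app   disable the compute cache
//   CUDA_CACHE_PATH=<dir> ./app  change the compute cache location
// Example build embedding PTX only (file name assumed):
//   nvcc -gencode arch=compute_75,code=compute_75 -o app jit_demo.cu
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical empty kernel used only to query its compiled versions.
__global__ void noop() {}

// Hypothetical helper: returns the PTX version and binary version recorded
// for `noop` on the current device, and the device's compute capability.
cudaError_t kernelVersions(int* ptxVersion, int* binaryVersion, int* deviceCC) {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, noop);
    if (err != cudaSuccess) return err;
    int major = 0, minor = 0;
    cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
    cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
    *ptxVersion = attr.ptxVersion;        // virtual architecture of the PTX
    *binaryVersion = attr.binaryVersion;  // architecture of the binary code used
    *deviceCC = major * 10 + minor;
    return cudaSuccess;
}
```

When only PTX is embedded, the binary code comes from the driver's just-in-time compiler, so `binaryVersion` should match the device while `ptxVersion` stays at the older virtual architecture.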
3.1.2 Binary Compatibility