CUDA Programming Interface (II) - 18 Weapons - The GPU Revolution

Source: Internet
Author: User
Tags: definition, manual, execution, interface, thread, linux

4. Program execution control: streams, events, contexts, modules, and execution control. We group all of these under run management. As before, there is a runtime level and a driver level.

Stream: If you know the graphics cards of the AGP era, you know that part of the data exchanged between device and host passes through as transit data, called a stream. By the time of G8x, the new unified GPGPU design included a Stream Out hardware unit to make this data transfer easier. The job of this layer is to output the data that the vertex shader and pixel shader have finished processing (G8x hardly needs to distinguish the two, since they are almost unified) to the user, who processes it and feeds it back into the pipeline for further processing. It can read and write local video memory directly. If the memory is aligned, transfers are faster; see the simpleStreams sample code for this part.

Event: This is the same idea as the events everyone uses in ordinary programming. An event acts as a notification: you create an event and then synchronize on it. Events come up frequently in multithreaded programming.

Context: The usual Chinese translation of "context" does not seem quite right to me, but many traditional books use it, so we will simply say context. It is similar to a context on the CPU: the set of "resources" a process needs (system resources such as stacks, memory, and so on). I think "the things a process contains" would be a more fitting translation.

Module: This can be understood like a module in Linux. If you don't know Linux, don't worry: a module is a program for a specific device. Remember the .com files of the DOS era? Those were files that the CPU could invoke directly and run in memory.
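The stream and event usage described above can be sketched with the runtime API. This is only a minimal illustration, not code from the programming manual: the kernel add_one and the helper timed_add_one are hypothetical names, and the block size of 256 is an arbitrary choice. It queues a copy, a kernel, and a copy back in one stream, and uses two events both to wait for completion and to time the work.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1.0f to every element.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Queues copy-in, kernel, and copy-out in one stream and brackets them with
// two events. Returns the elapsed time in milliseconds, or -1.0f on any error.
// Note: the copies are only truly asynchronous if `host` is page-locked
// (for example allocated with cudaMallocHost); pageable memory still works.
float timed_add_one(float *host, int n)
{
    assert(host != nullptr && n > 0);
    const size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    cudaStream_t stream = nullptr;
    cudaEvent_t start = nullptr, stop = nullptr;
    float ms = -1.0f;

    bool ok = cudaMalloc(&dev, bytes) == cudaSuccess
           && cudaStreamCreate(&stream) == cudaSuccess
           && cudaEventCreate(&start) == cudaSuccess
           && cudaEventCreate(&stop) == cudaSuccess;
    if (ok) {
        cudaEventRecord(start, stream);   // notification point before the work
        cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(dev, n);
        cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
        cudaEventRecord(stop, stream);    // notification point after the work

        // Block the host until everything queued before `stop` has finished.
        ok = cudaEventSynchronize(stop) == cudaSuccess
          && cudaGetLastError() == cudaSuccess
          && cudaEventElapsedTime(&ms, start, stop) == cudaSuccess;
    }

    if (stop)   cudaEventDestroy(stop);
    if (start)  cudaEventDestroy(start);
    if (stream) cudaStreamDestroy(stream);
    cudaFree(dev);
    return ok ? ms : -1.0f;
}
```

Calling timed_add_one(buffer, count) from host code either updates the buffer in place and returns a non-negative time, or returns -1.0f if no usable device is present.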

A module here is the same idea: it is loaded onto the device and can then be run. Execution control: this covers how threads are launched on the device at the driver level. Streams and events are available in both the runtime API and the driver API, with similar function interfaces. Context, module, and execution control are available only at the driver API level.

None of this is particularly difficult. Explaining an API rarely is; the hard part is using the API flexibly, which takes a lot of practice. Sometimes that practice also reveals gaps in an API, and a new API results. For complete code, refer to the programming manual.
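As a rough sketch of the driver-level pieces (context, module, execution control), the function below loads a module from a file and launches one kernel from it. Several things here are assumptions rather than anything from the original text: launch_from_module is a hypothetical helper, the module file and kernel name are supplied by the caller, the kernel is assumed to take no parameters, and the code uses the primary-context calls and cuLaunchKernel from later CUDA toolkits instead of the CUDA 1.0-era launch functions.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda.h>

// Loads a module (PTX or cubin file), looks up a parameterless kernel by
// name, and launches it with one block of one thread.
// Returns CUDA_SUCCESS, or the first driver error encountered.
CUresult launch_from_module(const char *module_path, const char *kernel_name)
{
    assert(module_path != NULL && kernel_name != NULL);
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;

    CUresult rc = cuInit(0);                      // must precede other driver calls
    if (rc != CUDA_SUCCESS) return rc;
    rc = cuDeviceGet(&dev, 0);                    // first device in the system
    if (rc != CUDA_SUCCESS) return rc;
    rc = cuDevicePrimaryCtxRetain(&ctx, dev);     // context: the device-side "resources"
    if (rc != CUDA_SUCCESS) return rc;

    rc = cuCtxSetCurrent(ctx);
    if (rc == CUDA_SUCCESS) {
        rc = cuModuleLoad(&mod, module_path);     // module: device program loaded from a file
        if (rc == CUDA_SUCCESS) {
            rc = cuModuleGetFunction(&fn, mod, kernel_name);
            if (rc == CUDA_SUCCESS) {
                // Execution control: grid 1x1x1, block 1x1x1, no dynamic
                // shared memory, default stream, no kernel parameters.
                rc = cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL);
            }
            if (rc == CUDA_SUCCESS) {
                rc = cuCtxSynchronize();          // wait for the kernel to finish
            }
            cuModuleUnload(mod);
        }
    }
    cuDevicePrimaryCtxRelease(dev);
    return rc;
}
```

The program has to be linked against the driver library (typically -lcuda). A caller would pass the path of a module built separately, for example with nvcc's PTX output, together with the name of a kernel that module exports.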

5. What remains are the OpenGL and Direct3D interoperability functions. They too come at two API levels: the runtime level and the driver level. CUDA 2.0 also includes some optimizations to memory exchange here. For specific code, refer to the programming manual.

I said earlier that I would explain the function type definitions. I had in fact already translated a passage on this in a previous post, so I am bringing it out again here: NVIDIA_CUDA_Programming_Guide_1.0, Chapter 4. Application Programming Interface

4.2.1 Function Type Qualifiers

4.2.1.1 __device__

__device__ declares a function that is:

Executed on the device.

Callable only from the device.

4.2.1.2 __global__

__global__ declares a function as a kernel. Such a function is:

Executed on the device.

Callable only from the host.

4.2.1.3 __host__

__host__ declares a function that is:

Executed on the host.

Callable only from the host.

A function declared without any of __host__, __device__, or __global__ is equivalent to a __host__ function: the compiler builds it for the host only.

In addition, __host__ can be combined with __device__, in which case the compiler builds the function for both the host and the device.
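The three qualifiers and the __host__ __device__ combination can be seen side by side in a small sketch. All function names here (square, square_plus_one, transform, transform_on_device) are hypothetical and chosen only for illustration; the block size of 128 is arbitrary.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __host__ __device__: compiled for both sides, callable from either.
__host__ __device__ inline float square(float x) { return x * x; }

// __device__: runs on the device and can only be called from device code.
__device__ float square_plus_one(float x) { return square(x) + 1.0f; }

// __global__: a kernel. Runs on the device, launched from the host.
__global__ void transform(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square_plus_one(data[i]);
}

// No qualifier: treated as __host__, so this is ordinary host code.
// Applies x*x + 1 to every element on the device; returns false on any error.
bool transform_on_device(float *host, int n)
{
    assert(host != nullptr && n > 0);
    const size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        transform<<<(n + 127) / 128, 128>>>(dev, n);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Note that square can also be called directly from host code, while calling square_plus_one from the host would be rejected by the compiler.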

4.2.1.4 Restrictions (important)

__device__ functions are inlined by default; to ask the compiler not to inline one, add the __noinline__ qualifier.

Neither __device__ nor __global__ functions support recursion.

Neither __device__ nor __global__ functions can declare static variables inside their body.

Neither __device__ nor __global__ functions can take a variable number of arguments.

The address of a __device__ function cannot be taken, so there are no function pointers to it; function pointers to __global__ functions are supported.

__global__ and __host__ cannot be used together.

A __global__ function must have a void return type.

Any call to a __global__ function must specify its execution configuration (Section 4.2.3), which is how a kernel is launched.

A call to a __global__ function is asynchronous: it returns before the device has finished executing.

The parameters of a __global__ function are currently passed to the device through shared memory and are limited to 256 bytes.
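Several of these restrictions show up directly in how a kernel is written and called. The sketch below is an illustration under stated assumptions, not code from the guide: triple, fill_triples, and compute_triples are hypothetical names, the block size of 64 is arbitrary, and cudaDeviceSynchronize comes from later CUDA toolkits. It shows a void kernel returning its result through a pointer, an explicit execution configuration, and the explicit wait needed because the launch is asynchronous.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __noinline__ is a hint asking the compiler not to inline this function.
__device__ __noinline__ int triple(int x) { return 3 * x; }

// A kernel must return void, so the result goes out through a pointer.
__global__ void fill_triples(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = triple(i);
}

// Fills host_out[i] with 3*i for i in [0, n); returns false on any error.
bool compute_triples(int *host_out, int n)
{
    assert(host_out != nullptr && n > 0);
    const size_t bytes = n * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    // Execution configuration: grid and block dimensions inside <<< >>>.
    dim3 block(64);
    dim3 grid((n + block.x - 1) / block.x);
    fill_triples<<<grid, block>>>(dev, n);   // returns before the kernel is done

    bool ok = cudaGetLastError() == cudaSuccess        // bad launch configuration?
           && cudaDeviceSynchronize() == cudaSuccess   // wait for the kernel
           && cudaMemcpy(host_out, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}
```

Omitting the <<<grid, block>>> part of the call is a compile error, and reading the output buffer on the host before synchronizing would risk seeing incomplete results.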
