CUDA Programming Interface (II) ------ 18 Weapons
------ GPU Revolution
4. Program execution control: operations on streams, events, contexts, modules, and execution control all fall under execution management. Here the split between the runtime level and the driver level is clear.
Stream: If you are familiar with graphics cards from the AGP era, you will know that when data is exchanged between the device and the host, part of the data in transit is called a stream. Starting with G8x, there is a new stream-out hardware unit intended for GPGPU work to make data transfer easier. This layer outputs the data processed by the vertex shader and pixel shader (which G8x unifies, so there is almost no need to distinguish them) to the user; after the user has processed it, the data is handed back to the pipeline for further processing. It can read and write local video memory directly, and transfers are faster when the memory is aligned. In the CUDA API itself, a stream is a queue of operations (memory copies and kernel launches) that execute in the order they were issued, while operations in different streams may overlap. For more information, see the simpleStreams sample code.
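To make the stream idea concrete, here is a minimal sketch using the runtime API. The kernel name scale, the helper run_stream_demo, and the array size are my own choices for illustration, not something from the SDK sample. It assumes a reasonably recent CUDA toolkit and splits one array across two streams.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Doubles every element of data in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes the two halves of an array in two streams.
// Returns 0 if every element came back doubled.
int run_stream_demo() {
    const int n = 1 << 20;
    const int half = n / 2;
    const size_t half_bytes = half * sizeof(float);
    float *h = NULL, *d = NULL;

    // Page-locked host memory is needed for the copies to be truly asynchronous.
    if (cudaMallocHost((void **)&h, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) cudaStreamCreate(&s[k]);

    // Within one stream the copy, kernel, and copy-back run in issue order;
    // work queued in the other stream is free to overlap with it.
    for (int k = 0; k < 2; ++k) {
        float *hp = h + k * half;
        float *dp = d + k * half;
        cudaMemcpyAsync(dp, hp, half_bytes, cudaMemcpyHostToDevice, s[k]);
        scale<<<(half + 255) / 256, 256, 0, s[k]>>>(dp, half);
        cudaMemcpyAsync(hp, dp, half_bytes, cudaMemcpyDeviceToHost, s[k]);
    }
    for (int k = 0; k < 2; ++k) cudaStreamSynchronize(s[k]);

    int bad = 0;
    for (int i = 0; i < n; ++i) if (h[i] != 2.0f) ++bad;

    for (int k = 0; k < 2; ++k) cudaStreamDestroy(s[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return bad == 0 ? 0 : 2;
}

int main() {
    int rc = run_stream_demo();
    printf("stream demo %s\n", rc == 0 ? "passed" : "failed");
    return rc;
}
```

The fourth launch parameter inside <<<...>>> is the stream; leaving it out puts the kernel in the default stream.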
Event: the same event mechanism everyone uses in ordinary programming. An event serves as a notification: you create it, record it, and synchronize on it. Events come up often in multi-threaded programming.
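A common use of events is timing a kernel. The following is a rough sketch with the runtime API. The kernel add_one and the helper time_kernel_ms are names I made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Adds 1 to every element of data.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns the kernel's elapsed time in milliseconds, or -1 on error.
float time_kernel_ms() {
    const int n = 1 << 20;
    float *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);               // mark the point before the kernel
    add_one<<<(n + 255) / 256, 256>>>(d, n);
    cudaEventRecord(stop, 0);                // mark the point after the kernel
    cudaEventSynchronize(stop);              // block the host until stop is reached

    float ms = -1.0f;
    if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}

int main() {
    printf("kernel took %f ms\n", time_kernel_ms());
    return 0;
}
```

The events are recorded into the same stream as the kernel (here the default stream, 0), so they bracket exactly the work queued between them.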
Context: What is a context? (I find the usual Chinese translation of the term a poor fit, but many textbooks use it, so we are stuck with "context".) The context here is similar to a context on the CPU: the set of resources (system resources such as the stack, memory, and so on) that a process needs in order to run. I think "process" would be a more fitting translation.
Module: this can be understood like a module in Linux. If you don't know Linux, don't worry about it. A module is a program built specifically for the device. Do you remember the .com files of the DOS era? Those were files that the CPU could run directly once they were loaded into memory. In the same way, a module can run once it has been loaded onto the device.
Execution control: how threads are launched and run on the device, managed from the driver level.
Streams and events are available in both the runtime API and the driver API, and the function interfaces are similar.
Context, module, and execution control are available only at the driver API level.
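To show how context, module, and execution control fit together at the driver level, here is a sketch. Note the assumptions: it uses calls from the modern driver API (cuDevicePrimaryCtxRetain, cuLaunchKernel), which differ from the CUDA 2.0 era interfaces, and it expects a hypothetical file scale.ptx built from a kernel declared as extern "C" __global__ void scale(float *data, int n) that doubles each element. The file name, kernel name, and helper run_driver_demo are all made up for illustration. Link with -lcuda.

```cuda
#include <cstdio>
#include <cuda.h>

// Bail out of the enclosing function with 1 if a driver call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        CUresult r_ = (call);                                        \
        if (r_ != CUDA_SUCCESS) {                                    \
            fprintf(stderr, "%s failed: %d\n", #call, (int)r_);      \
            return 1;                                                \
        }                                                            \
    } while (0)

// Loads the (hypothetical) PTX module at ptx_path and runs its "scale" kernel.
// Returns 0 if every element was doubled.
int run_driver_demo(const char *ptx_path) {
    const int n = 1024;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Context: the per-process resources on the device.
    CUdevice dev;
    CUcontext ctx;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));

    // Module: device code loaded from a file, then looked up by name.
    CUmodule mod;
    CUfunction fn;
    CHECK(cuModuleLoad(&mod, ptx_path));
    CHECK(cuModuleGetFunction(&fn, mod, "scale"));

    CUdeviceptr d;
    CHECK(cuMemAlloc(&d, sizeof(h)));
    CHECK(cuMemcpyHtoD(d, h, sizeof(h)));

    // Execution control: grid and block shape plus the kernel arguments.
    int count = n;
    void *args[] = { &d, &count };
    CHECK(cuLaunchKernel(fn, (n + 255) / 256, 1, 1, 256, 1, 1, 0, NULL, args, NULL));
    CHECK(cuCtxSynchronize());
    CHECK(cuMemcpyDtoH(h, d, sizeof(h)));

    CHECK(cuMemFree(d));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));

    for (int i = 0; i < n; ++i) if (h[i] != 2.0f) return 2;
    return 0;
}

int main() {
    int rc = run_driver_demo("scale.ptx");  // hypothetical file name
    printf("driver demo %s\n", rc == 0 ? "passed" : "failed");
    return rc;
}
```

Compared with the runtime API, everything the runtime does implicitly (initialization, context setup, loading device code) has to be spelled out here.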
None of the above is particularly hard. Explaining an API is rarely the difficult part; the difficulty lies in using it flexibly, and that only comes from practicing with it repeatedly. Sometimes you will even find shortcomings in an API that suggest a new one. For complete code, see the Programming Guide.
5. Next come the OpenGL and Direct3D interoperability functions. These also exist at both API levels, runtime and driver. CUDA 2.0 made some optimizations here with respect to memory switching. For specific code, see the Programming Guide.
As mentioned above, I want to explain how functions are defined. I translated a passage on this in an earlier post, but here I want to both introduce and explain it. What follows is my attempt at translating NVIDIA_CUDA_Programming_Guide_1.0, Chapter 4, Application Programming Interface.
4.2.1 Function Type Qualifiers
4.2.1.1 __device__
__device__ declares a function that:
Runs on the device.
Can only be called from the device.
4.2.1.2 __global__
__global__ declares a kernel function that:
Runs on the device.
Can only be called from the host.
4.2.1.3 __host__
__host__ declares a function that:
Runs on the host.
Can only be called from the host.
A function declared without any of the __host__, __device__, or __global__ qualifiers is equivalent to one declared with __host__ only. In either case the function is compiled for the host only.
In addition, __host__ can be combined with __device__, in which case the compiler compiles the function for both the host and the device.
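The qualifiers are easier to see side by side. Below is a small sketch of my own (the function names square, plus_one, transform, and reference are invented for the example) with one function of each kind, checked against a host-side reference.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __host__ __device__: compiled for both the host and the device.
__host__ __device__ float square(float x) { return x * x; }

// __device__: runs on the device, callable only from device code.
__device__ float plus_one(float x) { return x + 1.0f; }

// __global__: a kernel; runs on the device, launched from the host.
__global__ void transform(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = plus_one(square(data[i]));
}

// __host__: runs on the host (same as writing no qualifier at all).
__host__ float reference(float x) { return square(x) + 1.0f; }

// Returns 0 if the device result matches the host reference.
int run_qualifier_demo() {
    const int n = 256;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = NULL;
    if (cudaMalloc((void **)&d, sizeof(h)) != cudaSuccess) return 1;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    transform<<<1, n>>>(d, n);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);

    // The values are small integers, so host and device agree exactly.
    for (int i = 0; i < n; ++i)
        if (h[i] != reference((float)i)) return 2;
    return 0;
}

int main() {
    int rc = run_qualifier_demo();
    printf("qualifier demo %s\n", rc == 0 ? "passed" : "failed");
    return rc;
}
```

Calling plus_one from reference would be a compile error, since a __device__ function cannot be called from host code.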
4.2.1.4 Restrictions (important!)
__device__ functions are inlined by default, so if you do not want a function inlined, you need to add the __noinline__ qualifier.
__device__ and __global__ functions do not support recursion.
__device__ and __global__ functions cannot declare static variables inside their body.
__device__ and __global__ functions cannot take a variable number of arguments.
__device__ functions cannot have their address taken, so there are no pointers to them; pointers to __global__ functions, on the other hand, are supported.
The __global__ and __host__ qualifiers cannot be used together.
__global__ functions must have a void return type.
Any call to a __global__ function must specify its execution configuration (Section 4.2.3),
that is, the <<<...>>> launch syntax used to call a kernel.
A call to a __global__ function is asynchronous: it returns before the device has finished executing it.
The parameters of a __global__ function are currently passed to the device through shared memory and are limited to 256 bytes.
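Several of these restrictions show up in even the smallest launch. The sketch below is mine (fill_index and run_launch_demo are invented names) and assumes a recent toolkit, where the host-side wait is cudaDeviceSynchronize. The kernel returns void and hands its result back through a pointer, the call carries an execution configuration, and the host waits explicitly because the launch is asynchronous.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A kernel must return void, so results go out through a pointer argument.
__global__ void fill_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Returns 0 if the kernel wrote out[i] == i for every i.
int run_launch_demo() {
    const int n = 1000;
    int h[n];
    int *d = NULL;
    if (cudaMalloc((void **)&d, sizeof(h)) != cudaSuccess) return 1;

    // Execution configuration: enough 128-thread blocks to cover n elements.
    dim3 block(128);
    dim3 grid((n + block.x - 1) / block.x);
    fill_index<<<grid, block>>>(d, n);   // returns immediately

    // A bad configuration is reported here, not by the launch itself.
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d); return 2; }

    // The kernel may still be running; wait before using its output.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d); return 3; }

    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i) if (h[i] != i) return 4;
    return 0;
}

int main() {
    int rc = run_launch_demo();
    printf("launch demo %s\n", rc == 0 ? "passed" : "failed");
    return rc;
}
```

The guard if (i < n) matters because the grid is rounded up: 8 blocks of 128 threads give 1024 threads for 1000 elements.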