1. OpenCL concepts
OpenCL is a framework for writing programs that run on heterogeneous platforms, which may combine CPUs, GPUs, and other kinds of processors. OpenCL consists of a C99-based language for writing kernels (functions that run on OpenCL devices) and a set of APIs for defining and controlling the platform.
OpenCL supports two kinds of parallelism: data parallelism, in which many work-items run the same kernel on different pieces of data, and task parallelism, in which independent tasks (for example, different kernels) execute concurrently.
2. Differences between OpenCL and CUDA
Differences: OpenCL is a general-purpose programming standard for heterogeneous platforms; because it has to accommodate many kinds of devices, it is more cumbersome to use.
CUDA is NVIDIA's framework for general-purpose GPU (GPGPU) programming on NVIDIA GPUs; it is simpler to use and easier to get started with.
Similarities: both support task parallelism and data parallelism.
3. OpenCL programming steps
(1) Discover and initialize the platforms
Call clGetPlatformIDs twice: the first call returns the number of available platforms, and the second call retrieves the platform IDs.
(2) Discover and initialize the devices
Call clGetDeviceIDs twice: the first call returns the number of available devices, and the second call retrieves the device IDs.
(3) Create a context (call clCreateContext)
A context can manage multiple devices.
(4) Create a command queue (call clCreateCommandQueue; OpenCL 2.0 and later use clCreateCommandQueueWithProperties)
Each command queue is bound to one device (a device may have more than one queue).
The host submits commands to a device's command queue, and the device executes the commands in that queue.
(5) Create device buffers (call clCreateBuffer)
A buffer is a memory object that holds the data the device needs in order to execute the program.
Buffers are created within a context, so all devices managed by that context can share the data in a buffer.
(6) Write host data to device buffers (call clEnqueueWriteBuffer)
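(7) Create and compile the program (call clCreateProgramWithSource or clCreateProgramWithBinary, then clBuildProgram)
Create a program object from the kernel source code or from precompiled binary code, then build it for the target devices.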
(8) Create the kernel (call clCreateKernel)
From the program object, create a kernel object that represents an entry point (a __kernel function) of the device program.
(9) Set the kernel arguments (call clSetKernelArg)
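(10) Configure the work-item structure (set the work sizes)
Configure the work-items: the number of dimensions, the global size (total number of work-items), and how the work-items are grouped into work-groups (the local size).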
(11) Enqueue the kernel for execution (call clEnqueueNDRangeKernel)
Enqueue the kernel object, together with the work-item configuration, in the command queue for execution.
(12) Read the output buffer back to the host (call clEnqueueReadBuffer)
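(13) Release the OpenCL resources (call clReleaseKernel, clReleaseProgram, clReleaseMemObject, clReleaseCommandQueue, and clReleaseContext); this concludes the entire run.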