Comparison of the Similarities and Differences Between CUDA and OpenCL
I. Overview
Having some programming experience with both CUDA and OpenCL, an attentive reader will notice that OpenCL is modeled after CUDA. Given that the two GPU programming frameworks are so similar, what are the differences between them? This article compares them point by point.
II. Data parallel model
OpenCL adopts a data parallel model very close to CUDA's. The table below shows the mapping between the CUDA and OpenCL parallel models.
| OpenCL | CUDA |
| --- | --- |
| Kernel function | Kernel function |
| Host program | Host program |
| NDRange | Grid |
| Work-item | Thread |
| Work-group | Thread block |
With respect to NDRange, work-items, and work-groups, CUDA and OpenCL are very similar, almost identical. In device-side code, CUDA exposes this information mainly through predefined variables, while OpenCL exposes it through built-in functions. The following table gives the specific correspondence:
| OpenCL | Meaning | CUDA |
| --- | --- | --- |
| get_global_id(0) | Global index of the work-item in the X dimension | blockIdx.x * blockDim.x + threadIdx.x |
| get_local_id(0) | Local index of the work-item within its work-group in the X dimension | threadIdx.x |
| get_global_size(0) | Size of the NDRange in the X dimension, i.e., the total number of work-items | gridDim.x * blockDim.x |
| get_local_size(0) | Size of each work-group in the X dimension | blockDim.x |
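To make the index mapping concrete, here is a minimal sketch of a vector-add kernel written in CUDA, with the equivalent OpenCL built-in noted in a comment next to each index expression (the kernel name and parameters are illustrative, not from the original article):

```cuda
// Minimal CUDA vector-add kernel illustrating the index mapping above.
// The equivalent OpenCL expression is shown in the comment on each line.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    // CUDA:   blockIdx.x * blockDim.x + threadIdx.x
    // OpenCL: get_global_id(0)
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard: the grid (NDRange) may contain more threads than elements.
    if (i < n)
        c[i] = a[i] + b[i];
}
```

The OpenCL version of the same kernel would simply replace the index computation with `int i = get_global_id(0);` and declare the function with `__kernel`.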
III. Device architecture
CUDA and OpenCL are similar here as well: both target heterogeneous parallel computing systems consisting of one host and one or more compute devices. Within a compute device, CUDA calls the basic processing unit a streaming multiprocessor (SM), whereas OpenCL calls it a compute unit (CU). Also, when OpenCL runs on a CPU, a CU corresponds to a CPU core.
In addition, the memory models of CUDA and OpenCL have both similarities and differences, as shown in the following table:
| OpenCL memory type | Host access | Device access | CUDA memory type |
| --- | --- | --- | --- |
| Global memory | Dynamically allocated; read/write | Cannot be allocated; readable and writable by all work-items; large but slow | Global memory |
| Constant memory | Dynamically allocated; read/write | Statically allocated; read-only for work-items | Constant memory |
| Local memory | Dynamically allocated; not accessible | Statically allocated; shared by work-items of the same work-group | Shared memory |
| Private memory | Cannot be allocated; not accessible | Statically allocated; each work-item reads and writes only its own copy | Registers and local memory |
There are a few notable differences: OpenCL can dynamically allocate constant memory on the host side, and private memory in OpenCL corresponds to automatic variables in CUDA.
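The memory spaces in the table above can be sketched in CUDA terms as follows, with the corresponding OpenCL address-space qualifier noted in a comment on each line (the kernel, array names, and sizes are illustrative assumptions, not from the original article):

```cuda
// Sketch of the CUDA memory spaces, with OpenCL qualifiers in comments.
__constant__ float coeff[16];            // constant memory (OpenCL: __constant)

__global__ void scale(const float *in,   // global memory   (OpenCL: __global)
                      float *out, int n)
{
    __shared__ float tile[256];          // shared memory   (OpenCL: __local)

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = 0.0f;                      // automatic variable, usually a
                                         // register        (OpenCL: __private)

    // Every thread in the block must reach the barrier, so the bounds
    // check is applied to the load, not to the barrier itself.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                     // OpenCL: barrier(CLK_LOCAL_MEM_FENCE)

    if (i < n) {
        x = tile[threadIdx.x] * coeff[threadIdx.x % 16];
        out[i] = x;
    }
}
```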
IV. Kernel functions and launch
A kernel function in OpenCL is declared with __kernel, which corresponds to __global__ in CUDA. In terms of compilation, CUDA device code is compiled ahead of time, while OpenCL kernels are typically compiled at run time. To launch a kernel, OpenCL goes through the runtime API, whereas CUDA launches it directly by function name with the <<<dimGrid,dimBlock>>> syntax. Finally, the NDRange (grid) configuration differs: in CUDA it is given by the parameters sandwiched between <<< and >>> at the call site, while in OpenCL it is specified when calling the clEnqueueNDRangeKernel function.
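The two launch styles can be sketched as follows. The CUDA side is shown as code; the OpenCL side is outlined in a comment, since it requires a command queue and kernel object created earlier through the runtime API (the kernel name, block size, and helper function are illustrative assumptions):

```cuda
// CUDA launch: configuration goes between <<< and >>> at the call site.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

void launch_vec_add(const float *a, const float *b, float *c, int n)
{
    dim3 dimBlock(256);                       // threads per block
    dim3 dimGrid((n + 255) / 256);            // enough blocks to cover n
    vec_add<<<dimGrid, dimBlock>>>(a, b, c, n);
}

/* The OpenCL equivalent goes through the runtime API, roughly:
 *
 *   size_t local  = 256;
 *   size_t global = ((n + 255) / 256) * 256;  // rounded up to the work-group size
 *   clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local,
 *                          0, NULL, NULL);
 *
 * where `queue` and `kernel` were created earlier with
 * clCreateCommandQueue and clCreateKernel.
 */
```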
V. Conclusion
CUDA and OpenCL are very similar; learning one makes it easy to pick up the other. I recommend learning CUDA first and then OpenCL. If anything here is incomplete or incorrect, corrections and suggestions are welcome!