1. OpenCL Architecture
OpenCL implements parallel computing on heterogeneous devices, including CPUs, GPUs, and other processors such as Cell processors and DSPs. Code written with OpenCL is portable across these devices. However, because the hardware of each OpenCL device differs, optimizing a program may still require taking device-specific features into account.
Generally, the OpenCL architecture consists of four parts:
- Platform model
- Execution model
- Memory model
- Programming model
2. OpenCL platform model
Each vendor's OpenCL implementation defines its own OpenCL platform, through which the host interacts with OpenCL devices. Currently, the major OpenCL platforms are those of AMD, NVIDIA, and Intel. OpenCL uses an installable client driver (ICD) model, so platforms from different vendors can coexist on one system. My computer has both the AMD and Intel OpenCL platforms installed (the current OpenCL driver model does not allow GPUs from different vendors to run at the same time).
An OpenCL platform usually consists of one host and one or more OpenCL devices. Each device contains one or more CUs (compute units), and each CU contains one or more PEs (processing elements). Each PE has its own program counter (PC). The host is the machine on which the OpenCL runtime library runs; on the AMD and NVIDIA platforms, the host is generally the x86 CPU.
On the AMD platform, all CPUs together are exposed as a single device in which each CPU core is a CU, and each GPU is an independent device.
3. General steps of OpenCL programming
Next, we use an example to walk through the OpenCL programming procedure. Assume we are using the AMD OpenCL platform (my GPU is an HD5730), that AMD Stream SDK 2.6 is installed, and that the SDK's include and lib directories have been set in VS2008.
First, create a console program. The initial code is as follows:
    #include "stdafx.h"
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment (lib,"OpenCL.lib")

    int main(int argc, char* argv[])
    {
        return 0;
    }
Step 1: select an OpenCL platform. The function used is clGetPlatformIDs.
Generally, this function is called twice: the first call obtains the number of platforms available in the system so that space can be allocated for the platform objects, and the second call retrieves all the platforms, from which the desired one is selected. That code is relatively long; for details, refer to the TemplateC example in AMD Stream SDK 2.6, which shows how to build a minimal, robust OpenCL program. To keep the code simple, I call the function once and take the first OpenCL platform in the system. My system has the AMD and Intel platforms installed, and the first platform is AMD. I have also left out error checking and only added a status variable. Normally, if a function executes correctly, it returns 0 (CL_SUCCESS).
    #include "stdafx.h"
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment (lib,"OpenCL.lib")

    int main(int argc, char* argv[])
    {
        cl_int status;   // OpenCL functions return cl_int; 0 (CL_SUCCESS) means success
        cl_platform_id platform;

        // Take the first platform available in the system.
        status = clGetPlatformIDs( 1, &platform, NULL );

        return 0;
    }
Step 2: obtain an OpenCL device. The function used is clGetDeviceIDs.
This function is also usually called twice: the first call queries the number of devices, and the second retrieves the desired devices. To simplify the code, we directly request a single GPU device.
    #include "stdafx.h"
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment (lib,"OpenCL.lib")

    int main(int argc, char* argv[])
    {
        cl_int status;   // OpenCL functions return cl_int; 0 (CL_SUCCESS) means success
        cl_platform_id platform;

        // Take the first platform available in the system.
        status = clGetPlatformIDs( 1, &platform, NULL );

        cl_device_id device;

        // Request exactly one GPU device from that platform.
        status = clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU,
                                 1,
                                 &device,
                                 NULL);

        return 0;
    }
Next, let's take a look at the concept of a context in OpenCL.
A context is the environment in which OpenCL objects and resources are managed. To manage an OpenCL program, the following objects must be associated with a context:
- Devices: the hardware that executes kernels.
- Program objects: the source code of the kernel program.
- Kernels: the functions that run on OpenCL devices.
- Memory objects: the data that devices operate on.
- Command queues: the interface for interacting with devices.
Note: when creating a context, we must associate one or more devices with it. Other OpenCL resources must be associated with a context when they are created, which is why the functions that create them generally take a context as an input parameter.
A context is created with clCreateContext. This function specifies one or more device objects to associate with the context. The properties parameter specifies the platform to use; if it is NULL, the platform chosen is implementation-defined. The function also takes a callback through which errors can be reported to the user.
The current code is as follows:
    #include "stdafx.h"
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment (lib,"OpenCL.lib")

    int main(int argc, char* argv[])
    {
        cl_int status;   // OpenCL functions return cl_int; 0 (CL_SUCCESS) means success
        cl_platform_id platform;

        // Take the first platform available in the system.
        status = clGetPlatformIDs( 1, &platform, NULL );

        cl_device_id device;

        // Request exactly one GPU device from that platform.
        status = clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU,
                                 1,
                                 &device,
                                 NULL);

        // Default properties, one device, no error callback, no error code.
        cl_context context = clCreateContext( NULL,
                                              1,
                                              &device,
                                              NULL, NULL, NULL);

        return 0;
    }
Next, let's look at the command queue. In OpenCL, the command queue is the mechanism through which the host requests that work be executed on a device.
- Before a kernel is executed, we generally need to perform some memory copies, such as transferring data from host memory to device memory.
Note that each device has its own independent command queue. Commands in a command queue (kernel launches, for example) may be synchronous or asynchronous, and they may execute in order or out of order.
The command queue establishes the connection between a device and a context. It is created with clCreateCommandQueue.
The properties parameter of the command queue specifies the following:
- Whether out-of-order execution is enabled (on AMD GPUs, out-of-order execution does not seem to be supported yet).
- Whether profiling is enabled. Profiling uses the event mechanism to obtain useful information such as kernel execution time, but it also adds some overhead.
The command queue associates a device with the context, although the two are not physically connected.
The code after adding a command queue is as follows:
    #include "stdafx.h"
    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    #pragma comment (lib,"OpenCL.lib")

    int main(int argc, char* argv[])
    {
        cl_int status;   // OpenCL functions return cl_int; 0 (CL_SUCCESS) means success
        cl_platform_id platform;

        // Take the first platform available in the system.
        status = clGetPlatformIDs( 1, &platform, NULL );

        cl_device_id device;

        // Request exactly one GPU device from that platform.
        status = clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU,
                                 1,
                                 &device,
                                 NULL);

        // Default properties, one device, no error callback, no error code.
        cl_context context = clCreateContext( NULL,
                                              1,
                                              &device,
                                              NULL, NULL, NULL);

        // In-order queue on the chosen device, with profiling enabled.
        cl_command_queue queue = clCreateCommandQueue( context,
                                                       device,
                                                       CL_QUEUE_PROFILING_ENABLE, NULL );

        return 0;
    }