Here we introduce how to write program functions in OpenCL. A program function is usually kept as plain source text and loaded through interfaces such as clCreateProgramWithSource. Code of this kind is common in shader programming, where it is written to run on the GPU. So, for clarity, let's call the source text of these program functions the OpenCL shader.
The following covers some of the syntax used in the shader.
1. The shader language is a C-like language derived from the C99 standard (the version of the C standard published in 1999).
Not supported:
C99 standard library headers, function pointers, recursion, and variable-length arrays.
Additional types:
Vector types such as char2, ushort4, and int8; a vector type is aligned to its size in bytes.
Image types: image2d_t, image3d_t, sampler_t, ...
Event type: event_t (used inside kernels; the host API has the analogous cl_event)
2. Work-item and work-group functions
3. Vector operations
The first half of a vector is .lo and the second half is .hi; .even and .odd select the elements with even and odd indices.
int4 v = (int4)(7);      // same as (int4)(7, 7, 7, 7)
v = (int4)(1, 2, 3, 4);
int2 v2 = v.lo;          // (1, 2)
v2 = v.hi;               // (3, 4)
v2 = v.odd;              // (2, 4)
The four arithmetic operations (+, -, *, /) and functions such as abs are applied to each element of a vector separately.
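C has no int4, so here is a hypothetical host-side C model of which elements .lo, .hi, .even and .odd pick out, and of element-wise arithmetic and abs. The int4_t/int2_t structs and the lo4/hi4/even4/odd4/add4/abs4 helpers are made-up names for illustration, not OpenCL API; one simplification: OpenCL's abs on int4 returns uint4, while this model keeps int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for OpenCL C's int4 and int2 (not real C types). */
typedef struct { int s[4]; } int4_t;
typedef struct { int s[2]; } int2_t;

static int2_t make2(int a, int b) { int2_t r = { { a, b } }; return r; }

/* v.lo: first half (s0, s1); v.hi: second half (s2, s3). */
static int2_t lo4(int4_t v)   { return make2(v.s[0], v.s[1]); }
static int2_t hi4(int4_t v)   { return make2(v.s[2], v.s[3]); }

/* v.even: even indices (s0, s2); v.odd: odd indices (s1, s3). */
static int2_t even4(int4_t v) { return make2(v.s[0], v.s[2]); }
static int2_t odd4(int4_t v)  { return make2(v.s[1], v.s[3]); }

/* a + b on vectors: each element is added separately. */
static int4_t add4(int4_t a, int4_t b) {
    int4_t r;
    for (int i = 0; i < 4; ++i) r.s[i] = a.s[i] + b.s[i];
    return r;
}

/* abs(v): applied to each element separately
 * (OpenCL returns uint4 here; this model keeps int for simplicity). */
static int4_t abs4(int4_t v) {
    int4_t r;
    for (int i = 0; i < 4; ++i) r.s[i] = abs(v.s[i]);
    return r;
}
```

With v = (1, 2, 3, 4), lo4 gives (1, 2), hi4 gives (3, 4) and odd4 gives (2, 4), matching the OpenCL lines above.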
4. Address space qualifiers: written in front of a variable declaration, they specify the address space in which the variable resides.
__global
__local
__private
__constant
These four correspond to the memory regions of the OpenCL architecture: global memory of the device, local memory shared by a work-group, private memory of each work-item, and constant memory of the device.
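- The leading double underscore can be omitted (global, local, private, constant).
- In OpenCL 1.x, variables declared at program scope (outside any function, i.e. "global" variables in the C sense) must be in the __constant address space, so they must be initialized when declared; in effect, a global variable is a global constant (OpenCL 2.0 relaxes this rule).
- Conversions between pointers to different address spaces are not defined.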
5. Type conversion
5.1 convert conversion: this conversion follows the value (the meaning) of the variable.
The format is convert_destType<_sat><_roundingMode>, where the parts in angle brackets are optional.
For example:
float4 f4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
int4 i4 = convert_int4_sat_rte(f4);   // i4 = (1, 2, 3, 4)
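destType: the target type
_sat: saturation; a value outside the range of the target type is clamped to the target type's minimum or maximum (a NaN becomes 0 when converting to an integer type)
_roundingMode:
_rte: round to nearest even
_rtz: round toward zero
_rtp: round toward positive infinity
_rtn: round toward negative infinity
If no rounding mode is given, conversions to integer types use _rtz and conversions to floating-point types use _rte.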
The full rules are more involved; see http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/convert_T.html
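To make _sat and the four rounding modes concrete, here is a plain C model of one float-to-int saturated conversion. It is only a sketch of the semantics described above: round_mode, round_integral and convert_int_sat_model are hypothetical names, not OpenCL built-ins, and it avoids libm so each rounding step is visible.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical names for illustration; not OpenCL built-ins. */
typedef enum { RTE, RTZ, RTP, RTN } round_mode;

/* Round x to an integral value using the given mode.
 * Assumes |x| < 2^62 so the cast to long long is defined. */
static double round_integral(double x, round_mode m) {
    double t = (double)(long long)x;      /* C casts truncate toward zero */
    double f = (x < t) ? t - 1.0 : t;     /* floor(x) */
    switch (m) {
    case RTZ:
        return t;
    case RTN:
        return f;
    case RTP:
        return (x > t) ? t + 1.0 : t;     /* ceil(x) */
    case RTE: {
        double frac = x - f;              /* in [0, 1) */
        if (frac > 0.5) return f + 1.0;
        if (frac < 0.5) return f;
        /* exact tie: choose the even neighbour */
        return ((long long)f % 2 == 0) ? f : f + 1.0;
    }
    }
    return t;
}

/* Models convert_int_sat_<mode>(v) for a single float element. */
static int convert_int_sat_model(float v, round_mode m) {
    double x = v;                               /* every float is exact in double */
    if (x != x) return 0;                       /* NaN -> 0 under _sat */
    if (x >= (double)INT_MAX) return INT_MAX;   /* clamp high */
    if (x <= (double)INT_MIN) return INT_MIN;   /* clamp low */
    return (int)round_integral(x, m);           /* in range, so the cast is safe */
}
```

For example, 2.5f becomes 2 under RTE but 3 under RTP, and 3e9f saturates to INT_MAX. Applying the model to each of the four elements corresponds to convert_int4_sat_rte.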
5.2 as conversion: a conversion that reinterprets the bit pattern
It is written as as_destType.
The source and destination types must have the same size in bytes, and destType is the target type. The conversion leaves the bits unchanged and simply reinterprets them as a value of destType.
as conversions and convert conversions are fundamentally different!
For example:
float4 f4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
int4 i4 = as_int4(f4);   // i4 = (0x3F800000, 0x40000000, 0x40400000, 0x40800000), not (1, 2, 3, 4)
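The difference can be reproduced in plain C, assuming 32-bit IEEE-754 floats: copying the bytes with memcpy models as_int, while an ordinary cast models convert_int with its default round-toward-zero behaviour. as_int_model, as_float_model and convert_int_model are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of as_int: keep the 32 bits of the float unchanged
 * and read them as an integer (assumes 32-bit IEEE-754 float). */
static int32_t as_int_model(float f) {
    int32_t i;
    assert(sizeof f == sizeof i);  /* as_ requires equal sizes */
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Inverse model of as_float: read 32 integer bits as a float. */
static float as_float_model(int32_t i) {
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* Model of convert_int (default _rtz): convert the value itself.
 * The C cast truncates toward zero; undefined if out of int32 range. */
static int32_t convert_int_model(float f) {
    return (int32_t)f;
}
```

as_int_model(1.0f) yields 0x3F800000 (1065353216), whereas convert_int_model(1.0f) yields 1; reinterpreting the bits back with as_float_model recovers the original float exactly.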
6. Built-in functions
6.1 Math functions
For details, see the built-in functions section of http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/
(A simplified table of these functions was pasted here.)
6.2 Work-group functions
These are mainly used for interaction between the work-items of a work-group.
- Synchronization functions
void barrier(cl_mem_fence_flags flags)
All work-items in a work-group must execute the barrier before any of them can continue past it. You can think of it as a synchronization point for all work-items: whether a work-item is fast or slow, it stops here, and only after every work-item has reached this point do they all continue. For this reason, if the barrier is inside a conditional or a loop, every work-item of the group must reach it.
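The flags argument takes one of two values, which may also be combined with |:
CLK_LOCAL_MEM_FENCE and CLK_GLOBAL_MEM_FENCE
I am not yet very familiar with this parameter. The general idea is that the barrier also places a memory fence, so that reads and writes to local memory (CLK_LOCAL_MEM_FENCE) or global memory (CLK_GLOBAL_MEM_FENCE) made before the barrier are correctly ordered and visible to the other work-items of the same work-group after it. For the precise concept of a memory fence, see the description in the OpenCL specification.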
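A common reason to need barrier(CLK_LOCAL_MEM_FENCE) is a tree reduction in local memory. The sketch below is a sequential C model, not kernel code: each inner for-loop over lid stands for "every work-item runs this step", and finishing that loop before the next step starts is exactly what the barrier guarantees on the device. LOCAL_SIZE and work_group_sum_model are hypothetical, and the size is assumed to be a power of two.

```c
#include <assert.h>

#define LOCAL_SIZE 8  /* hypothetical work-group size (power of two) */

/* Sequential model of a work-group sum reduction in local memory. */
static int work_group_sum_model(const int in[LOCAL_SIZE]) {
    int scratch[LOCAL_SIZE];  /* plays the role of a __local array */

    /* Step 1: every work-item copies its own element into local memory. */
    for (int lid = 0; lid < LOCAL_SIZE; ++lid)
        scratch[lid] = in[lid];
    /* barrier(CLK_LOCAL_MEM_FENCE): all writes above are visible below. */

    for (int half = LOCAL_SIZE / 2; half > 0; half /= 2) {
        /* Work-items with lid < half add their partner's value; reads come
         * from [half, 2*half) and writes go to [0, half), so they never clash. */
        for (int lid = 0; lid < half; ++lid)
            scratch[lid] += scratch[lid + half];
        /* barrier(CLK_LOCAL_MEM_FENCE): the next step reads values written
         * here by other work-items. On the device this barrier sits outside
         * the "lid < half" test so that every work-item reaches it. */
    }
    return scratch[0];  /* work-item 0 would write this result out */
}
```

Without the barrier between steps, a fast work-item could read scratch[lid + half] before its partner had written it, which is the race the synchronization point prevents.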
- Asynchronous memory copy and prefetch functions
async_work_group_copy: performs an asynchronous copy between global and local memory and may use a DMA engine (a DMA transfer moves data without tying up the processing elements, so it can be fast). Because the function is asynchronous, it returns an event_t that is used for synchronization.
Use the wait_group_events function to wait for that event to complete; this is how the work-items synchronize on the copy.
async_work_group_strided_copy: the documentation says it performs a gather from src to dst, and at first the meaning of "gather" here was not obvious to me. After careful analysis, the difference from async_work_group_copy is the stride: it also copies asynchronously, but it reads elements that are a fixed stride apart in src, so it can pick out particular fields. For example, in graphics we often use one large array holding colors, normal directions, texture coordinates, and so on, interleaved like {color1, color2, color3, tex0, tex1, color1, color2, color3, tex0, tex1, ...}. To extract only the color information from such an array, we need this strided copy.
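The data movement of the two copy functions can be modelled sequentially in C for the global-to-local direction, where the stride applies to src (for local-to-global, the stride applies to dst instead). This is a hedged sketch: group_copy_model and group_strided_copy_model are hypothetical names, the models finish immediately so there is no event_t, and on the device you would call wait_group_events before reading dst. Strides are counted in elements of the copied type, so a float copy gathers one field per call.

```c
#include <assert.h>
#include <stddef.h>

/* Model of async_work_group_copy: dst[i] = src[i] for i in [0, n). */
static void group_copy_model(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Model of async_work_group_strided_copy (global -> local):
 * dst[i] = src[i * src_stride] for i in [0, n). */
static void group_strided_copy_model(float *dst, const float *src,
                                     size_t n, size_t src_stride) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i * src_stride];
}
```

For records laid out as {color1, color2, color3, tex0, tex1} (5 floats each), calling group_strided_copy_model(dst, src, count, 5) gathers every color1, and starting from src + 1 gathers every color2.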