I set aside some time to write this blog post in the hope that it helps newcomers. Today's topic is a brief look at program functions in OpenCL.
A program function in OpenCL usually exists as source text, which the application loads through an interface such as clCreateProgramWithSource. This form is familiar from shader programming, where it is used to write code that runs on the GPU, so for clarity and convenience I call the source text of these program functions the OpenCL shader.
Everything below is written in the shader.
1. The shader language is C-like, derived from the C99 standard (the ISO C standard published in 1999)
Not supported:
standard C header files, function pointers, recursion, variable-length arrays (Visual Studio does not support variable-length arrays either)
Additional types:
Vector types: char2, ushort4, int8, ... these are aligned to their own total size
Image types: image2d_t, image3d_t, sampler_t, ...
Event type: event_t (related to cl_event in the host API)
2. Work-item and work-group related functions
3. Vector manipulation
The first half of a vector is lo and the second half is hi
int4 v = (int4)7;        // same as (int4)(7, 7, 7, 7)
v = (int4)(1, 2, 3, 4);
int2 v2 = v.lo;          // (1, 2)
v2 = v.hi;               // (3, 4)
v2 = v.odd;              // (2, 4)
Vector arithmetic is element-wise; abs, for example, is computed for each element separately
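To make the component selectors concrete, here is a minimal sketch in plain host-side C (not OpenCL C) that mimics what .lo, .hi, .even, .odd and an element-wise abs do on a 4-component vector. The emu_* types and functions are hypothetical names invented for this illustration; in a real kernel you would simply write v.lo, v.odd and abs(v).

```c
/* Plain-C stand-ins for OpenCL's int4, int2 and uint4 (hypothetical names). */
typedef struct { int s[4]; } emu_int4;
typedef struct { int s[2]; } emu_int2;
typedef struct { unsigned s[4]; } emu_uint4;

/* v.lo: the first half of the components. */
emu_int2 emu_lo(emu_int4 v)   { emu_int2 r = {{ v.s[0], v.s[1] }}; return r; }

/* v.hi: the second half of the components. */
emu_int2 emu_hi(emu_int4 v)   { emu_int2 r = {{ v.s[2], v.s[3] }}; return r; }

/* v.even: components at even indices (0 and 2). */
emu_int2 emu_even(emu_int4 v) { emu_int2 r = {{ v.s[0], v.s[2] }}; return r; }

/* v.odd: components at odd indices (1 and 3). */
emu_int2 emu_odd(emu_int4 v)  { emu_int2 r = {{ v.s[1], v.s[3] }}; return r; }

/* Element-wise abs: each component is handled independently.
 * The result is unsigned, as with OpenCL's integer abs. */
emu_uint4 emu_abs(emu_int4 v)
{
    emu_uint4 r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = v.s[i] < 0 ? 0u - (unsigned)v.s[i] : (unsigned)v.s[i];
    return r;
}
```

With v = (1, 2, 3, 4), emu_lo gives (1, 2), emu_hi gives (3, 4), emu_even gives (1, 3) and emu_odd gives (2, 4), matching the comments in the listing above.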
4. Address space qualifiers, written in front of a variable declaration, name the address space in which the variable lives
__global
__local
__private
__constant
These four correspond to the memory regions of the CL architecture (device global, work-group, work-item, device constant)
The leading __ can be omitted. Currently, global (program-scope) variables must be constant: they are declared in the constant address space and must be initialized at declaration. Converting a pointer from one address space to another is not defined
5. Type conversion
5.1 convert_ conversion: this converts the value of a variable to the target type
It is written in the form convert_destType<_sat><_roundingMode>,
for example float4 f4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
int4 i4 = convert_int4_sat_rte(f4);
destType: the target type
_sat: out-of-range values are clamped to the largest or smallest value the target type can represent
_roundingMode:
_rte: round to nearest even
_rtz: round toward zero
_rtp: round toward positive infinity
_rtn: round toward negative infinity
The full rules are more involved; see http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/convert_T.html.
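As a rough illustration of what _sat and _rte mean together, the sketch below emulates a scalar convert_int_sat_rte in plain host-side C. It is not OpenCL C and not how any implementation does it; emu_convert_int_sat_rte is a hypothetical name, and the code assumes a 32-bit int and IEEE 754 floats. Mapping NaN to 0 follows my reading of the saturated-mode rules, so treat that as an assumption too.

```c
#include <limits.h>

/* Hypothetical scalar emulation of convert_int_sat_rte(float).
 * Assumes 32-bit int and IEEE 754 single precision. */
int emu_convert_int_sat_rte(float x)
{
    if (x != x)                      /* NaN: assumed to convert to 0 with _sat */
        return 0;
    if (x >= 2147483648.0f)          /* _sat: clamp values above INT_MAX */
        return INT_MAX;
    if (x <= -2147483648.0f)         /* _sat: clamp values at or below INT_MIN */
        return INT_MIN;

    int t = (int)x;                  /* truncates toward zero (this alone is _rtz) */
    float frac = x - (float)t;       /* exact fractional part, in (-1, 1) */

    /* _rte: round to nearest; on an exact tie pick the even neighbour. */
    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        t += 1;
    else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        t -= 1;
    return t;
}
```

For example, 2.5f converts to 2 and 3.5f converts to 4 (ties go to the even neighbour), while 1e20f saturates to INT_MAX instead of producing an undefined result.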
5.2 as_ conversion: this reinterprets the existing bit pattern as a new type
It is written as_destType
The source and destination types must have the same size; destType is the target type. The conversion leaves the bits unchanged and simply reinterprets them as destType
as_ and convert_ are therefore fundamentally different.
for example float4 f4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
int4 i4 = as_int4(f4);
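The difference between the two conversions is easiest to see on a single value. The sketch below is plain host-side C, not OpenCL C: emu_as_int is a hypothetical helper that copies the bits of a float into an int32_t the way as_int would, and an ordinary C cast stands in for the value conversion. It assumes IEEE 754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* as_ requires both types to have the same size; check it at compile time. */
static_assert(sizeof(float) == sizeof(int32_t), "float and int32_t must match in size");

/* Hypothetical emulation of as_int(float): same bits, new interpretation. */
int32_t emu_as_int(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* copy the bit pattern unchanged */
    return bits;
}

/* Value conversion for comparison: the number is kept, the bits change. */
int32_t emu_value_convert(float f)
{
    return (int32_t)f;
}
```

With IEEE 754 floats, emu_as_int(1.0f) yields 0x3F800000 (the raw encoding of 1.0f), whereas emu_value_convert(1.0f) yields 1.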
6. Built-in functions:
6.1 A large set of mathematical functions: see
http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/
Look for the built-in functions section.
A short summary follows:
6.2 Work-group functions:
These matter for the interaction between work-items within a group
Synchronization functions
void barrier(cl_mem_fence_flags flags)
Every work-item in a work-group must reach this barrier before any of them may continue past it. It is a synchronization point for all work-items: no matter which are fast and which are slow, each one stops here until all have arrived, and then they all continue.
The flags parameter can take two values:
CLK_LOCAL_MEM_FENCE and CLK_GLOBAL_MEM_FENCE
I have not fully understood this parameter. Roughly, it adds a memory fence that guarantees local or global memory is properly synchronized at this point; for the memory fence concept, see the OpenCL specification
Asynchronous memory copy and prefetch functions
async_work_group_copy: performs an asynchronous memory copy between global and local memory, possibly using a DMA engine (DMA transfers do not rely on traditional hardware interrupts, so they are fast). Because the function is asynchronous, it returns an event_t that is used for synchronization
Use the wait_group_events function to wait for that event to complete
async_work_group_strided_copy: the documentation says it is used to gather data from src to dst, but what it means by gather is not easy to grasp. On closer analysis, the difference from async_work_group_copy is the stride: it is also an asynchronous copy, but it can extract just part of the fields from src into dst. For example, in graphics we often pack color, normals, texture coordinates and so on into one large interleaved array, such as {color1, color2, color3, tex0, tex1, color1, color2, color3, tex0, tex1, ...}. When we need to extract only the color information, this strided copy is what we use.
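To show what "gather with a stride" means here, the following is a synchronous, single-threaded sketch in plain host-side C, not OpenCL C. emu_strided_gather is a hypothetical helper; the real async_work_group_strided_copy is asynchronous, is executed by the whole work-group, and returns an event_t. Only the indexing pattern dst[i] = src[i * stride] is being illustrated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of the gather pattern: element i of dst is taken
 * from element i * src_stride of src. */
void emu_strided_gather(float *dst, const float *src,
                        size_t num_elements, size_t src_stride)
{
    assert(src_stride >= 1);         /* a stride of 1 is an ordinary copy */
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[i * src_stride];
}
```

For the interleaved layout above, each vertex record has 5 floats, so pulling out the first color component of every vertex is emu_strided_gather(dst, src, vertex_count, 5). Starting from src + 1 or src + 2 with the same stride yields the second or third color component, so one gather per field is needed.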
http://www.cnblogs.com/jisi5789/archive/2013/05/22/3093354.html