This installment introduces a few more concepts. Our example program uses OpenCL to render a square with a color gradient. Along the way we will cover the address space qualifiers of variables, vector data types, how vectors are operated on, and how vectors combine with scalars in an operation.
I'll first paste the OpenCL kernel code here, and then attach the complete project.
// Render a square
// Left-top: red (1, 0, 0)
// Left-bottom: green (0, 1, 0)
// Right-top: blue (0, 0, 1)
// Right-bottom: black (0, 0, 0)
__constant float4 left_top = (float4)(1.0f, 0.0f, 0.0f, 0.0f);
__constant float4 left_bottom = (float4)(0.0f, 1.0f, 0.0f, 0.0f);
__constant float4 right_top = (float4)(0.0f, 0.0f, 1.0f, 0.0f);
__constant float4 right_bottom = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
__kernel void Colorshading(
    __global float4 output[256][256]
    )
{
    int dimx = get_global_id(0);
    int dimy = get_global_id(1);
    // Per-row color steps along the left and right edges.
    // Declared private: OpenCL does not allow a __local variable
    // to be initialized in its declaration.
    float4 deltaleft = (left_top - left_bottom) / 255.0f;
    float4 deltaright = (right_top - right_bottom) / 255.0f;
    // Colors at the left and right ends of row dimy
    float4 left = left_bottom + deltaleft * (float)dimy;
    float4 right = right_bottom + deltaright * (float)dimy;
    // Interpolate across the row
    float4 delta = (right - left) / 255.0f;
    float4 result = left + delta * (float)dimx;
    // Clamp
    if (result.x > 1.0f)
        result.x = 1.0f;
    if (result.y > 1.0f)
        result.y = 1.0f;
    if (result.x < 0.0f)
        result.x = 0.0f;
    if (result.y < 0.0f)
        result.y = 0.0f;
    // Set alpha to 1
    output[dimy][dimx] = result + (float4)(0.0f, 0.0f, 0.0f, 1.0f);
}
Attachment: Opencl_shading.zip (36 K)
Let's first talk about vector types. The code above uses the float4 type, which is a vector type. A vector type is named by appending a number N to a basic type, where N can be 2, 4, 8, or 16 (OpenCL 1.1 also added 3). Examples are uchar8, float2, int16, and long4.
To access the components of a vector with no more than four components, we can use .x, .y, .z, and .w in order. This notation is the same as the one shaders in the OpenGL Shading Language use to access vector components.
We can also access each component with a numeric index, as if the vector variable were an array. For a vector of 16 elements, elements 0 to 9 are written with the digits 0 to 9, and elements 10 to 15 are written with the letters a to f or A to F (that is, hexadecimal digits). With numeric indexes, the dot after the vector variable must be followed by the letter s. For example, given int4 a = (int4)(1, 2, 3, 4); a.x is 1, a.y is 2, a.z is 3, and a.w is 4. Likewise, a.s0 is 1, a.s1 is 2, a.s2 is 3, and a.s3 is 4.
Components of a vector can also be assigned flexibly, which brings us to the concept of swizzling. Swizzling means that any selection of elements from one vector, in any order, can be assigned to another vector. For example: int4 a = (int4)(1, 2, 3, 4); int4 b = a.wzyx; This assigns element 3 of a to element 0 of b, element 2 of a to element 1 of b, element 1 of a to element 2 of b, and element 0 of a to element 3 of b. We can then write: b.xz = a.s32; This assigns element 3 of a to element 0 of b, and element 2 of a to element 2 of b.
Flexible, isn't it? Arithmetic (addition, subtraction, multiplication, division) and logical operations between vector variables are performed component by component. For example, given int4 a; int4 b; the statement a *= b; is equivalent to: a.x *= b.x; a.y *= b.y; a.z *= b.z; a.w *= b.w;
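The swizzle and component-wise rules above can be emulated on the CPU to make the element mapping explicit. The following is only a sketch in plain C, not OpenCL code: the type int4_t and the functions swizzle_wzyx, assign_xz_from_s32, mul_assign, and swizzle_example are hypothetical names invented for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL's int4:
   s[0]..s[3] play the role of .x/.y/.z/.w (or .s0 to .s3). */
typedef struct { int s[4]; } int4_t;

/* Emulates `b = a.wzyx`: the elements of a are copied in reverse order. */
int4_t swizzle_wzyx(int4_t a) {
    int4_t b = { { a.s[3], a.s[2], a.s[1], a.s[0] } };
    return b;
}

/* Emulates `b.xz = a.s32`: only b.x and b.z are written;
   b.y and b.w keep their previous values. */
void assign_xz_from_s32(int4_t *b, int4_t a) {
    b->s[0] = a.s[3];
    b->s[2] = a.s[2];
}

/* Emulates `a *= b`: each component is multiplied independently. */
void mul_assign(int4_t *a, int4_t b) {
    for (int k = 0; k < 4; ++k) {
        a->s[k] *= b.s[k];
    }
}

/* Walks through the examples from the text. */
void swizzle_example(void) {
    int4_t a = { { 1, 2, 3, 4 } };
    int4_t b = swizzle_wzyx(a);      /* b is now (4, 3, 2, 1) */
    assert(b.s[0] == 4 && b.s[3] == 1);
    assign_xz_from_s32(&b, a);       /* b is now (4, 3, 3, 1) */
    assert(b.s[0] == 4 && b.s[2] == 3);
    mul_assign(&b, a);               /* b is now (4, 6, 9, 4) */
    assert(b.s[1] == 6 && b.s[2] == 9);
}
```

In real OpenCL C none of these helpers are needed, since the compiler handles the swizzle and the component-wise operators directly; the sketch only spells out which element goes where.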
When a vector is combined with a scalar, the scalar is applied to every component of the vector. For example, given int4 a; int i; the statement a *= i; is equivalent to: a.x *= i; a.y *= i; a.z *= i; a.w *= i; The statement i *= a; is invalid, however, because the result is a vector and cannot be stored in a scalar. So in a compound assignment that mixes a vector and a scalar, the vector must be on the left side of the operator and the scalar on the right. (In a plain binary expression such as i * a, the scalar may appear on either side; it is widened to the vector type.) In the code above:
    // Clamp
    if (result.x > 1.0f)
        result.x = 1.0f;
    if (result.y > 1.0f)
        result.y = 1.0f;
    if (result.x < 0.0f)
        result.x = 0.0f;
    if (result.y < 0.0f)
        result.y = 0.0f;
This part saturates (clamps) the R and G components of the result to the range from 0 to 1. We can replace this code with an OpenCL built-in function. OpenCL built-in functions are often supported directly by the GPU instruction set, so a call can frequently be completed in one or a few instructions. When writing OpenCL code, we should therefore use built-in functions whenever possible. That said, some built-in math functions sacrifice precision for speed, and in those cases we have to weigh the trade-off.
The built-in function we introduce here is:
gentype clamp(gentype x, gentype minval, gentype maxval)
It returns fmin(fmax(x, minval), maxval); that is, it limits each component of a vector to the range from minval to maxval. A component below the lower limit becomes minval, and a component above the upper limit becomes maxval. The OpenCL program then becomes:
// Render a square
// Left-top: red (1, 0, 0)
// Left-bottom: green (0, 1, 0)
// Right-top: blue (0, 0, 1)
// Right-bottom: black (0, 0, 0)
__constant float4 left_top = (float4)(1.0f, 0.0f, 0.0f, 0.0f);
__constant float4 left_bottom = (float4)(0.0f, 1.0f, 0.0f, 0.0f);
__constant float4 right_top = (float4)(0.0f, 0.0f, 1.0f, 0.0f);
__constant float4 right_bottom = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
__constant float4 minvalue = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
__constant float4 maxvalue = (float4)(1.0f, 1.0f, 1.0f, 0.0f);
__kernel void Colorshading(__global float4 output[256][256])
{
    int dimx = get_global_id(0);
    int dimy = get_global_id(1);
    // Per-row color steps along the left and right edges.
    // Declared private: OpenCL does not allow a __local variable
    // to be initialized in its declaration.
    float4 deltaleft = (left_top - left_bottom) / 255.0f;
    float4 deltaright = (right_top - right_bottom) / 255.0f;
    float4 left = left_bottom + deltaleft * (float)dimy;
    float4 right = right_bottom + deltaright * (float)dimy;
    float4 delta = (right - left) / 255.0f;
    float4 result = left + delta * (float)dimx;
    // Clamp
    result = clamp(result, minvalue, maxvalue);
    // Set alpha to 1
    output[dimy][dimx] = result + (float4)(0.0f, 0.0f, 0.0f, 1.0f);
}
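To check the kernel's output on the host, the same arithmetic can be reproduced in plain C. The following is only a sketch of such a reference, assuming the 256 x 256 layout and the corner colors from the listing above. The names float4_t, f4_add, f4_sub, f4_mul_s, f4_div_s, clamp1, f4_clamp, shade_reference, and f4_near are hypothetical helpers invented here; they are not part of OpenCL or of the attached project.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } float4_t;

/* Component-wise helpers mirroring the vector operators used in the kernel. */
static float4_t f4_add(float4_t a, float4_t b) {
    float4_t r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}
static float4_t f4_sub(float4_t a, float4_t b) {
    float4_t r = { a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w };
    return r;
}
/* vector * scalar: the scalar is applied to every component */
static float4_t f4_mul_s(float4_t a, float s) {
    float4_t r = { a.x * s, a.y * s, a.z * s, a.w * s };
    return r;
}
/* vector / scalar */
static float4_t f4_div_s(float4_t a, float s) {
    float4_t r = { a.x / s, a.y / s, a.z / s, a.w / s };
    return r;
}

/* Scalar clamp; for non-NaN input this equals fmin(fmax(x, lo), hi). */
static float clamp1(float x, float lo, float hi) {
    float t = (x < lo) ? lo : x;
    return (t > hi) ? hi : t;
}
static float4_t f4_clamp(float4_t v, float4_t lo, float4_t hi) {
    float4_t r = { clamp1(v.x, lo.x, hi.x), clamp1(v.y, lo.y, hi.y),
                   clamp1(v.z, lo.z, hi.z), clamp1(v.w, lo.w, hi.w) };
    return r;
}

/* Computes the value the kernel stores at output[dimy][dimx]. */
float4_t shade_reference(int dimx, int dimy) {
    const float4_t left_top     = { 1.0f, 0.0f, 0.0f, 0.0f };
    const float4_t left_bottom  = { 0.0f, 1.0f, 0.0f, 0.0f };
    const float4_t right_top    = { 0.0f, 0.0f, 1.0f, 0.0f };
    const float4_t right_bottom = { 0.0f, 0.0f, 0.0f, 0.0f };
    const float4_t minvalue     = { 0.0f, 0.0f, 0.0f, 0.0f };
    const float4_t maxvalue     = { 1.0f, 1.0f, 1.0f, 0.0f };
    const float4_t alpha        = { 0.0f, 0.0f, 0.0f, 1.0f };
    float4_t deltaleft, deltaright, left, right, delta, result;

    assert(dimx >= 0 && dimx < 256 && dimy >= 0 && dimy < 256);

    deltaleft  = f4_div_s(f4_sub(left_top, left_bottom), 255.0f);
    deltaright = f4_div_s(f4_sub(right_top, right_bottom), 255.0f);
    left   = f4_add(left_bottom, f4_mul_s(deltaleft, (float)dimy));
    right  = f4_add(right_bottom, f4_mul_s(deltaright, (float)dimy));
    delta  = f4_div_s(f4_sub(right, left), 255.0f);
    result = f4_add(left, f4_mul_s(delta, (float)dimx));
    result = f4_clamp(result, minvalue, maxvalue);
    return f4_add(result, alpha);
}

/* Returns 1 when every component of a is within eps of b. */
int f4_near(float4_t a, float4_t b, float eps) {
    float4_t d = f4_sub(a, b);
    return d.x <= eps && d.x >= -eps && d.y <= eps && d.y >= -eps &&
           d.z <= eps && d.z >= -eps && d.w <= eps && d.w >= -eps;
}
```

With this reference, shade_reference(0, 0) gives the left-bottom color and shade_reference(255, 255) gives the right-top color, up to floating-point rounding. Comparing with a small tolerance, as f4_near does, is safer than testing for exact equality, because dividing by 255 and multiplying back does not always return exactly 1.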
Finally, let's discuss the main topic of this article: the address space qualifiers of variables. OpenCL has four address space qualifiers: global (__global or global), local (__local or local), constant (__constant or constant), and private (__private or private).
The global address space is used to refer to memory objects allocated from the global memory pool. Such a memory object can be declared as a pointer to a scalar, a pointer to a vector, or a pointer to a user-defined structure. This allows the kernel to read or write any location of the buffer. Note that __global (or global) qualifies the memory that the pointer refers to. Therefore:
__global long4 g;             // error
__global image2d_t texture;   // OK: a 2D texture image
void kernelmain(__global int *p   // OK
                )
{
    __global float4 a;        // error
}
The local address space describes variables that are allocated in local memory and shared by all work items of a work group. This qualifier can be applied to function parameters or to variables declared inside a function. When it is applied to a function parameter, the parameter must be of pointer type.
The constant address space describes variables that are allocated in global memory and are read-only inside the kernel. These read-only variables can be shared by all work items of all work groups. This qualifier can be applied to pointer parameters of kernel functions, to pointer variables inside kernel functions, or to program-scope (global) variables. In our example, the program-scope variables are qualified with __constant. Note that a __constant variable cannot be written to. Therefore, when used at program scope, it must be initialized in its declaration with a constant. A constant here means an expression whose value can be computed at compile time.
The private address space covers the most ground. All function parameters and all local variables defined in a function are private by default, so the __private keyword can be omitted.
Note that OpenCL also supports the const keyword. This keyword is only checked at compile time: a const variable cannot be modified, but const has nothing to do with the memory region in which the variable is allocated at run time.
Finally, a brief word on how these qualifiers affect actual performance. Today's popular HPC stream processors, such as GPUs, use a hierarchical memory architecture. Global memory is very large (it corresponds to what we usually call video memory, which currently is at least 128 MB; the Mac Mini has this amount), but its bandwidth is expensive, so data transfer to and from it is the slowest.
The second level is local memory. Local memory can only be shared by the work items of one work group, and each work group has its own local memory. Each local memory is relatively small, generally on the order of kilobytes, but its data transfer performance is much higher than that of global memory.
Private memory is private to each work item; that is, each work item has its own private storage. In the GPU memory architecture this is actually the register file. For example, if a register file of some number of KB in total is divided among all work items, the storage available to each work item is very small. Register access, however, is the fastest: a read or write typically takes only one clock cycle.