OpenCL Learning Step by Step (2): A Simple OpenCL Program


Now we will write a simple OpenCL program that computes the sum of two arrays and stores the result in a third array. The program performs the calculation on both the CPU and the GPU, and finally verifies that the two results are equal. The overall flow of the OpenCL program is roughly: get a platform and a GPU device, create a context and a command queue, create memory objects, build the kernel program, launch the kernel, and read back and verify the result.

The main code in the source file is as follows:

int main(int argc, char *argv[])
{
    // Create three buffers in host memory
    // (BUFSIZE is a macro defined elsewhere in the source file)
    float *buf1 = 0;
    float *buf2 = 0;
    float *buf  = 0;

    buf1 = (float *)malloc(BUFSIZE * sizeof(float));
    buf2 = (float *)malloc(BUFSIZE * sizeof(float));
    buf  = (float *)malloc(BUFSIZE * sizeof(float));

    // Initialize buf1 and buf2 with some random values
    int i;
    srand((unsigned)time(NULL));
    for (i = 0; i < BUFSIZE; i++)
        buf1[i] = rand() % 65535;
    srand((unsigned)time(NULL) + 1000);
    for (i = 0; i < BUFSIZE; i++)
        buf2[i] = rand() % 65535;

    // Compute the sum of buf1 and buf2 on the CPU
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = buf1[i] + buf2[i];

    cl_int status;
    cl_platform_id platform;

    // Get a platform object
    status = clGetPlatformIDs(1, &platform, NULL);

NOTE: If more than one OpenCL platform is installed on the system (for example, my OS has both the Intel and the AMD OpenCL platforms), the code above may cause errors: it can return Intel's OpenCL platform, which only supports the CPU, while our subsequent operations target the GPU. In that case we can use the following code to obtain AMD's OpenCL platform instead.

    cl_uint numPlatforms;
    std::string platformVendor;
    status = clGetPlatformIDs(0, NULL, &numPlatforms);
    if (status != CL_SUCCESS)
    {
        return 0;
    }
    if (0 < numPlatforms)
    {
        cl_platform_id *platforms = new cl_platform_id[numPlatforms];
        status = clGetPlatformIDs(numPlatforms, platforms, NULL);
        char platformName[100];
        for (unsigned i = 0; i < numPlatforms; ++i)
        {
            status = clGetPlatformInfo(platforms[i],
                                       CL_PLATFORM_VENDOR,
                                       sizeof(platformName),
                                       platformName,
                                       NULL);
            platform = platforms[i];
            platformVendor.assign(platformName);
            if (!strcmp(platformName, "Advanced Micro Devices, Inc."))
            {
                break;
            }
        }
        std::cout << "Platform found : " << platformName << "\n";
        delete[] platforms;
    }

    cl_device_id device;

    // Get a GPU device
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // Create a context
    cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // Create a command queue
    cl_command_queue queue = clCreateCommandQueue(context,
                                                  device,
                                                  CL_QUEUE_PROFILING_ENABLE,
                                                  NULL);

    // Create the OpenCL memory objects. The contents of buf1 are copied into
    // clbuf1 implicitly (CL_MEM_COPY_HOST_PTR); the contents of buf2 are copied
    // into clbuf2 explicitly with clEnqueueWriteBuffer below.
    cl_mem clbuf1 = clCreateBuffer(context,
                                   CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   BUFSIZE * sizeof(cl_float), buf1,
                                   NULL);

    cl_mem clbuf2 = clCreateBuffer(context,
                                   CL_MEM_READ_ONLY,
                                   BUFSIZE * sizeof(cl_float), NULL,
                                   NULL);

    cl_event writeEvt;
    status = clEnqueueWriteBuffer(queue, clbuf2, 1, 0,
                                  BUFSIZE * sizeof(cl_float), buf2,
                                  0, 0, &writeEvt);

The above code copies the contents of buf2 into clbuf2. Because buf2 resides on the host and clbuf2 resides on the device, this call performs a host-to-device transfer, i.e. a copy from system memory to video memory. I therefore call clFlush after it to submit all the commands in the command queue to the device (note: clFlush does not guarantee that the commands have finished executing), and then call waitForEventAndRelease to wait for the write to complete. waitForEventAndRelease is a user-defined function, shown below: it simply polls the event until the operation is complete, so the program blocks on that line until the write has finished. Alternatively, we can use OpenCL's built-in clWaitForEvents function instead of clFlush plus waitForEventAndRelease.

// Wait until the event completes, then release it
int waitForEventAndRelease(cl_event *event)
{
    cl_int status = CL_SUCCESS;
    cl_int eventStatus = CL_QUEUED;
    while (eventStatus != CL_COMPLETE)
    {
        status = clGetEventInfo(*event,
                                CL_EVENT_COMMAND_EXECUTION_STATUS,
                                sizeof(cl_int),
                                &eventStatus,
                                NULL);
    }
    status = clReleaseEvent(*event);
    return 0;
}

    status = clFlush(queue);
    // Wait for the data transfer to finish before continuing
    waitForEventAndRelease(&writeEvt);
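As mentioned above, the clFlush/waitForEventAndRelease pair can be replaced with OpenCL's built-in clWaitForEvents. A minimal sketch (the event still has to be released afterwards):

    status = clWaitForEvents(1, &writeEvt);   // blocks until the write completes
    clReleaseEvent(writeEvt);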

    cl_mem buffer = clCreateBuffer(context,
                                   CL_MEM_WRITE_ONLY,
                                   BUFSIZE * sizeof(cl_float),
                                   NULL, NULL);

The code executed on the GPU is placed in a separate kernel file, add.cl. In this program the kernel is very simple: it just adds the two arrays. The kernel code is as follows:

__kernel void vecadd(__global const float* A, __global const float* B, __global float* C)
{
    int id = get_global_id(0);
    C[id] = A[id] + B[id];
}

    // The kernel file is add.cl
    const char *filename = "add.cl";
    std::string sourceStr;
    status = convertToString(filename, sourceStr);

convertToString is another user-defined function; it reads the kernel source file into a string. Its code is as follows:

// Read a text file into a string (used to load the kernel source)
int convertToString(const char *filename, std::string &s)
{
    size_t size;
    char *str;
    std::fstream f(filename, (std::fstream::in | std::fstream::binary));
    if (f.is_open())
    {
        size_t fileSize;
        f.seekg(0, std::fstream::end);
        size = fileSize = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size + 1];
        if (!str)
        {
            f.close();
            return 1;
        }
        f.read(str, fileSize);
        f.close();
        str[size] = '\0';
        s = str;
        delete[] str;
        return 0;
    }
    printf("Error: Failed to open file %s\n", filename);
    return 1;
}

    const char *source = sourceStr.c_str();
    size_t sourceSize[] = { strlen(source) };

    // Create a program object
    cl_program program = clCreateProgramWithSource(context, 1, &source, sourceSize, NULL);

    // Build the program
    status = clBuildProgram(program, 1, &device, NULL, NULL, NULL);

    if (status != 0)
    {
        printf("clBuild failed: %d\n", status);
        char tbuf[0x10000];
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0x10000, tbuf, NULL);
        printf("\n%s\n", tbuf);
        return -1;
    }
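The snippet above dumps the build log into a fixed 64 KB buffer. If the log size is unknown, a common variant (a sketch, not part of the original code) is to query the size first and then allocate:

    size_t logSize = 0;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    char *log = (char *)malloc(logSize);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    printf("%s\n", log);
    free(log);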

    // Create a kernel object
    cl_kernel kernel = clCreateKernel(program, "vecadd", NULL);

    // Set the kernel arguments
    cl_int clnum = BUFSIZE;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&clbuf1);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&clbuf2);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&buffer);

Note: when launching the kernel we only set the number of global work-items, not the work-group size. In that case the OpenCL implementation uses a default work-group size, which may be 256. An explicit work-group size can also be passed, as sketched after the launch code below.

    // Launch the kernel. The NDRange is one-dimensional and the number of
    // work-items is BUFSIZE.
    cl_event ev;
    size_t global_work_size = BUFSIZE;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &ev);
    status = clFlush(queue);
    waitForEventAndRelease(&ev);
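For reference, here is a minimal sketch of the same launch with an explicit work-group size; the value 64 is only an illustrative assumption, and the global size must be a multiple of the local size. The per-kernel limit can be queried with clGetKernelWorkGroupInfo:

    size_t maxWgSize = 0;
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(size_t), &maxWgSize, NULL);
    size_t local_work_size = 64;   // assumed example value, not from the original code
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size,
                           &local_work_size, 0, NULL, &ev);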

    // Map the result buffer back into host memory
    cl_float *ptr;
    cl_event mapevt;
    ptr = (cl_float *)clEnqueueMapBuffer(queue, buffer, CL_TRUE, CL_MAP_READ,
                                         0, BUFSIZE * sizeof(cl_float),
                                         0, NULL, &mapevt, NULL);
    status = clFlush(queue);
    waitForEventAndRelease(&mapevt);

    // Verify the result against the CPU computation
    if (!memcmp(buf, ptr, BUFSIZE * sizeof(cl_float)))
        printf("Verify passed\n");
    else
        printf("Verify failed\n");
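The code above does not unmap the pointer; strictly speaking, a mapped buffer should be unmapped once the host is done reading it. A minimal sketch:

    cl_event unmapEvt;
    clEnqueueUnmapMemObject(queue, buffer, ptr, 0, NULL, &unmapEvt);
    clWaitForEvents(1, &unmapEvt);
    clReleaseEvent(unmapEvt);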

    if (buf)
        free(buf);
    if (buf1)
        free(buf1);
    if (buf2)
        free(buf2);

When the program exits, these OpenCL objects are generally released automatically, but releasing them explicitly is a good habit and makes the program complete, so I add the code to release the OpenCL objects manually.

    // Release the OpenCL resource objects
    clReleaseKernel(kernel);
    clReleaseMemObject(clbuf1);
    clReleaseMemObject(clbuf2);
    clReleaseMemObject(buffer);
    clReleaseProgram(program);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;
}

After the program is executed, the console output shows whether the verification passed.

The complete code can be found in the project file gclTutorial1.

Code download: http://files.cnblogs.com/mikewolf2002/gclTutorial.zip

Author: Mike laolo
