transferred from: http://www.cnblogs.com/mikewolf2002/archive/2012/09/06/2674125.html
Author: Mike Old Wolf
In tutorial 2, we read the kernel source file into a string with the function convertToString, passed it to clCreateProgramWithSource to create the program object, and then called clBuildProgram to compile it. We can also load a binary kernel file directly, which gives some protection when you do not want others to read your kernel source. In this tutorial, we store the compiled kernel in a binary file and create a timer class to record how long the array addition takes on the CPU and on the GPU.
First we set up the project gclTutorial2 and add the class gclFile, which is used mainly to read text kernel files and to read and write binary kernel files.
class gclFile
{
public:
    gclFile(void);
    ~gclFile(void);
    // Open an OpenCL kernel source file (text mode)
    bool open(const char* fileName);
    // Read and write binary kernel files
    bool writeBinaryToFile(const char* fileName, const char* binary, size_t numBytes);
    bool readBinaryFromFile(const char* fileName);
    ...
};
The code of the three gclFile functions that read and write kernel files is:
bool gclFile::writeBinaryToFile(const char* fileName, const char* binary, size_t numBytes)
{
    FILE *output = NULL;
    output = fopen(fileName, "wb");
    if (output == NULL)
        return false;
    fwrite(binary, sizeof(char), numBytes, output);
    fclose(output);
    return true;
}

bool gclFile::readBinaryFromFile(const char* fileName)
{
    FILE *input = NULL;
    size_t size = 0;
    char* binary = NULL;
    input = fopen(fileName, "rb");
    if (input == NULL)
    {
        return false;
    }
    fseek(input, 0L, SEEK_END);
    size = ftell(input);
    // Point back to the start of the file
    rewind(input);
    binary = (char*)malloc(size);
    if (binary == NULL)
    {
        fclose(input); // do not leak the file handle
        return false;
    }
    fread(binary, sizeof(char), size, input);
    fclose(input);
    source_.assign(binary, size);
    free(binary);
    return true;
}

bool gclFile::open(const char* fileName) //!< file name
{
    size_t size;
    char* str;
    // Open the file as a stream
    std::fstream f(fileName, (std::fstream::in | std::fstream::binary));
    // Check whether the file stream is open
    if (f.is_open())
    {
        size_t sizeFile;
        // Get the file size
        f.seekg(0, std::fstream::end);
        size = sizeFile = (size_t)f.tellg();
        f.seekg(0, std::fstream::beg);
        str = new char[size + 1];
        if (!str)
        {
            f.close();
            return false;
        }
        // Read the file
        f.read(str, sizeFile);
        f.close();
        str[size] = '\0';
        source_ = str;
        delete[] str;
        return true;
    }
    return false;
}
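The binary read and write steps can be tried on their own, without OpenCL. The following is a minimal sketch, not the author's code: writeBinary and readBinary are hypothetical free functions that mirror the stdio calls used above, and they additionally check the byte counts returned by fwrite and fread.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of gclFile): write numBytes bytes to fileName.
// Returns false if the file cannot be opened or fewer bytes were written.
bool writeBinary(const char* fileName, const char* data, size_t numBytes)
{
    FILE* output = fopen(fileName, "wb");
    if (output == NULL)
        return false;
    size_t written = fwrite(data, sizeof(char), numBytes, output);
    bool closed = (fclose(output) == 0);
    return closed && written == numBytes;
}

// Hypothetical helper: read the whole file into out, keeping every byte.
bool readBinary(const char* fileName, std::string& out)
{
    FILE* input = fopen(fileName, "rb");
    if (input == NULL)
        return false;
    fseek(input, 0L, SEEK_END);
    long size = ftell(input);
    rewind(input);
    if (size < 0)
    {
        fclose(input);
        return false;
    }
    out.resize((size_t)size);
    size_t got = 0;
    if (size > 0)
        got = fread(&out[0], sizeof(char), (size_t)size, input);
    fclose(input);
    return got == (size_t)size;
}
```

A compiled kernel may contain zero bytes, so the length has to travel with the data. That is why readBinaryFromFile calls source_.assign(binary, size), while the text-mode open can rely on a terminating '\0'.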
Now, in main.cpp, we can use the open function of the gclFile class to read the kernel source file:
// The kernel file is add.cl
gclFile kernelFile;
if (!kernelFile.open("add.cl"))
{
    printf("Failed to load kernel file \n");
    exit(0);
}
const char * source = kernelFile.source().c_str();
size_t sourceSize[] = {strlen(source)};
// Create the program object
cl_program program = clCreateProgramWithSource(
    context,
    1,
    &source,
    sourceSize,
    NULL);
After compiling the kernel, we can use the following code to store the compiled kernel in the binary file vecadd.bin. In tutorial 4, we will load this binary kernel file directly.
// Store the compiled kernel in a file
char **binaries = (char **)malloc(sizeof(char *) * 1); // only one device
size_t *binarySizes = (size_t*)malloc(sizeof(size_t) * 1);
status = clGetProgramInfo(program,
    CL_PROGRAM_BINARY_SIZES,
    sizeof(size_t) * 1,
    binarySizes, NULL);
binaries[0] = (char *)malloc(sizeof(char) * binarySizes[0]);
status = clGetProgramInfo(program,
    CL_PROGRAM_BINARIES,
    sizeof(char *) * 1,
    binaries,
    NULL);
kernelFile.writeBinaryToFile("vecadd.bin", binaries[0], binarySizes[0]);
We also create a timer class, gclTimer, to measure time. It uses QueryPerformanceFrequency to get the clock frequency and QueryPerformanceCounter to get the number of ticks that have passed, and from these it computes the elapsed time. The class is very simple:
class gclTimer
{
public:
    gclTimer(void);
    ~gclTimer(void);
private:
    double _freq;
    double _clocks;
    double _start;
public:
    void Start(void);            // Start the timer
    void Stop(void);             // Stop the timer
    void Reset(void);            // Reset the timer
    double GetElapsedTime(void); // Calculate the elapsed time
};
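The implementation of gclTimer is not listed here, and QueryPerformanceCounter is available only on Windows. As a rough, portable stand-in, a timer with the same four functions can be sketched with std::chrono::steady_clock. This is an assumption for illustration, not the author's implementation: the class name ChronoTimer and the _running flag are hypothetical, and the elapsed time is returned in seconds, to match the later calls that multiply GetElapsedTime() by 1000 to print milliseconds.

```cpp
#include <chrono>

// Hypothetical portable stand-in for gclTimer (not the author's class).
class ChronoTimer
{
public:
    ChronoTimer() : _clocks(0.0), _running(false) {}

    // Start the timer
    void Start()
    {
        _start = Clock::now();
        _running = true;
    }

    // Stop the timer and add the interval since Start() to the total
    void Stop()
    {
        if (!_running)
            return;
        _clocks += std::chrono::duration<double>(Clock::now() - _start).count();
        _running = false;
    }

    // Reset the accumulated time to zero
    void Reset()
    {
        _clocks = 0.0;
        _running = false;
    }

    // Accumulated time between Start() and Stop() calls, in seconds
    double GetElapsedTime() const { return _clocks; }

private:
    typedef std::chrono::steady_clock Clock;
    Clock::time_point _start;
    double _clocks;  // accumulated seconds
    bool _running;
};
```

steady_clock is used because it never jumps backwards, which is the property that matters when measuring intervals.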
Here we add the timer code around the array addition on the CPU side:
gclTimer clTimer;
clTimer.Reset();
clTimer.Start();
// The CPU computes the sum of buf1 and buf2
for (i = 0; i < BUFSIZE; i++)
    buf[i] = buf1[i] + buf2[i];
clTimer.Stop();
printf("CPU costs time:%.6f ms \n", clTimer.GetElapsedTime() * 1000);
Similarly, we add timer code around the kernel execution on the GPU and around the copy of the GPU results back to the CPU:
// Execute the kernel: 1-dimensional range, the number of work items is BUFSIZE
cl_event ev;
size_t global_work_size = BUFSIZE;
clTimer.Reset();
clTimer.Start();
clEnqueueNDRangeKernel(queue,
    kernel,
    1,
    NULL,
    &global_work_size,
    NULL, 0, NULL, &ev);
status = clFlush(queue);
// waitForEventAndRelease(&ev); // alternative wait; judging by its name it also releases ev, so do not combine it with the call below
clWaitForEvents(1, &ev);
clTimer.Stop();
printf("kernel total time:%.6f ms \n", clTimer.GetElapsedTime() * 1000);
// Copy the data back to host memory
cl_float *ptr;
clTimer.Reset();
clTimer.Start();
cl_event mapevt;
ptr = (cl_float *)clEnqueueMapBuffer(queue,
    buffer,
    CL_TRUE,
    CL_MAP_READ,
    0,
    BUFSIZE * sizeof(cl_float),
    0, NULL, &mapevt, NULL);
status = clFlush(queue);
// waitForEventAndRelease(&mapevt); // alternative wait, see the note above
clWaitForEvents(1, &mapevt);
clTimer.Stop();
printf("copy from device to host:%.6f ms \n", clTimer.GetElapsedTime() * 1000);
The final program output is as follows. When BUFSIZE is 262144, the GPU on my graphics card is no faster than the CPU... In the program directory, we can also see that the file vecadd.bin has been produced.
For the complete code, please refer to:
Project: gclTutorial2
Code download:
http://files.cnblogs.com/mikewolf2002/gclTutorial.zip