Reposted from: http://m.blog.csdn.net/blog/oHanTanYanYing/39855829
This article explains how to call CUDA .cu files from CPP files for GPU-accelerated programming. It assumes CUDA is already configured; if you have questions about configuring CUDA, see my earlier article on that topic. CUDA 6.5, which supports VS2013, has now been released, so I recommend using the latest version: VS2013 is much nicer to work with, and the configuration is no different. That configuration article did not solve the problem of CUDA functions occasionally being flagged as errors in the editor. This has no effect on compilation, but it is annoying for the obsessive-compulsive among us. I will post an update once I have looked into it; please take note.
There are two ways to call CUDA .cu files from CPP files for GPU-accelerated programming. This article covers the first: create a CUDA project from the VS2013 template (the template appears after installing CUDA 6.5) and then add a CPP file to it. The second, adding .cu files to an existing MFC or Win32 project and calling them from there, is essentially the same but more troublesome; I will write it up when I have time.
Before getting to the main topic, here is how GPU acceleration with CUDA works. The overall approach is simple, and the process is generally as follows:
1. Allocate video card (device) memory.
2. Copy the host data to be processed into video card memory.
3. Process the data on the video card.
4. Copy the processed data from video card memory back to host memory.
With that, on to the main topic.
First create a CUDA project. The template generates a .cu file; replace its contents with the following:
    #include <stdio.h>                      // fprintf
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"
    #include "Main.h"

    inline void checkCudaErrors(cudaError err) // error handling function
    {
        if (cudaSuccess != err)
        {
            fprintf(stderr, "CUDA Runtime API error: %s.\n", cudaGetErrorString(err));
            return;
        }
    }

    __global__ void add(int *a, int *b, int *c) // processing kernel
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        for (size_t k = 0; k < 50000; k++)
        {
            c[tid] = a[tid] + b[tid];
        }
    }

    extern "C" int runtest(int *host_a, int *host_b, int *host_c)
    {
        int *dev_a, *dev_b, *dev_c;

        checkCudaErrors(cudaMalloc((void**)&dev_a, sizeof(int) * datasize)); // allocate video card memory
        checkCudaErrors(cudaMalloc((void**)&dev_b, sizeof(int) * datasize));
        checkCudaErrors(cudaMalloc((void**)&dev_c, sizeof(int) * datasize));

        checkCudaErrors(cudaMemcpy(dev_a, host_a, sizeof(int) * datasize, cudaMemcpyHostToDevice)); // copy the host data to be processed into video card memory
        checkCudaErrors(cudaMemcpy(dev_b, host_b, sizeof(int) * datasize, cudaMemcpyHostToDevice));

        // The launch values were lost in the source text; 100 threads per block is
        // an assumed value that divides datasize evenly (500 blocks x 100 threads).
        add<<<datasize / 100, 100>>>(dev_a, dev_b, dev_c); // process the data on the video card
        checkCudaErrors(cudaMemcpy(host_c, dev_c, sizeof(int) * datasize, cudaMemcpyDeviceToHost)); // copy the processed data back

        cudaFree(dev_a); // release the video card memory
        cudaFree(dev_b);
        cudaFree(dev_c);
        return 0;
    }
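The kernel computes each thread's element index as blockIdx.x * blockDim.x + threadIdx.x, so the launch has to supply enough blocks and threads to cover all datasize elements. As a rough illustration, the CPU-only C++ sketch below replays that index arithmetic without a GPU. The names blocks_for and simulate_launch are hypothetical helpers, not part of CUDA or of the original article, and the block sizes passed to them are arbitrary example values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of blocks needed so that
// blocks * threads_per_block >= n (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t threads_per_block)
{
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical CPU stand-in for a 1D kernel launch. It visits every
// (block, thread) pair, computes tid the same way the kernel does, and
// counts how often each element index below n is touched. Indices >= n
// are skipped, mirroring the bounds check a kernel needs when n is not
// a multiple of the block size.
std::vector<int> simulate_launch(std::size_t n, std::size_t threads_per_block)
{
    std::vector<int> hits(n, 0);
    const std::size_t blocks = blocks_for(n, threads_per_block);
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            const std::size_t tid = block * threads_per_block + thread;
            if (tid < n) {
                ++hits[tid];
            }
        }
    }
    return hits;
}
```

Calling simulate_launch(50000, 100) uses blocks_for(50000, 100) = 500 blocks and touches each of the 50000 indices exactly once. With a block size that does not divide the element count, the tid < n check is what keeps the surplus threads from writing out of bounds.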
Then add a Main.h file to the project with the following contents:
    #include <time.h>    // time-related header; its functions are used to measure processing speed
    #include <iostream>
    #define datasize 50000
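The time.h functions mentioned in the header comment are used to time a piece of work by reading clock() before and after it. A minimal sketch of that pattern as reusable helpers might look like the following; ticks_to_ms and time_ms are hypothetical names introduced here for illustration, and what clock() measures (CPU time or elapsed time) depends on the platform.

```cpp
#include <ctime>

// Hypothetical helper: converts the span between two clock() readings
// into whole milliseconds (fractions are truncated).
long ticks_to_ms(std::clock_t start, std::clock_t end)
{
    return static_cast<long>(static_cast<double>(end - start) / CLOCKS_PER_SEC * 1000.0);
}

// Hypothetical helper: runs `work` once and returns how long it took,
// in milliseconds, as reported by clock().
template <typename Work>
long time_ms(Work work)
{
    const std::clock_t start = std::clock();
    work();
    return ticks_to_ms(start, std::clock());
}
```

With such a helper, a measurement could be written as long gpu_ms = time_ms([&] { runtest(a, b, c); }); the double cast matters because dividing two integer tick counts directly would discard everything below one second.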
Next is the implementation file, Main.cpp, which calls into the CUDA .cu file. Its contents are as follows:
    #include "Main.h"
    #include <stdio.h>   // printf, getchar

    extern "C" int runtest(int *host_a, int *host_b, int *host_c); // GPU processing function defined in the .cu file

    int main()
    {
        int a[datasize], b[datasize], c[datasize];
        for (size_t i = 0; i < datasize; i++)
        {
            a[i] = i;
            b[i] = i * i;
        }

        long now1 = clock(); // record the GPU start time
        runtest(a, b, c);    // run the computation on the GPU
        printf("GPU Run time: %dms\n", int(((double)(clock() - now1) / CLOCKS_PER_SEC * 1000))); // print the GPU processing time

        long now2 = clock(); // record the CPU start time
        for (size_t i = 0; i < datasize; i++)
        {
            for (size_t k = 0; k < 50000; k++)
            {
                c[i] = (a[i] + b[i]);
            }
        }
        printf("CPU Run time: %dms\n", int(((double)(clock() - now2) / CLOCKS_PER_SEC * 1000))); // print the CPU processing time

        /*for (size_t i = 0; i < ...; i++) // view the results (the loop bound was lost in the source)
        {
            printf("%d+%d=%d\n", a[i], b[i], c[i]);
        }*/
        getchar();
        return 0;
    }
Note that the CUDA function to be called must be defined with the extern "C" specifier in the .cu file, and the CPP file must declare it the same way (extern "C" int runtest(int *host_a, int *host_b, int *host_c);) before calling it.
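The reason for extern "C" is that it turns off C++ name mangling for the function, so the object file built from the .cu file and the one built from the CPP file agree on the plain symbol name runtest. The single-file sketch below illustrates the declaration and definition pairing under a simplifying assumption: the definition is a CPU stand-in, whereas in the article it lives in the .cu file and runs the kernel. The constant kDataSize is a hypothetical small replacement for datasize.

```cpp
#include <cstddef>

// Hypothetical small element count; the article's Main.h defines datasize as 50000.
const std::size_t kDataSize = 8;

// Declaration as it would appear in the CPP file. extern "C" gives the
// function C linkage, so the linker looks for the unmangled name "runtest".
extern "C" int runtest(int *host_a, int *host_b, int *host_c);

// CPU stand-in for the definition that the article places in the .cu file.
// It carries the same extern "C" linkage as the declaration above.
extern "C" int runtest(int *host_a, int *host_b, int *host_c)
{
    for (std::size_t i = 0; i < kDataSize; ++i) {
        host_c[i] = host_a[i] + host_b[i];
    }
    return 0;
}
```

If the two sides disagree, for example the definition has extern "C" but the declaration in the CPP file omits it, the build typically fails at link time with an unresolved external symbol, because the CPP file then asks for the mangled C++ name.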
That concludes the first part of this article. Compile and run the project and you will see that the GPU really is much faster than the CPU at heavily parallel computations. The other method mentioned earlier will have to wait until next time; the holiday is over, uh...
Well, half a year has passed since the article above was finished, so it is time to fill the hole I left: the blog post on the other method is here.
Reposter's note: I tried this myself and it works in my setup: Visual Studio 2010 + CUDA 6.0.
[Repost] Calling CUDA .cu files from CPP files for GPU-accelerated programming