This article describes how a .cpp file can call a CUDA .cu file to implement GPU-accelerated programming. It assumes CUDA is already configured; if you have questions about configuring CUDA, please refer to the earlier configuration article. CUDA 6.5 now supports VS2013, so I recommend using the latest version: VS2013 is very convenient to work with, and the configuration steps are no different. Note that the configuration article does not solve the problem of occasional spurious error squiggles on CUDA functions; they do not affect compilation, but they are quite annoying for perfectionists, and I will post my findings on this later.
There are two ways to have a .cpp file call a .cu file for GPU acceleration. This article covers the first: create a CUDA project from the VS2013 project template (available after CUDA 6.5 is installed) and then add the .cpp file to it. The other approach, adding a .cu file to an MFC or Win32 project and calling it from there, is essentially similar but more troublesome; I will write it up later when I have time.
Before getting to the main topic, let's look at how CUDA-based GPU acceleration works in general. The overall flow is simple:
initialize GPU memory -> copy the data to be processed from host memory to GPU memory -> process the data on the GPU -> copy the processed data from GPU memory back to host memory
OK, on to the main topic.
First, create a CUDA project. The generated project contains a .cu file; replace its content with the following:
# include # include "Main. H" inline void checkcudaerrors (cudaerror ERR) // error handling function {If (cudasuccess! = ERR) {fprintf (stderr, "Cuda Runtime API error: % S. \ n ", cudageterrorstring (ERR); Return ;}__ global _ void add (int * a, int * B, int * C) // process the kernel function {int tid = blockidx. x * blockdim. X + threadidx. x; For (size_t K = 0; k <50000; k ++) {C [TID] = A [TID] + B [TID];} extern "C" int runtest (int * host_a, int * host_ B, int * host_c) {int * dev_a, * dev_ B, * dev_c; checkcudaerrors (cudamalloc (void **) & dev_a, sizeof (INT) * datasize); // allocate the graphics card memory checkcudaerrors (cudamalloc (void **) & dev_ B, sizeof (INT) * datasize )); checkcudaerrors (cudamalloc (void **) & dev_c, sizeof (INT) * datasize); checkcudaerrors (cudamemcpy (dev_a, host_a, sizeof (INT) * datasize, summary )); // copy the memory block of the host's waiting for processing data to the checkcudaerrors (cudamemcpy (dev_ B, host_ B, sizeof (INT) * datasize, cudamemcpyhosttodevice) in the graphics card memory )); add (dev_a, dev_ B, dev_c); // call the checkcudaerrors (cudamemcpy (host_c, dev_c, sizeof (INT) * datasize, cudamemcpydevicetohost); // copy the processed data of the video card back to cudafree (dev_a); // clear the GPU memory cudafree (dev_ B); cudafree (dev_c); Return 0 ;}
Add a main.h file to the project with the following content:
#include <time.h>      // timing header; clock() is used to measure processing speed
#include <iostream>

#define DATASIZE 50000
Next, add the main implementation file, main.cpp, which calls into the .cu file. Its content is as follows:
# Include "Main. H "extern" C "int runtest (int * host_a, int * host_ B, int * host_c); // the video card processing function int main () {int A [datasize], B [datasize], C [datasize]; for (size_t I = 0; I <datasize; I ++) {A [I] = I; B [I] = I * I;} Long now1 = clock (); // the start time of Image Processing: runtest (A, B, C ); // call the video card to accelerate printf ("GPU running time: % DMS \ n", INT (double) (clock ()-now1)/clocks_per_sec * 1000 )); // output GPU processing time long now2 = clock (); // storage image processing start time for (size_t I = 0; I <datasize; I ++) {for (size_t K = 0; k <50000; k ++) {C [I] = (a [I] + B [I]);} printf ("CPU running time: % DMS \ n", INT (double) (clock ()-now2)/clocks_per_sec * 1000 )); // output GPU processing time/* For (size_t I = 0; I <100; I ++) // view the calculation result {printf ("% d + % d = % d \ n", a [I], B [I], C [I]);} */getchar (); Return 0 ;}
Note that the CUDA function to be called must be defined with extern "C" in the .cu file, and declared the same way in the .cpp file (extern "C" int runtest(int *host_a, int *host_b, int *host_c);) before it is called.
That completes the first method. Compiling and running shows that the GPU is indeed much faster than the CPU for this kind of highly parallel computation. The other method mentioned above will be covered next time. The holiday is almost over, alas...