1. Precautions
For the compilation method, see:
http://blog.csdn.net/wangyaninglm/article/details/39997113
The program code below is based on examples found online.
Note: a 32-bit project may need 64-bit support added (this depends mainly on the version you compiled), and the CUDA paths must be added to the project's include configuration.
2. Code
// swap.cu
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <opencv2/core/cuda_devptrs.hpp>

using namespace cv;
using namespace cv::gpu;

// Custom kernel: each thread swaps the first and third channel of one pixel
__global__ void swap_rb_kernel(const PtrStepSz<uchar3> src, PtrStep<uchar3> dst)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Threads that fall outside the image do nothing
    if (x < src.cols && y < src.rows)
    {
        uchar3 v = src(y, x);
        dst(y, x) = make_uchar3(v.z, v.y, v.x);
    }
}

extern "C" void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream)
{
    dim3 block(32, 8);
    // Round up so the grid covers the whole image
    dim3 grid((src.cols + block.x - 1) / block.x, (src.rows + block.y - 1) / block.y);
    swap_rb_kernel<<<grid, block, 0, stream>>>(src, dst);
    // On the default stream, wait for the kernel to finish
    if (stream == 0)
        cudaDeviceSynchronize();
}
// swap.cpp
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/gpu/stream_accessor.hpp>

using namespace cv;
using namespace cv::gpu;

// Implemented in swap.cu
extern "C" void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream);

extern "C" void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null())
{
    CV_Assert(src.type() == CV_8UC3);
    dst.create(src.size(), src.type());
    // Get the raw CUDA stream behind the OpenCV Stream object
    cudaStream_t s = StreamAccessor::getStream(stream);
    swap_rb_caller(src, dst, s);
}
// main.cpp
#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

#pragma comment(lib, "opencv_gpu2410d.lib")
#pragma comment(lib, "opencv_core2410d.lib")
#pragma comment(lib, "opencv_highgui2410d.lib")

using namespace cv;
using namespace cv::gpu;

// Implemented in swap.cpp
extern "C" void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null());

int main()
{
    Mat image = imread("lena.jpg");
    imshow("src", image);

    GpuMat gpuMat, output;
    gpuMat.upload(image);        // host -> device
    swap_rb(gpuMat, output);     // run the custom kernel
    output.download(image);      // device -> host
    imshow("GPU", image);

    getchar();
    waitKey(0);
    return 0;
}
3. Result:
4. Other Precautions
Suppose there are two projects: a CUDA project, TestCuda, and a C++ project, CallCuda.
1. In the CUDA project TestCuda:
(1) A .cpp file (containing class member function definitions) calls a function defined in a .cu file.
For example, a function void Run_kernel() defined in a .cu file must be declared with extern "C".
Suppose a class member function defined in the .cpp file, such as void Cpp_run(), needs to call it.
To call Run_kernel(), first declare the C function from the .cu file outside the class definition in the .h file (where the class is defined), for example: extern "C" void Run_kernel();
(2) In the CUDA project's Properties -> General, set the configuration type to "Static library (.lib)" and click Apply.
Also in the project properties, under Librarian -> General -> Additional Dependencies, add the CUDA libraries (cudart.lib, curand.lib, etc.), and add the corresponding library directory under Additional Library Directories.
2. In the other project, the C++ project CallCuda:
In the CallCuda project properties, find Additional Dependencies and add the CUDA libraries (cudart.lib, etc.) and the static library generated by TestCuda (TestCuda.lib); also add the corresponding Additional Library Directories.
After that, a function in a .cpp file of this project can call the Cpp_run() function from the CUDA project, but it must instantiate the class first.
1. Add example.cu to the project. Right-click the existing project and select Add Existing Item.
2. Add a build rule. Right-click the project, select Custom Build Rules, and check CUDA Build Rule x.x in the dialog box.
3. Set the compiler for the .cu file. Right-click the .cu file, click Properties, change the build rule, and select the CUDA compiler you just added.
4. Add the include directories. In the project properties, under C/C++ -> General, add the CUDA SDK directories to the include directories. For example: "C:\Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\inc";"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include"
5. Add the .lib files. Under Linker -> Input, add cudart.lib and cutil32D.lib.
6. Change the code generation setting to multithreaded (/MT).
7. Done. That completes the project configuration.
In addition, any function in the .cu file that wraps CUDA code and is called from C++ must be declared with extern "C" in the .cu file. The calling .cpp file must also declare the function with extern "C" before calling it.