CUDA and OpenCV Combined Programming (I)

If you are learning computer-vision algorithms, you should learn CUDA. Image processing is mostly matrix arithmetic, and when millions of pixels are involved, computation time really matters. OpenCV itself provides many CUDA functions that cover most users' needs, but not all of them; sometimes you need to write your own kernel to optimize. You can also use OpenGL or multithreading, both of which OpenCV supports well. Mastering one or more acceleration techniques is very important for programmers, and especially for algorithm engineers.
Enough chatter. Having covered the basics in "CUDA Parallel Programming Foundations (I)" and "CUDA Parallel Programming Foundations (II)", we are now able to program CUDA together with OpenCV. It won't be the best-optimized code, but it already meets most requirements.
I. How to combine CUDA and OpenCV
(Only the Windows environment is covered below.)
1. CUDA code generally ends in .cu (on Windows; other systems are similar, same below). Its compiler is nvcc, which separates the host (CPU) code from the device (GPU) code: the CPU part is compiled much as gcc would compile it, and the GPU part is compiled according to nvcc's own rules. This is not complicated.
2. OpenCV code generally ends in .cpp and is compiled by gcc/g++ (or a similar compiler). Can OpenCV code be compiled with nvcc? Yes, but on Windows you have to rename the file to end in .cu.
3. So, on Windows, you have two ways to combine OpenCV with CUDA programming:
A. Compile OpenCV normally, compile the CUDA code separately into a static library, and call that library from the OpenCV side;
B. Put the OpenCV and CUDA code together and compile everything with nvcc.
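As a rough sketch, the two methods map to build steps like the following. The file names match the programs later in this article, but the exact commands and flags are illustrative and depend on your CUDA Toolkit, OpenCV install paths, and MSVC setup:

```shell
# Method A: compile the CUDA part into a static library with nvcc,
# then link it into a normally compiled OpenCV program.
nvcc -c swap_image.cu -o swap_image.obj      # nvcc splits and compiles host + device code
nvcc -lib swap_image.obj -o swap_image.lib   # package it as a static library
cl /EHsc opencv_cuda.cpp swap_image.lib      # host compiler builds main and links the .lib

# Method B: put everything in one .cu file and let nvcc compile all of it.
nvcc opencv_cuda.cu -o opencv_cuda.exe       # add OpenCV include/library paths as needed
```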
II. How to write the code
(Based on OpenCV 3.2.0 below.)
OpenCV is a very powerful vision algorithm library, and of course it also supports CUDA.
cv::cuda is the namespace dedicated to CUDA; you can find many ready-made functions in it,
for example cuda::remap(), cuda::add(), and so on.
We are going to use the cuda::PtrStepSz<T> template and cuda::GpuMat.
For example: if we have a cuda::GpuMat image called img, how do we get it into a kernel? The answer: pass img directly where the kernel expects a cuda::PtrStepSz. The two types are not equivalent, but data can be passed between them; see the examples below. Indexing inside a cuda::PtrStepSz works much like ordinary CUDA memory access.
Besides cuda::PtrStepSz, OpenCV has other interfaces for moving data back and forth; explore them yourself, I won't belabor them here. As for the efficiency of combining CUDA with OpenCV: you'll know once you've used it. Test it yourself if you're curious; in any case I strongly recommend it, and I'll talk about efficiency when there is time.
III. Common errors
1. cudaErrorMemoryAllocation: usually the requested allocation is too large and exceeds the GPU's limit;
2. cudaErrorLaunchFailure: an illegal address was accessed, for example an index past the end of an array;
3. CUDA and VS2015 together occasionally misbehave: a build fails, you fix the error, and it still fails. When that happens, do a full rebuild and delete everything the previous build generated; that is safer. I have hit this many times;
4. If the <<< >>> kernel-launch syntax is reported as an error, make sure it appears in a .cu file rather than a .cpp file. The IDE may still underline it in red even in the .cu file; ignore that;
5. For the conventions of writing static libraries, please look them up online; honestly, my own style isn't all that standard. A complaint while I'm at it: many online technical articles just copy each other, which is tedious, and many experts write at too advanced a level to follow. It would be nice to have articles pitched at every level, so that neither getting started nor advancing is too hard.

Program 1: combine CUDA and OpenCV directly in one .cu file (important!)
We generally don't do it this way, because CUDA as an independent programming model gets confusing when everything is mixed together, and for speed the kernels are usually written in a C style rather than C++.
// opencv_cuda.cu: flip an image with a custom CUDA kernel
// authored by ALPC40
// version: Visual Studio 2015 \ CUDA Toolkit 8.0 \ OpenCV 3.2.0
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;

#ifdef _DEBUG
#pragma comment(lib, "opencv_core320d.lib")
#pragma comment(lib, "opencv_highgui320d.lib")
#pragma comment(lib, "opencv_calib3d320d.lib")
#pragma comment(lib, "opencv_imgcodecs320d.lib")
#pragma comment(lib, "opencv_imgproc320d.lib")
#pragma comment(lib, "opencv_cudaimgproc320d.lib")
#pragma comment(lib, "opencv_cudaarithm320d.lib")
#pragma comment(lib, "cudart.lib")
#else
#pragma comment(lib, "opencv_core320.lib")
#pragma comment(lib, "opencv_highgui320.lib")
#pragma comment(lib, "opencv_calib3d320.lib")
#pragma comment(lib, "opencv_imgcodecs320.lib")
#pragma comment(lib, "opencv_imgproc320.lib")
#pragma comment(lib, "opencv_cudaimgproc320.lib")
#pragma comment(lib, "opencv_cudaarithm320.lib")
#pragma comment(lib, "cudart.lib")
#endif

// Error-handling macro
#define CHECK_ERROR(call) {\
    const cudaError_t err = call;\
    if (err != cudaSuccess)\
    {\
        printf("error: %s, %d, ", __FILE__, __LINE__);\
        printf("code: %d, reason: %s\n", err, cudaGetErrorString(err));\
        exit(1);\
    }\
}

// Kernel: flip the image vertically
__global__ void swap_image_kernel(cuda::PtrStepSz<uchar3> cu_src, cuda::PtrStepSz<uchar3> cu_dst, int h, int w)
{
    // Index computation: see the previous two posts
    unsigned int x = blockDim.x * blockIdx.x + threadIdx.x;
    unsigned int y = blockDim.y * blockIdx.y + threadIdx.y;
    // Why this bounds check: see the previous two posts
    if (x < cu_src.cols && y < cu_src.rows)
    {
        // Why h - y - 1 rather than h - y? Think it over
        cu_dst(y, x) = cu_src(h - y - 1, x);
    }
}

// Host wrapper: mainly handles the block/grid relationship
void swap_image(cuda::GpuMat src, cuda::GpuMat dst, int h, int w)
{
    assert(src.cols == w && src.rows == h);
    int unit = 32;
    // See the previous two posts for block/grid sizing; be careful not to exceed the GPU limits
    dim3 block(unit, unit);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    printf("grid = %4d %4d %4d\n", grid.x, grid.y, grid.z);
    printf("block= %4d %4d %4d\n", block.x, block.y, block.z);
    swap_image_kernel<<<grid, block>>>(src, dst, h, w);
    // Synchronize, because the computation may be large
    CHECK_ERROR(cudaDeviceSynchronize());
}

int main(int argc, char **argv)
{
    Mat src, dst;
    cuda::GpuMat cu_src, cu_dst;
    int h, w;
    // Read the image named by argv[1], in BGR order
    src = imread(argv[1]);
    // Check that the read succeeded
    if (src.data == NULL)
    {
        cout << "read image error" << endl;
        return -1;
    }
    h = src.rows;
    w = src.cols;
    cout << "image height: " << h << ", image width: " << w << endl;
    // Upload the CPU image to the GPU; upload() wraps the cudaMalloc/cudaMemcpy steps
    cu_src.upload(src);
    // Allocate GPU space; you could also allocate inside the wrapper, but allocate you must,
    // or the kernel will blow up
    cu_dst = cuda::GpuMat(h, w, CV_8UC3, Scalar(0, 0, 0));
    // Allocate CPU space
    dst = Mat(h, w, CV_8UC3, Scalar(0, 0, 0));
    // Call swap_image, which launches the kernel; this layering makes mistakes less likely
    // You could also launch the kernel directly here, but too much code in one place gets messy
    swap_image(cu_src, cu_dst, h, w);
    // Download the GPU result back to the CPU; pairs with upload()
    cu_dst.download(dst);
    // Display the CPU image; if your OpenCV was built with OpenGL you can display a GpuMat directly
    imshow("dst", dst);
    // Wait for a key press
    waitKey();
    // Optionally write the result to a file
    if (argc == 3)
        imwrite(argv[2], dst);
    return 0;
}
Program 2: combine CUDA and OpenCV through a static library (important!)
This method keeps the two parts cleanly separated and is the better way to implement the same functionality.

// swap_image.cu: builds swap_image.lib for the main program to call
// authored by ALPC40
// version: Visual Studio 2015 \ CUDA Toolkit 8.0 \ OpenCV 3.2.0
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <opencv2/opencv.hpp>
using namespace cv;

// Error-handling macro
#define CHECK_ERROR(call) {\
    const cudaError_t err = call;\
    if (err != cudaSuccess)\
    {\
        printf("error: %s, %d, ", __FILE__, __LINE__);\
        printf("code: %d, reason: %s\n", err, cudaGetErrorString(err));\
        exit(1);\
    }\
}

// Kernel: flip the image vertically
__global__ void swap_image_kernel(cuda::PtrStepSz<uchar3> cu_src, cuda::PtrStepSz<uchar3> cu_dst, int h, int w)
{
    // Index computation: see the previous two posts
    unsigned int x = blockDim.x * blockIdx.x + threadIdx.x;
    unsigned int y = blockDim.y * blockIdx.y + threadIdx.y;
    // Why this bounds check: see the previous two posts
    if (x < cu_src.cols && y < cu_src.rows)
    {
        // Why h - y - 1 rather than h - y? Think it over
        cu_dst(y, x) = cu_src(h - y - 1, x);
    }
}

// Host wrapper: mainly handles the block/grid relationship.
// Note the extern "C": that is the convention for the library interface
extern "C" void swap_image(cuda::GpuMat src, cuda::GpuMat dst, int h, int w)
{
    assert(src.cols == w && src.rows == h);
    int unit = 32;
    // See the previous two posts for block/grid sizing; be careful not to exceed the GPU limits
    dim3 block(unit, unit);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    printf("grid = %4d %4d %4d\n", grid.x, grid.y, grid.z);
    printf("block= %4d %4d %4d\n", block.x, block.y, block.z);
    swap_image_kernel<<<grid, block>>>(src, dst, h, w);
    // Synchronize, because the computation may be large
    CHECK_ERROR(cudaDeviceSynchronize());
}
 
// opencv_cuda.cpp: main function of the second program; uses the custom static
// library to implement the CUDA version of the image flip
// authored by ALPC40
// version: Visual Studio 2015 \ CUDA Toolkit 8.0 \ OpenCV 3.2.0
#include <stdio.h>
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;

#ifdef _DEBUG
#pragma comment(lib, "opencv_core320d.lib")
#pragma comment(lib, "opencv_highgui320d.lib")
#pragma comment(lib, "opencv_calib3d320d.lib")
#pragma comment(lib, "opencv_imgcodecs320d.lib")
#pragma comment(lib, "opencv_imgproc320d.lib")
#pragma comment(lib, "opencv_cudaimgproc320d.lib")
#pragma comment(lib, "opencv_cudaarithm320d.lib")
#pragma comment(lib, "cudart.lib")
#pragma comment(lib, "swap_image.lib") // don't forget this one
#else
#pragma comment(lib, "opencv_core320.lib")
#pragma comment(lib, "opencv_highgui320.lib")
#pragma comment(lib, "opencv_calib3d320.lib")
#pragma comment(lib, "opencv_imgcodecs320.lib")
#pragma comment(lib, "opencv_imgproc320.lib")
#pragma comment(lib, "opencv_cudaimgproc320.lib")
#pragma comment(lib, "opencv_cudaarithm320.lib")
#pragma comment(lib, "cudart.lib")
#pragma comment(lib, "swap_image.lib") // don't forget this one
#endif

// This declaration matters: it is what lets us call into the static library.
// The parameter order must match the definition in swap_image.cu: (h, w)
extern "C" void swap_image(cuda::GpuMat src, cuda::GpuMat dst, int h, int w);

int main(int argc, char **argv)
{
    Mat src, dst;
    cuda::GpuMat cu_src, cu_dst;
    int h, w;
    // Read the image named by argv[1], in BGR order
    src = imread(argv[1]);
    // Check that the read succeeded
    if (src.data == NULL)
    {
        cout << "read image error" << endl;
        return -1;
    }
    h = src.rows;
    w = src.cols;
    cout << "image height: " << h << ", image width: " << w << endl;
    // Upload the CPU image to the GPU; upload() wraps the cudaMalloc/cudaMemcpy steps
    cu_src.upload(src);
    // Allocate GPU space; you could also allocate inside the wrapper, but allocate you must,
    // or the kernel will blow up
    cu_dst = cuda::GpuMat(h, w, CV_8UC3, Scalar(0, 0, 0));
    // Allocate CPU space
    dst = Mat(h, w, CV_8UC3, Scalar(0, 0, 0));
    // Call swap_image, which launches the kernel; this layering makes mistakes less likely
    // You could also launch the kernel directly here, but too much code in one place gets messy
    swap_image(cu_src, cu_dst, h, w);
    // Download the GPU result back to the CPU; pairs with upload()
    cu_dst.download(dst);
    // Display the CPU image; if your OpenCV was built with OpenGL you can display a GpuMat directly
    imshow("dst", dst);
    // Wait for a key press
    waitKey();
    // Optionally write the result to a file
    if (argc == 3)
        imwrite(argv[2], dst);
    return 0;
}
PS: Below are the result images, which double as an advertisement for my hometown's Gannan navel oranges.

