"OpenCV & CUDA" OpenCV and CUDA combined programming


1. Using the GPU module provided by OpenCV

OpenCV already provides many GPU-accelerated functions, and its GPU module can be used to speed up most common image-processing operations.

For basic usage, see: http://www.cnblogs.com/dwdxdy/p/3244508.html

The advantage of this approach is simplicity: GpuMat manages the data transfer between the CPU and the GPU, so there is no need to worry about kernel launch parameters, only about the logic of the processing pipeline.
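As an illustration, a minimal sketch of method 1 (assuming the OpenCV 2.x gpu module and an input file named lena.jpg) might look like this:

// method 1: let GpuMat handle the CPU/GPU transfers (OpenCV 2.x gpu module assumed)
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;

int main()
{
    Mat src = imread("lena.jpg");               // host image
    gpu::GpuMat d_src, d_gray;
    d_src.upload(src);                          // CPU -> GPU
    gpu::cvtColor(d_src, d_gray, CV_BGR2GRAY);  // runs on the GPU
    Mat gray;
    d_gray.download(gray);                      // GPU -> CPU
    imshow("gray", gray);
    waitKey(0);
    return 0;
}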

The disadvantage is that you are limited by the development and update pace of the OpenCV library. When you need a custom operation that OpenCV does not provide, this method cannot meet the requirement and you have to write the parallel implementation yourself. In addition, for some special needs the parallel functions OpenCV provides are not optimally tuned, so in a specific application further optimization may be needed to improve performance.

2. Using the CUDA API alone

With the CUDA Runtime API or the CUDA Driver API you implement the parallel operations yourself. You then have to manage the data transfers between the CPU and the GPU, the kernel launch parameters, and the optimization of the kernels.

The advantage is that the whole processing pipeline is under the user's control, so more kinds of parallel operations can be implemented.

The disadvantage is complexity: there is more code to write, and you need to be familiar with the CUDA documentation and API. The following is a simple sample program:

// Swap the red and blue channels of a BGR image with a custom CUDA kernel.
// CUDA_SAFE_CALL is an error-checking macro (not part of the runtime API; see the note after this listing).
#include <opencv2/opencv.hpp>
#include <cuda_runtime.h>
using namespace cv;

__global__ void swap_rb_kernel(const uchar3* src, uchar3* dst, int width, int height)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < width && y < height)
    {
        uchar3 v = src[y * width + x];
        dst[y * width + x].x = v.z;
        dst[y * width + x].y = v.y;
        dst[y * width + x].z = v.x;
    }
}

void swap_rb_caller(const uchar3* src, uchar3* dst, int width, int height)
{
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    swap_rb_kernel<<<grid, block, 0>>>(src, dst, width, height);
    cudaDeviceSynchronize();
}

int main()
{
    Mat image = imread("lena.jpg");
    imshow("src", image);

    size_t memSize = image.cols * image.rows * sizeof(uchar3);
    uchar3* d_src = NULL;
    uchar3* d_dst = NULL;
    CUDA_SAFE_CALL(cudaMalloc((void**)&d_src, memSize));
    CUDA_SAFE_CALL(cudaMalloc((void**)&d_dst, memSize));
    CUDA_SAFE_CALL(cudaMemcpy(d_src, image.data, memSize, cudaMemcpyHostToDevice));

    swap_rb_caller(d_src, d_dst, image.cols, image.rows);

    CUDA_SAFE_CALL(cudaMemcpy(image.data, d_dst, memSize, cudaMemcpyDeviceToHost));
    imshow("GPU", image);
    waitKey(0);

    CUDA_SAFE_CALL(cudaFree(d_src));
    CUDA_SAFE_CALL(cudaFree(d_dst));
    return 0;
}

In the code above, cudaMalloc, cudaMemcpy, and cudaFree manage the allocation, transfer, and release of device memory.
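Note that CUDA_SAFE_CALL is not part of the CUDA runtime API; it is an error-checking macro in the style of the old CUDA SDK samples. If it is not available in your environment, a minimal equivalent can be sketched as follows:

// minimal error-checking macro, a stand-in for the SDK-style CUDA_SAFE_CALL
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_SAFE_CALL(call)                                         \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)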

Note: if image.data contains row padding for alignment (i.e. the Mat is not continuous), the program above will not process the image correctly, because the kernel assumes tightly packed rows.
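One way to deal with padded rows (a sketch, assuming cv::Mat::isContinuous and the runtime's cudaMemcpy2D) is either to force a continuous host copy or to copy row by row with the pitch taken into account:

// Option 1: make the host image continuous before the flat copy
if (!image.isContinuous())
    image = image.clone();                          // clone() always yields continuous data

// Option 2: respect the host row pitch with cudaMemcpy2D (device rows stay tightly packed)
size_t srcPitch = image.step;                       // bytes per host row (may include padding)
size_t dstPitch = image.cols * sizeof(uchar3);      // bytes per device row
CUDA_SAFE_CALL(cudaMemcpy2D(d_src, dstPitch,
                            image.data, srcPitch,
                            image.cols * sizeof(uchar3), image.rows,
                            cudaMemcpyHostToDevice));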

3. Using the interfaces provided by OpenCV combined with the CUDA API

Some of the interfaces OpenCV already provides take care of the basic plumbing of CUDA programming, which simplifies development; you then only write custom kernels, or extend the kernels OpenCV ships, according to your own business needs. This makes full use of what OpenCV offers while still meeting custom requirements, and it is both easy to use and easy to extend. The following is a simple sample program:

// swap_rb.cu
#include <opencv2/core/cuda_devptrs.hpp>
using namespace cv;
using namespace cv::gpu;

// custom kernel function
__global__ void swap_rb_kernel(const PtrStepSz<uchar3> src, PtrStep<uchar3> dst)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < src.cols && y < src.rows)
    {
        uchar3 v = src(y, x);
        dst(y, x) = make_uchar3(v.z, v.y, v.x);
    }
}

// launcher: sets the kernel launch configuration and invokes the kernel
void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream)
{
    dim3 block(32, 8);
    dim3 grid((src.cols + block.x - 1) / block.x, (src.rows + block.y - 1) / block.y);
    swap_rb_kernel<<<grid, block, 0, stream>>>(src, dst);
    if (stream == 0) cudaDeviceSynchronize();
}
// swap_rb.cpp
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/gpu/stream_accessor.hpp>
using namespace cv;
using namespace cv::gpu;

void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream);

// entry point called by the main program: validates input and wraps the kernel launcher
void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null())
{
    CV_Assert(src.type() == CV_8UC3);
    dst.create(src.size(), src.type());
    cudaStream_t s = StreamAccessor::getStream(stream);
    swap_rb_caller(src, dst, s);
}
// main.cpp
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;
using namespace cv::gpu;

void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null());

int main()
{
    Mat image = imread("lena.jpg");
    imshow("src", image);

    GpuMat gpumat, output;
    gpumat.upload(image);            // CPU -> GPU
    swap_rb(gpumat, output);         // custom GPU operation
    output.download(image);          // GPU -> CPU

    imshow("GPU", image);
    waitKey(0);
    return 0;
}

The swap_rb.cu file defines the kernel function and its launcher; the launcher sets the kernel's launch parameters.

The swap_rb.cpp file defines the entry function of the parallel operation, i.e. the function the main program calls. It mainly wraps the kernel launcher, validates the input parameters, and can select different kernels according to those parameters, as sketched below.
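For example, the entry function could be extended (purely as a hypothetical sketch; the swap_rb_caller_8uc4 launcher is assumed, not part of the sample above) to dispatch to a different kernel depending on the input type:

// hypothetical dispatch inside the entry function: choose a kernel per input type
void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null())
{
    CV_Assert(src.type() == CV_8UC3 || src.type() == CV_8UC4);
    dst.create(src.size(), src.type());
    cudaStream_t s = StreamAccessor::getStream(stream);
    if (src.type() == CV_8UC3)
        swap_rb_caller(src, dst, s);        // uchar3 kernel from swap_rb.cu
    else
        swap_rb_caller_8uc4(src, dst, s);   // hypothetical uchar4 variant
}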

The main.cpp file is the main program; it handles data input, business processing, and data output.

Summary

Programming simplicity and controllability are a trade-off: the more convenient the programming model, the less control you have. In practice you should look for the balance point between simplicity and control and choose the method that fits the application; in general, method 3 is recommended.
