Method One: Using the GPU module provided by OpenCV
OpenCV already provides many GPU-accelerated functions, and its GPU module can be used to accelerate most common image-processing operations.
For basic usage, see: http://www.cnblogs.com/dwdxdy/p/3244508.html
The advantage of this method is its simplicity: GpuMat manages the data transfer between CPU and GPU, so you do not need to set kernel launch parameters and can focus only on the processing logic.
The disadvantage is being limited by the development and update pace of the OpenCV library: when the application needs a custom operation for which OpenCV provides no corresponding function, it is hard to meet the requirement, and you must implement a parallel version of the custom operation yourself. In addition, for some special needs the parallel functions OpenCV does provide are not optimally tuned, and in a specific application further optimization may be needed to improve performance.
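As a minimal sketch of this method, assuming OpenCV 2.x built with CUDA support (the choice of gpu::cvtColor is only illustrative):

// method one sketch: GpuMat handles the CPU <-> GPU transfers
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;

int main() {
    Mat src = imread("lena.jpg");
    gpu::GpuMat d_src, d_dst;
    d_src.upload(src);                          // CPU -> GPU
    gpu::cvtColor(d_src, d_dst, CV_BGR2GRAY);   // GPU-accelerated color conversion
    Mat dst;
    d_dst.download(dst);                        // GPU -> CPU
    imshow("gpu result", dst);
    waitKey(0);
    return 0;
}

Apart from upload and download, the code never touches the CUDA API directly.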
Method Two: Programming with the CUDA API directly
With the CUDA Runtime API or the CUDA Driver API, you implement the parallel-accelerated operations yourself. In the process you must manage the data transfers between CPU and GPU, the kernel launch parameters, and the optimization of the kernel functions.
The advantage is that the whole processing pipeline is under the user's control, so a wider range of parallel-accelerated operations can be implemented.
The disadvantage is that it is complex to use and requires writing more code, as well as familiarity with the relevant CUDA documentation and API. The following is a simple sample program:
// method two sample: the CUDA runtime API used directly
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>
using namespace cv;

// simple error-checking macro (the original sample relied on a similar helper)
#define CUDA_SAFE_CALL(call) do { cudaError_t err = (call); \
    if (err != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); exit(1); } } while (0)

// kernel: swap the red and blue channels of a packed 3-channel image
__global__ void swap_rb_kernel(const uchar3* src, uchar3* dst, int width, int height) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < width && y < height) {
        uchar3 v = src[y * width + x];
        dst[y * width + x].x = v.z;
        dst[y * width + x].y = v.y;
        dst[y * width + x].z = v.x;
    }
}

// caller: sets the launch configuration and invokes the kernel
void swap_rb_caller(const uchar3* src, uchar3* dst, int width, int height) {
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    swap_rb_kernel<<<grid, block, 0>>>(src, dst, width, height);
    cudaThreadSynchronize();   // wait for the kernel to finish
}

int main() {
    Mat image = imread("lena.jpg");
    imshow("src", image);
    size_t memSize = image.cols * image.rows * sizeof(uchar3);
    uchar3* d_src = NULL;
    uchar3* d_dst = NULL;
    CUDA_SAFE_CALL(cudaMalloc((void**)&d_src, memSize));
    CUDA_SAFE_CALL(cudaMalloc((void**)&d_dst, memSize));
    CUDA_SAFE_CALL(cudaMemcpy(d_src, image.data, memSize, cudaMemcpyHostToDevice));
    swap_rb_caller(d_src, d_dst, image.cols, image.rows);
    CUDA_SAFE_CALL(cudaMemcpy(image.data, d_dst, memSize, cudaMemcpyDeviceToHost));
    imshow("gpu", image);
    waitKey(0);
    CUDA_SAFE_CALL(cudaFree(d_src));
    CUDA_SAFE_CALL(cudaFree(d_dst));
    return 0;
}
The code above uses cudaMalloc, cudaMemcpy, and cudaFree to manage the allocation, transfer, and release of device memory.
Note: if image.data contains padding bytes for row alignment (i.e., the image rows are not stored contiguously), the program above will not process the image correctly.
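One common way to handle padded rows is pitched device memory; the following is a hedged sketch (the kernel is omitted, and any kernel used with it must index rows by the returned pitch):

// handling padded host rows with pitched device memory (illustrative sketch)
#include <cuda_runtime.h>
#include <opencv2/opencv.hpp>
using namespace cv;

int main() {
    Mat image = imread("lena.jpg");
    uchar3* d_src = NULL;
    size_t d_pitch = 0;
    // device rows are padded to an alignment boundary; the actual row
    // stride in bytes is returned in d_pitch
    cudaMallocPitch((void**)&d_src, &d_pitch, image.cols * sizeof(uchar3), image.rows);
    // cudaMemcpy2D honors both the host row stride (image.step) and the
    // device pitch, so padded images are transferred correctly row by row
    cudaMemcpy2D(d_src, d_pitch, image.data, image.step,
                 image.cols * sizeof(uchar3), image.rows, cudaMemcpyHostToDevice);
    // ... launch a kernel that addresses row y at (uchar3*)((char*)d_src + y * d_pitch) ...
    cudaFree(d_src);
    return 0;
}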
Method Three: Using the interfaces OpenCV provides, combined with CUDA API programming
Some interfaces already provided by OpenCV handle the basic plumbing of CUDA programming, which simplifies development; you only customize the kernel functions (or extend the kernels OpenCV provides) for your own business needs. This makes full use of OpenCV's features while still meeting different business requirements; it is both easy to use and easy to extend. The following is a simple sample program:
// swap_rb.cu
#include <opencv2/core/cuda_devptrs.hpp>
using namespace cv;
using namespace cv::gpu;

// custom kernel: swap the red and blue channels
__global__ void swap_rb_kernel(const PtrStepSz<uchar3> src, PtrStep<uchar3> dst) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < src.cols && y < src.rows) {
        uchar3 v = src(y, x);
        dst(y, x) = make_uchar3(v.z, v.y, v.x);
    }
}

// calling function: sets the kernel's launch configuration
void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream) {
    dim3 block(32, 8);
    dim3 grid((src.cols + block.x - 1) / block.x, (src.rows + block.y - 1) / block.y);
    swap_rb_kernel<<<grid, block, 0, stream>>>(src, dst);
    if (stream == 0)
        cudaDeviceSynchronize();
}
// swap_rb.cpp
#include <opencv2/gpu/gpu.hpp>
#include <opencv2/gpu/stream_accessor.hpp>
using namespace cv;
using namespace cv::gpu;

void swap_rb_caller(const PtrStepSz<uchar3>& src, PtrStep<uchar3> dst, cudaStream_t stream);

// entry function: validates input, then wraps the kernel's calling function
void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null()) {
    CV_Assert(src.type() == CV_8UC3);
    dst.create(src.size(), src.type());
    cudaStream_t s = StreamAccessor::getStream(stream);
    swap_rb_caller(src, dst, s);
}
// main.cpp
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;
using namespace cv::gpu;

void swap_rb(const GpuMat& src, GpuMat& dst, Stream& stream = Stream::Null());

int main() {
    Mat image = imread("lena.jpg");
    imshow("src", image);
    GpuMat gpuMat, output;
    gpuMat.upload(image);      // CPU -> GPU
    swap_rb(gpuMat, output);   // GPU processing
    output.download(image);    // GPU -> CPU
    imshow("gpu", image);
    waitKey(0);
    return 0;
}
The swap_rb.cu file defines the kernel function and its calling function; the calling function sets the kernel's launch parameters.
The swap_rb.cpp file defines the entry function of the parallel operation, i.e., the function the main program calls to perform it. It mainly wraps the kernel's calling function, validates the input parameters, and can select among different kernels according to the input parameters.
The main.cpp file is the main program, which handles data input, business processing, and data output.
Summary
Simplicity and controllability in programming are a trade-off: the more convenient the programming model, the less control it gives you. In practice, one should look for the balance point between simplicity and control and choose the method appropriate to the application's demands; in general, method three is recommended.