OpenCV 2.3 GPU APIs (1): Data Structures

Source: Internet
Author: User

A loose translation with some omissions, intended just to get familiar with the basic interface (in fact, little of it needed translating).

gpu::DevMem2D_

class gpu::DevMem2D_

A lightweight class representing pitched (aligned) data allocated in GPU global memory. Intended for users who write their own CUDA code.

template <typename T> struct DevMem2D_
{
    int cols;
    int rows;
    T* data;
    size_t step;

    DevMem2D_() : cols(0), rows(0), data(0), step(0) {}
    DevMem2D_(int rows, int cols, T* data, size_t step);

    template <typename U>
    explicit DevMem2D_(const DevMem2D_<U>& d);

    typedef T elem_type;
    enum { elem_size = sizeof(elem_type) };

    __CV_GPU_HOST_DEVICE__ size_t elemSize() const;

    /* returns pointer to the beginning of the given image row */
    __CV_GPU_HOST_DEVICE__ T* ptr(int y = 0);
    __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
};

typedef DevMem2D_<unsigned char> DevMem2D;
typedef DevMem2D_<float> DevMem2Df;
typedef DevMem2D_<int> DevMem2Di;
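To make the intended use concrete, here is a minimal CUDA kernel sketch that takes a DevMem2D by value (the kernel name is hypothetical, and it assumes compilation with nvcc; the header path is my assumption for OpenCV 2.3):

#include <opencv2/gpu/devmem2d.hpp>   // header location assumed for OpenCV 2.3

// Inverts an 8-bit image. rows, cols and ptr() travel with the structure,
// so no extra size arguments are needed.
__global__ void invertKernel(cv::gpu::DevMem2D img)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < img.cols && y < img.rows)
        img.ptr(y)[x] = 255 - img.ptr(y)[x];
}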

gpu::PtrStep_

class gpu::PtrStep_

Similar to gpu::DevMem2D_, but for performance reasons this class keeps only the data pointer and the row step; the width and height are omitted.

template <typename T> struct PtrStep_
{
    T* data;
    size_t step;

    PtrStep_();
    PtrStep_(const DevMem2D_<T>& mem);

    typedef T elem_type;
    enum { elem_size = sizeof(elem_type) };

    __CV_GPU_HOST_DEVICE__ size_t elemSize() const;
    __CV_GPU_HOST_DEVICE__ T* ptr(int y = 0);
    __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
};

typedef PtrStep_<unsigned char> PtrStep;
typedef PtrStep_<float> PtrStepf;
typedef PtrStep_<int> PtrStepi;
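Because PtrStep_ omits the dimensions, a kernel taking it must receive the size separately; a GpuMat converts to PtrStep_ implicitly through the conversion operators shown in the GpuMat listing below. A minimal sketch (kernel name hypothetical):

// Scales a float image in place. Width and height are passed as extra
// arguments because PtrStep_ carries only the pointer and the row step.
__global__ void scaleKernel(cv::gpu::PtrStepf img, int rows, int cols, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < cols && y < rows)
        img.ptr(y)[x] *= s;
}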

gpu::PtrElemStep_

class gpu::PtrElemStep_

Inherits from gpu::PtrStep_. Like gpu::DevMem2D_, it contains only the pointer and the row step, without the width and height. This class may only be used when sizeof(T) is a multiple of 256. (I have never looked at how the constructor is implemented. To be honest, I don't quite understand the purpose of this class; my personal understanding is that it is a compilation optimization for CUDA.)
template <typename T> struct PtrElemStep_ : public PtrStep_<T>
{
    PtrElemStep_(const DevMem2D_<T>& mem);
    __CV_GPU_HOST_DEVICE__ T* ptr(int y = 0);
    __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
};
gpu::GpuMat

class gpu::GpuMat

The basic storage class for GPU memory; it uses reference counting.

Restrictions:

  • Only 2D data is supported.
  • There is no function that returns the data address to user code (returning it would be useless, since the CPU cannot dereference GPU memory directly).
  • Expression-template techniques are not supported, which may lead to many intermediate memory allocations.

A GpuMat can be converted to DevMem2D_ and PtrStep_, so it can be passed directly to kernel functions. In contrast with Mat, GpuMat::isContinuous() == false in most cases: rows are padded to guarantee data alignment (required for CUDA coalesced access). PS: a single-row GpuMat is continuous.
class CV_EXPORTS GpuMat
{
public:
    //! default constructor
    GpuMat();
    GpuMat(int rows, int cols, int type);
    GpuMat(Size size, int type);
    .....
    //! builds GpuMat from Mat. Blocks uploading to device.
    explicit GpuMat(const Mat& m);

    //! returns lightweight DevMem2D_ structure for passing to nvcc-compiled code
    template <class T> operator DevMem2D_<T>() const;
    template <class T> operator PtrStep_<T>() const;

    //! transfers data from host to device
    void upload(const cv::Mat& m);
    void upload(const CudaMem& m, Stream& stream);

    //! transfers data from device to host. Blocking calls.
    void download(cv::Mat& m) const;

    //! downloads asynchronously
    void download(CudaMem& m, Stream& stream) const;
};
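The typical round trip looks like this; a minimal host-side sketch (the file name is hypothetical, and gpu::threshold stands in for any GPU operation):

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

int main()
{
    cv::Mat host = cv::imread("input.png", 0);   // hypothetical input image
    cv::gpu::GpuMat d_src, d_dst;
    d_src.upload(host);                          // blocking host -> device copy
    cv::gpu::threshold(d_src, d_dst, 128, 255, cv::THRESH_BINARY);
    cv::Mat result;
    d_dst.download(result);                      // blocking device -> host copy
    return 0;
}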
Note: it is not recommended to keep global or static GpuMat variables, i.e. to rely on their destructors, because the destruction order of such variables and the CUDA context is undefined. (The original note does not explain this in detail.)

gpu::createContinuous

Creates a continuous matrix in GPU memory.

C++: void gpu::createContinuous(int rows, int cols, int type, GpuMat& m)
C++: GpuMat gpu::createContinuous(int rows, int cols, int type)
C++: void gpu::createContinuous(Size size, int type, GpuMat& m)
C++: GpuMat gpu::createContinuous(Size size, int type)

Parameters:
rows - Row count.
cols - Column count.
type - Type of the matrix.
m - Destination matrix. This parameter changes only if it does not already have the proper type and area (rows x cols).
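A short sketch of the effect (dimensions arbitrary):

cv::gpu::GpuMat m;
cv::gpu::createContinuous(480, 640, CV_8UC1, m);
CV_Assert(m.isContinuous());   // rows are packed, no padding between them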

gpu::ensureSizeIsEnough

Makes sure the matrix already owns enough memory; if not, the space is reallocated, otherwise nothing happens.

C++: void gpu::ensureSizeIsEnough(int rows, int cols, int type, GpuMat& m)
C++: void gpu::ensureSizeIsEnough(Size size, int type, GpuMat& m)
Parameters:
rows - Minimum desired number of rows.
cols - Minimum desired number of columns.
size - Rows and columns passed as a structure.
type - Desired matrix type.
m - Destination matrix.
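This is handy for reusing one buffer across iterations; a sketch:

cv::gpu::GpuMat buf;   // reused across iterations
for (int i = 0; i < 100; ++i)
{
    // allocates on the first call only; later calls keep the existing memory
    cv::gpu::ensureSizeIsEnough(480, 640, CV_32FC1, buf);
    // ... fill and use buf ...
}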

gpu::registerPageLocked

Page-locks (pins) the memory of an existing host matrix. For a Mat on the host this speeds up data transfers, but allocating too much page-locked memory degrades overall system performance; see the CUDA documentation for details.

C++: void gpu::registerPageLocked(Mat& m)
Parameters:
m - Input matrix.
gpu::unregisterPageLocked

Unregisters the memory, making it pageable again.

C++: void gpu::unregisterPageLocked(Mat& m)
Parameters:
m - Input matrix.
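A typical pin/transfer/unpin sequence might look like this (a sketch; the speedup applies to the transfer from the pinned buffer):

cv::Mat frame(480, 640, CV_8UC1);
cv::gpu::registerPageLocked(frame);     // pin the existing host buffer
cv::gpu::GpuMat d_frame;
d_frame.upload(frame);                  // transfers from pinned memory are faster
cv::gpu::unregisterPageLocked(frame);   // make the buffer pageable again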
gpu::CudaMem

class gpu::CudaMem

This class uses reference counting and allocates its space (host memory) with the CUDA memory-allocation functions. The interface is similar to Mat(), but with an additional memory-type parameter:

  • ALLOC_PAGE_LOCKED sets a page-locked memory type, commonly used for fast and asynchronous uploading/downloading of data from/to the GPU.
  • ALLOC_ZEROCOPY specifies a zero-copy memory allocation that enables mapping the host memory to GPU address space, if supported.
  • ALLOC_WRITE_COMBINED sets a write-combined buffer that is not cached by the CPU. Such buffers are used to supply the GPU with data when the GPU only reads it. The advantage is better CPU cache utilization.

For more information about these parameters, see the CUDA documentation.

class CV_EXPORTS CudaMem
{
public:
    enum { ALLOC_PAGE_LOCKED = 1, ALLOC_ZEROCOPY = 2, ALLOC_WRITE_COMBINED = 4 };

    CudaMem(Size size, int type, int alloc_type = ALLOC_PAGE_LOCKED);

    //! creates from cv::Mat, copying the data
    explicit CudaMem(const Mat& m, int alloc_type = ALLOC_PAGE_LOCKED);

    ......

    void create(Size size, int type, int alloc_type = ALLOC_PAGE_LOCKED);

    //! returns matrix header with disabled ref. counting for CudaMem data
    Mat createMatHeader() const;
    operator Mat() const;

    //! maps host memory into device address space
    GpuMat createGpuMatHeader() const;
    operator GpuMat() const;

    //! whether host memory can be mapped to GPU address space
    static bool canMapHostMemory();

    int alloc_type;
};
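A sketch of allocating pinned host memory up front and viewing it as an ordinary Mat:

// Page-locked host buffer, viewed through a regular cv::Mat header (no copy).
cv::gpu::CudaMem pinned(cv::Size(640, 480), CV_8UC1, cv::gpu::CudaMem::ALLOC_PAGE_LOCKED);
cv::Mat header = pinned.createMatHeader();
// ... fill header as usual ...
cv::gpu::GpuMat d;
d.upload(header);   // upload from pinned memory takes the fast path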
gpu::CudaMem::createMatHeader

Creates a Mat header for the gpu::CudaMem data, without reference counting.

C++: Mat gpu::CudaMem::createMatHeader() const

gpu::CudaMem::createGpuMatHeader

Maps the CPU memory to the GPU address space and creates a gpu::GpuMat header, without reference counting, that points to it.

C++: GpuMat gpu::CudaMem::createGpuMatHeader() const
Note that this method is only useful when the memory was allocated with ALLOC_ZEROCOPY, and it also requires support from specific hardware models. Laptops often share video and CPU memory, so address spaces can be mapped, which eliminates an extra copy. (I have never used this feature in past projects.)

gpu::CudaMem::canMapHostMemory

Determines whether ALLOC_ZEROCOPY is supported.

C++: static bool gpu::CudaMem::canMapHostMemory()
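Putting the zero-copy pieces together; a sketch:

if (cv::gpu::CudaMem::canMapHostMemory())
{
    cv::gpu::CudaMem zc(cv::Size(640, 480), CV_8UC1, cv::gpu::CudaMem::ALLOC_ZEROCOPY);
    cv::Mat h = zc.createMatHeader();              // CPU view of the buffer
    cv::gpu::GpuMat d = zc.createGpuMatHeader();   // GPU view of the same memory
    // writes through h are visible to kernels reading d, with no explicit upload
}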
gpu::Stream

class gpu::Stream

Encapsulates a CUDA stream.

Note: currently, you may face problems if the same operation is enqueued twice with different data. Some functions use constant GPU memory, and the next call may update that memory before the previous call has finished. Calling different operations asynchronously is safe, however, because each operation has its own constant buffer. Memory copy/upload/download/set operations on the buffers you hold are also safe.

class CV_EXPORTS Stream
{
public:
    Stream();
    ~Stream();
    Stream(const Stream&);
    Stream& operator=(const Stream&);

    bool queryIfComplete();
    void waitForCompletion();

    //! downloads asynchronously.
    // Warning! cv::Mat must point to page-locked memory (i.e. to CudaMem data or to its subMat)
    void enqueueDownload(const GpuMat& src, CudaMem& dst);
    void enqueueDownload(const GpuMat& src, Mat& dst);

    //! uploads asynchronously.
    // Warning! cv::Mat must point to page-locked memory (i.e. to CudaMem data or to its ROI)
    void enqueueUpload(const CudaMem& src, GpuMat& dst);
    void enqueueUpload(const Mat& src, GpuMat& dst);

    void enqueueCopy(const GpuMat& src, GpuMat& dst);

    void enqueueMemSet(const GpuMat& src, Scalar val);
    void enqueueMemSet(const GpuMat& src, Scalar val, const GpuMat& mask);

    // converts matrix type, e.g. from float to uchar, depending on type
    void enqueueConvert(const GpuMat& src, GpuMat& dst, int type, double a = 1, double b = 0);
};
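A minimal asynchronous sketch using only the calls listed above (both host buffers are CudaMem, which satisfies the page-locked requirement):

cv::gpu::Stream stream;
cv::gpu::CudaMem src(cv::Size(640, 480), CV_8UC1);   // page-locked by default
cv::gpu::CudaMem dst(cv::Size(640, 480), CV_8UC1);
cv::gpu::GpuMat d_src(480, 640, CV_8UC1);
cv::gpu::GpuMat d_dst(480, 640, CV_8UC1);

stream.enqueueUpload(src, d_src);     // async host -> device
stream.enqueueCopy(d_src, d_dst);     // async device -> device
stream.enqueueDownload(d_dst, dst);   // async device -> host
// ... the CPU is free to do other work here ...
stream.waitForCompletion();           // block until everything queued is done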
gpu::Stream::queryIfComplete

Checks whether the operations queued on the stream have finished: returns true if so, otherwise false.

C++: bool gpu::Stream::queryIfComplete()

gpu::Stream::waitForCompletion

Blocks the current CPU thread until all operations in the stream are complete.

C++: void gpu::Stream::waitForCompletion()
gpu::StreamAccessor

class gpu::StreamAccessor

Used to obtain a cudaStream_t from a gpu::Stream.

This class enables getting a cudaStream_t from gpu::Stream. It is declared in stream_accessor.hpp because that is the only public header that depends on the CUDA Runtime API; including it brings this dependency into your code.

struct StreamAccessor
{
    CV_EXPORTS static cudaStream_t getStream(const Stream& stream);
};
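For example, to launch your own kernel on the same CUDA stream a gpu::Stream wraps (a sketch compiled with nvcc; it reuses the hypothetical invertKernel from the DevMem2D_ section, and the header path is my assumption for OpenCV 2.3):

#include <opencv2/gpu/gpu.hpp>
#include <opencv2/gpu/stream_accessor.hpp>   // header path assumed for OpenCV 2.3

__global__ void invertKernel(cv::gpu::DevMem2D img);   // from the sketch above

void enqueueCustomWork(cv::gpu::Stream& s, cv::gpu::GpuMat& img)
{
    cudaStream_t raw = cv::gpu::StreamAccessor::getStream(s);
    dim3 block(16, 16);
    dim3 grid((img.cols + block.x - 1) / block.x, (img.rows + block.y - 1) / block.y);
    // GpuMat converts implicitly to DevMem2D for the kernel argument
    invertKernel<<<grid, block, 0, raw>>>(img);
}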
