OpenCV's GPU module provides many parallel functions implemented in CUDA, but sometimes you need to write your own parallel functions and use them together with the existing OpenCV ones. Because OpenCV is open source, it is easy to see how its functions are implemented, so you can model your own CUDA functions on the existing ones.
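Two classes are key: GpuMat and PtrStepSz. GpuMat holds data in GPU (device) memory and is mainly used to upload host-memory data to the GPU and download it again. PtrStepSz is the lightweight type passed as a parameter to a GPU kernel: it carries the device pointer, the row step (pitch, in bytes), and the number of rows and columns.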
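To make the PtrStepSz description concrete, here is a minimal host-only sketch, assuming OpenCV 3.x/4.x, where the class is cv::cuda::PtrStepSz (OpenCV 2.4 names it cv::gpu::PtrStepSz). Its accessors are declared for both host and device code, so for illustration it can wrap an ordinary buffer; the 3x5 image with an 8-byte row pitch and the helper readThroughPtrStepSz are hypothetical. In real code the pointer would come from a GpuMat and would be dereferenced only inside a kernel.

```cuda
// Assumes OpenCV 3.x/4.x headers; host-only sketch, no GPU needed to run it.
#include <opencv2/core/cuda.hpp>  // declares cv::cuda::GpuMat and cv::cuda::PtrStepSz
#include <cassert>
#include <cstddef>

// Hypothetical helper: a 3x5 8-bit image stored with an 8-byte row pitch
// (3 padding bytes per row, like the pitched rows a GpuMat often has) is
// read through a PtrStepSz view, the same way a kernel would index it.
unsigned char readThroughPtrStepSz(int y, int x)
{
    const int rows = 3, cols = 5;
    const std::size_t stepBytes = 8;            // row pitch in BYTES, >= cols * sizeof(T)
    unsigned char buffer[rows * stepBytes] = {};  // padding bytes stay 0
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            buffer[r * stepBytes + c] = static_cast<unsigned char>(r * 10 + c);

    // PtrStepSz(rows, cols, data, step): the same fields a GpuMat hands to a kernel.
    cv::cuda::PtrStepSz<unsigned char> view(rows, cols, buffer, stepBytes);
    assert(y >= 0 && y < view.rows && x >= 0 && x < view.cols);

    // view(y, x) is view.ptr(y)[x], i.e. the byte at data + y * step + x * sizeof(T);
    // a naive data[y * cols + x] would land in the wrong place once rows are padded.
    return view(y, x);
}
```

For example, readThroughPtrStepSz(2, 3) returns 23, whereas indexing the raw buffer as buffer[2 * 5 + 3] would read a padding byte (0).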
1. First, design your kernel function. Its parameters can be types you define yourself or PtrStepSz; the latter is usually the better choice, because GpuMat defines a conversion operator to PtrStepSz<T>, so a GpuMat can be converted directly into the kernel's parameter type.
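2. To call the kernel, initialize your GpuMat with the upload function, convert it to PtrStepSz<T> (where T is the element type) through that conversion operator, set the grid and block dimensions, and launch the kernel. When the kernel is done, use cudaFree only for device memory you allocated yourself with cudaMalloc; memory owned by a GpuMat is released automatically when the GpuMat is destroyed (or by its release() method), and calling cudaFree on it would free it twice.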
3. Finally, call GpuMat's download function to copy the processed matrix back to host memory.
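The three steps can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not OpenCV's own code: it assumes OpenCV 3.x/4.x built with CUDA support (namespace cv::cuda; OpenCV 2.4 uses cv::gpu), an 8-bit single-channel image, and compilation with nvcc. The names invertKernel and invertOnGpu and the pixel-inversion operation are hypothetical, chosen only to have something simple to compute.

```cuda
// Assumed build: nvcc, OpenCV 3.x/4.x compiled with CUDA on the include/link path.
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>   // cv::cuda::GpuMat, cv::cuda::PtrStepSz
#include <cuda_runtime.h>
#include <cassert>

// Step 1 (hypothetical kernel): parameters are PtrStepSz views, i.e. device
// pointer + row pitch in bytes + rows/cols. One thread handles one pixel.
__global__ void invertKernel(cv::cuda::PtrStepSz<unsigned char> src,
                             cv::cuda::PtrStepSz<unsigned char> dst)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= src.cols || y >= src.rows)   // the grid is rounded up, so guard the edges
        return;
    dst(y, x) = static_cast<unsigned char>(255 - src(y, x));  // operator() applies the pitch
}

// Hypothetical host wrapper covering steps 2 and 3 for a CV_8UC1 image.
cv::Mat invertOnGpu(const cv::Mat& hostSrc)
{
    CV_Assert(hostSrc.type() == CV_8UC1);

    // Step 2a: upload host data into device memory owned by a GpuMat.
    cv::cuda::GpuMat d_src, d_dst;
    d_src.upload(hostSrc);
    d_dst.create(d_src.size(), d_src.type());

    // Step 2b: GpuMat's conversion operator yields PtrStepSz<T> views (no data copy).
    cv::cuda::PtrStepSz<unsigned char> srcView = d_src;
    cv::cuda::PtrStepSz<unsigned char> dstView = d_dst;

    // Step 2c: 16x16 thread blocks, and enough blocks to cover every pixel.
    const dim3 block(16, 16);
    const dim3 grid((d_src.cols + block.x - 1) / block.x,
                    (d_src.rows + block.y - 1) / block.y);
    invertKernel<<<grid, block>>>(srcView, dstView);
    CV_Assert(cudaGetLastError() == cudaSuccess);       // reports launch errors
    CV_Assert(cudaDeviceSynchronize() == cudaSuccess);  // reports execution errors

    // Step 3: copy the result back to host memory.
    cv::Mat hostDst;
    d_dst.download(hostDst);

    // No cudaFree: d_src and d_dst release their device memory in their destructors.
    return hostDst;
}
```

Because the result comes back as an ordinary cv::Mat, it can go straight into other OpenCV functions; likewise, a GpuMat filled by a custom kernel can be handed to existing cv::cuda functions without a round trip through host memory. The 16x16 block size is a common default, not a requirement.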