The GPU module of OpenCV provides many parallel functions implemented in CUDA, but sometimes you need to write your own parallel functions and use them alongside the existing OpenCV functions. Because OpenCV is an open-source library, we can easily inspect its internal implementation and write a CUDA parallel function modeled on its existing functions.
The key GPU classes are GpuMat and PtrStepSz. GpuMat is mainly used to upload data from host memory to GPU (device) memory, while PtrStepSz is the parameter type used for GPU kernel functions.
1. First, design your own kernel function. Its parameters can be a type you define yourself or PtrStepSz. Generally, choose the latter, because OpenCV overloads GpuMat's type conversion operator so that a GpuMat can be converted to a PtrStepSz.
2. Call the kernel. Initialize your own GpuMat and use its upload function. Then use a type conversion to convert it to the PtrStepSz<T> type (T being the matrix's element type), set the grid and block dimensions, and call the kernel function. Afterwards, call cudaFree to release any resources you allocated yourself.
3. Call the download function of GpuMat to copy the processed matrix back to host memory.
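The three steps above can be sketched as follows. This is only a minimal illustration, not OpenCV's own implementation. It assumes OpenCV 3 or later built with CUDA support, where the classes are named cv::cuda::GpuMat and cv::cuda::PtrStepSz (in the older 2.4 GPU module they live in the cv::gpu namespace). The kernel invertKernel, the wrapper invertOnGpu, and the 16x16 block size are hypothetical names and choices made for this example.

```cuda
#include <cuda_runtime.h>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>

// Step 1: a custom kernel that takes PtrStepSz parameters.
// Hypothetical example: invert every pixel of an 8-bit, single-channel image.
__global__ void invertKernel(cv::cuda::PtrStepSz<uchar> src,
                             cv::cuda::PtrStepSz<uchar> dst)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Threads that fall outside the image do nothing.
    if (x < src.cols && y < src.rows)
        dst(y, x) = static_cast<uchar>(255 - src(y, x));
}

// Hypothetical host-side wrapper covering steps 2 and 3.
cv::Mat invertOnGpu(const cv::Mat& host)
{
    CV_Assert(host.type() == CV_8UC1);

    // Step 2a: upload the input and allocate the output on the device.
    cv::cuda::GpuMat d_src, d_dst;
    d_src.upload(host);
    d_dst.create(host.size(), host.type());

    // Step 2b: convert GpuMat to PtrStepSz through GpuMat's conversion operator.
    cv::cuda::PtrStepSz<uchar> src = d_src;
    cv::cuda::PtrStepSz<uchar> dst = d_dst;

    // Step 2c: choose the block and grid sizes (rounding up) and launch.
    const dim3 block(16, 16);
    const dim3 grid((host.cols + block.x - 1) / block.x,
                    (host.rows + block.y - 1) / block.y);
    invertKernel<<<grid, block>>>(src, dst);
    CV_Assert(cudaGetLastError() == cudaSuccess);
    CV_Assert(cudaDeviceSynchronize() == cudaSuccess);

    // Step 3: download the processed matrix back to host memory.
    cv::Mat result;
    d_dst.download(result);

    // No cudaFree here: GpuMat releases its own device memory. cudaFree is
    // only needed for buffers you allocated yourself with cudaMalloc.
    return result;
}
```

The file would be compiled with nvcc and linked against the OpenCV core library. Passing PtrStepSz by value is cheap, since it only carries the device pointer, the row step in bytes, and the image dimensions.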