Anyone who understands the JPEG data format can see that a method which splits an image into 8x8 pixel blocks and compresses each block lends itself well to parallel processing. In fact, Nvidia's CUDA has shipped a JPEG codec example since v5.5. The example is part of the CUDA SDK samples; with the default CUDA installation path it is located at "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v7.0\7_CUDALibraries\jpegNPP" (the number after "v" depends on the version).
This example decodes a picture and then re-encodes it. Since decoding only produces YUV data, getting RGB data requires adding a YUV-to-RGB conversion on top of the example, and that conversion is the real focus of this article. In addition, the sample itself has a bug: it cannot directly decode images that were compressed with different horizontal and vertical ratios. This is discussed again below, together with a somewhat hacky fix. The bug has been reported to Nvidia, and Nvidia replied that it will be fixed in the next version (that is, the version after v7.0).
When reprinting, please cite the source: http://blog.csdn.net/weixinhum/article/details/46683509
OK, here we go.
Because we need to modify the demo source, first make a backup copy of the jpegNPP folder at the path above. Then open the VS project inside the folder. The main code of the project is in jpegNPP.cpp, which contains the following code:
// Inverse DCT
for (int i = 0; i < 3; ++i)
{
    NPP_CHECK_NPP(nppiDCTQuantInv8x8LS_JPEG_16s8u_C1R_NEW(apdDCT[i], aDCTStep[i],
                                                          apSrcImage[i], aSrcImageStep[i],
                                                          pdQuantizationTables + oFrameHeader.aQuantizationTableSelector[i] * 64,
                                                          aSrcSize[i],
                                                          pDCTState));
}
This code already decodes the JPEG image into YUV data. The YUV data is stored in apSrcImage[0], apSrcImage[1] and apSrcImage[2], and the row step of each plane (the width of one row in bytes, including any padding) is stored in aSrcImageStep[0], aSrcImageStep[1] and aSrcImageStep[2]. These known quantities are sufficient, so we can delete all the code that follows the code above (that part deals with image encoding) and then write a CUDA function to convert YUV to RGB.
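To make the role of the row step concrete, here is a minimal host-side sketch of how a sample is addressed in one decoded plane, and how the chroma sample covering a given luma pixel can be found when the chroma planes are subsampled. The `Plane` type, the function names, and the `hSub`/`vSub` parameters are hypothetical helpers for illustration; they are not part of the NPP sample.

```cpp
#include <cstddef>
#include <cstdint>

// One decoded plane: a base pointer plus the distance in bytes between rows.
// (Hypothetical helper type, not part of the NPP sample.)
struct Plane {
    const std::uint8_t* data;
    std::size_t step;  // row stride in bytes; may be larger than the visible width
};

// Read the sample at (row, col) of a plane.
inline std::uint8_t sampleAt(const Plane& p, int row, int col) {
    return p.data[static_cast<std::size_t>(row) * p.step + col];
}

// Read the chroma sample that covers luma pixel (row, col), assuming the chroma
// plane is subsampled by hSub horizontally and by vSub vertically.
inline std::uint8_t chromaAt(const Plane& p, int row, int col, int hSub, int vSub) {
    return sampleAt(p, row / vSub, col / hSub);
}
```

With 2x2 subsampling, for example, luma pixels (2, 2), (2, 3), (3, 2) and (3, 3) all map to chroma sample (1, 1).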
The approximate process is as follows:
Configure the OpenCV environment and include the header files (this step is optional; it only makes it easy to check whether the converted image is correct. If you do not need it, skip it: knowing the pointer to the output RGB data and its length is enough):
#include <opencv2/core/core.hpp> // OpenCV header files
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp>
using namespace std;
Write the code that converts YUV to RGB:
Add the following code after the demo project code shown above:
int pwidth = aSrcSize[0].width;
int pheight = aSrcSize[0].height;

IplImage *drawimg; // output image
drawimg = cvCreateImage(cvSize(pwidth, pheight), 8, 3);

Npp8u *host_img;   // host memory
Npp8u *device_img; // device (graphics card) memory
size_t mPitch;
NPP_CHECK_CUDA(cudaMallocPitch(&device_img, &mPitch, pwidth * 3, pheight)); // allocate device memory for the RGB data

//unsigned char* imgdata = (unsigned char*) drawimg->imageData;
YCrCb2RGB(apSrcImage[0], apSrcImage[1], apSrcImage[2], pwidth, pheight,
          aSrcImageStep[0], aSrcImageStep[1], aSrcImageStep[2],
          drawimg->widthStep / sizeof(uchar), device_img, nMCUBlocksV, nMCUBlocksH);

// Note: the allocation and copy below treat the rows as tightly packed, which
// holds only when drawimg->widthStep == pwidth * 3.
NPP_CHECK_CUDA(cudaHostAlloc(&host_img, pwidth * pheight * 3, cudaHostAllocDefault)); // allocate page-locked host memory
NPP_CHECK_CUDA(cudaMemcpy(host_img, device_img, pwidth * pheight * 3, cudaMemcpyDeviceToHost)); // copy the processed image back to the host
drawimg->imageData = (char*)host_img;
cvShowImage("", drawimg);
cvWaitKey(0);
getchar();

for (int i = 0; i < 3; ++i) // release memory
{
    cudaFree(apSrcImage[i]);
    cudaFree(apdDCT[i]);
    cudaFreeHost(aphDCT[i]);
}
cudaFree(device_img);
cudaFreeHost(host_img);
cudaDeviceReset();
return EXIT_SUCCESS;
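The code above copies the result back as one contiguous block, which is only safe when the row stride equals pwidth * 3. When rows carry padding, the copy has to be done row by row. The following is a minimal host-side sketch of that idea; `copyPitched` is a hypothetical helper, not something the sample provides. On the CUDA side, cudaMemcpy2D takes the same kind of arguments (destination, destination pitch, source, source pitch, width in bytes, height) plus the copy direction.

```cpp
#include <cstddef>
#include <cstring>

// Copy `height` rows of `widthBytes` bytes each between two buffers whose rows
// start `srcPitch` and `dstPitch` bytes apart. Padding bytes are not copied.
// (Hypothetical helper for illustration.)
inline void copyPitched(unsigned char* dst, std::size_t dstPitch,
                        const unsigned char* src, std::size_t srcPitch,
                        std::size_t widthBytes, std::size_t height) {
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(dst + r * dstPitch, src + r * srcPitch, widthBytes);
    }
}
```

For example, a source with 3 useful bytes per row and a pitch of 4 can be packed into a destination with a pitch of 3, dropping the one padding byte at the end of each row.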
Add a "cudaycrcb.cu" file that defines the YCrCb2RGB function. If you have any questions about how to set up a .cu file, please refer to the previous article. In addition, the YCrCb2RGB function needs to be declared at the top of the jpegNPP.cpp file. The contents of the .cu file are as follows:
#include <stdio.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "Endianess.h"

// Clamp a value to the 0..255 range of an 8-bit channel.
__device__ unsigned char judge(int value)
{
    if (value >= 0 && value <= 255)
    {
        return value;
    }
    else if (value > 255)
    {
        return 255;
    }
    else
    {
        return 0;
    }
}

// Kernel function: each thread converts one pixel.
__global__ void YCrCb2RGBConver(unsigned char *device_y, unsigned char *device_cr, unsigned char *device_cb,
                                unsigned char *device_img, int width, int height,
                                int ystep, int crstep, int cbstep, int img_step,
                                int nmcublocksv, int nmcublocksh)
{
    //int tid = blockIdx.x*blockDim.x + threadIdx.x;
    int row = blockIdx.y*blockDim.y + threadIdx.y;
    int cols = blockIdx.x*blockDim.x + threadIdx.x;

    if (row >= height)
    {
        return;
    }
    if (cols >= width)
    {
        return;
    }

    // U drives the blue channel and V the red channel below.
    int Y = device_y[row*ystep + cols];
    int U = device_cr[row / nmcublocksh*crstep + cols / nmcublocksv] - 128;
    int V = device_cb[row / nmcublocksh*cbstep + cols / nmcublocksv] - 128;

    // Output order is B, G, R (the OpenCV default).
    device_img[row*img_step + cols * 3 + 0] = judge(Y + U + ((U * 198) >> 8));
    // The U coefficient is missing in the source text; 88 (about 0.344 * 256) is
    // the usual fixed-point value and is assumed here.
    device_img[row*img_step + cols * 3 + 1] = judge(Y - (((U * 88) >> 8) + ((V * 183) >> 8)));
    device_img[row*img_step + cols * 3 + 2] = judge(Y + V + ((V * 103) >> 8));
}

// Host-side entry point that launches the kernel.
extern "C" int YCrCb2RGB(unsigned char *device_y, unsigned char *device_cr, unsigned char *device_cb,
                         int width, int height, int ystep, int crstep, int cbstep, int img_step,
                         unsigned char *device_data, int nmcublocksv, int nmcublocksh)
{
    cudaEvent_t start, stop;
    float time;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    // This part can be tuned.
    dim3 threads(16, 16); // threads per block: 16*16
    //dim3 threads(256, 40);
    dim3 blocks((width + threads.x - 1) / threads.x, (height + threads.y - 1) / threads.y); // number of blocks in the grid

    YCrCb2RGBConver<<<blocks, threads>>>(device_y, device_cr, device_cb, device_data, width, height,
                                         ystep, crstep, cbstep, img_step, nmcublocksv, nmcublocksh); // run the conversion on the graphics card

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&time, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    printf("Kernel time: %f\n", time);
    return 0;
}
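To check the GPU output, it can help to have a CPU reference for the same per-pixel arithmetic. The sketch below is one such reference under stated assumptions: it uses the common 8-bit fixed-point approximations of the JPEG YCbCr-to-RGB coefficients (1.402, 0.344, 0.714, 1.772), and the names `clampToByte`, `Bgr` and `ycbcrToBgr` are hypothetical, not taken from the sample. It also assumes that right-shifting a negative int is an arithmetic shift, which is what mainstream compilers do.

```cpp
#include <cstdint>

// Clamp a value to the 0..255 range of an 8-bit channel.
inline std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// One output pixel in B, G, R order. (Hypothetical type for illustration.)
struct Bgr {
    std::uint8_t b, g, r;
};

// Convert one pixel using 8-bit fixed-point coefficients:
//   1.772 ~ 1 + 198/256, 0.344 ~ 88/256, 0.714 ~ 183/256, 1.402 ~ 1 + 103/256
inline Bgr ycbcrToBgr(int y, int cb, int cr) {
    const int u = cb - 128;  // blue-difference, centered on zero
    const int v = cr - 128;  // red-difference, centered on zero
    Bgr out;
    out.b = clampToByte(y + u + ((u * 198) >> 8));
    out.g = clampToByte(y - ((u * 88) >> 8) - ((v * 183) >> 8));
    out.r = clampToByte(y + v + ((v * 103) >> 8));
    return out;
}
```

A neutral input such as (128, 128, 128) comes out as mid-gray on all three channels, which makes a quick first check before comparing whole images against the kernel output.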
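The declaration (placed at the top of jpegNPP.cpp, matching the definition above) is as follows:
extern "C" int YCrCb2RGB(unsigned char *device_y, unsigned char *device_cr, unsigned char *device_cb,
                         int width, int height, int ystep, int crstep, int cbstep, int img_step,
                         unsigned char *device_data, int nmcublocksv, int nmcublocksh);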
At this point we have achieved what the article title describes. As for the bug in Nvidia's demo mentioned earlier (a memory error when decoding images compressed with different horizontal and vertical ratios), it occurs because the compression aspect ratio is handled incorrectly, and it can be fixed in the following way.
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
CUDA implements JPEG image decoding to RGB data