- I. Overview of the JPEG principle
- II. Detailed analysis of the JPEG principle and the compression process
- 1. Color model conversion
- 2. DCT (discrete cosine transform)
- 3. Data quantization
- 4. Reordering the DCT results
- 5. DC coding based on differential pulse-code modulation
- 6. RLE coding
- 7. Canonical Huffman coding
- 8. Summary of the JPEG compression process
- III. JPEG storage format
- IV. GPU optimization of JPEG compression
I. Overview of the JPEG principle
JPEG is the abbreviation of Joint Photographic Experts Group, a joint expert group of ISO and IEC that is responsible for developing compression standards for still images. The algorithm developed by this group is called the JPEG algorithm, and it has become a universal standard, the JPEG standard. JPEG compression is lossy, but what it discards is the part of the image that human vision does not easily notice: it takes full advantage of the fact that the human eye is insensitive to high-frequency information in an image, which greatly reduces the amount of data that has to be processed.
JPEG holds the same position for images that MP3 holds for music: both apply lossy compression to the original data. For example, take a 1000*1000 picture in which each of the three RGB channels occupies one byte. The uncompressed image takes 1000*1000*3 = 3,000,000 bytes, about 2.86 MB. Anyone who reads novels can picture a file of this size: it is roughly 1.5 to 2 million Chinese characters of text. After JPEG compression the size can drop to around 300 KB, a compression ratio in the region of 8:1 to 10:1. How does JPEG achieve such a high compression ratio? As mentioned, JPEG is lossy: it filters out the parts of the image that are unimportant and to which the human eye is insensitive, and so reduces the file size. For example, the number 12345.0000000001 can be treated as 12345; the part that is dropped carries too little information to matter. Then, when the data is stored, special methods can be used to optimize the storage structure and compress the file further. JPEG encoding of the original image therefore consists of two major steps: the first removes visually redundant information (spatial redundancy), and the second removes redundancy in the data itself (structural redundancy).
II. Detailed analysis of the JPEG principle and the compression process
The main topics in JPEG encoding are: color model conversion, the DCT (discrete cosine transform), data quantization, reordering of the DCT results, DC coding based on differential pulse-code modulation, RLE coding, and canonical Huffman coding. Let us go through them in detail.
1. Color model conversion
In image processing, to exploit the characteristics of human vision and thereby reduce the amount of data, a color image represented in RGB space is usually transformed into another color space. Three color-space transforms are in common use: YIQ, YUV, and YCbCr.
Color-space conversion in the computer's digital domain differs from the conversion used in the analog television domain. In the digital domain the components are denoted Y, Cb, and Cr, so RGB has to be converted to YCbCr; each output component is a fixed linear combination of the R, G, and B values.
Here Y represents luminance, while Cb and Cr represent the blue and red color-difference (chrominance) values, respectively.
Computing Y, Cb, and Cr this way produces many fractional values, that is, floating-point numbers, so the JPEG encoding process involves a large amount of floating-point arithmetic. With some optimization, these floating-point operations can be replaced by shifts and additions, which a computer processes much faster.
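As an illustration of the conversion and of the shift-and-add optimization, here is a minimal sketch. It assumes the coefficients published in the JFIF specification (the article's own formula may use slightly different constants), and the function names `rgb_to_ycbcr`, `rgb_to_ycbcr_fixed`, and `clamp8` are hypothetical. The integer version scales each coefficient by 256 and rounds it, so it only approximates the floating-point result.

```python
def clamp8(v):
    """Clamp a number to the 0..255 range of an 8-bit sample."""
    return max(0, min(255, v))

def rgb_to_ycbcr(r, g, b):
    """Floating-point conversion using the JFIF coefficients (an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return tuple(clamp8(int(round(v))) for v in (y, cb, cr))

def rgb_to_ycbcr_fixed(r, g, b):
    """Integer approximation: coefficients scaled by 256, result shifted right by 8."""
    y = (77 * r + 150 * g + 29 * b) >> 8
    cb = ((-43 * r - 85 * g + 128 * b) >> 8) + 128
    cr = ((128 * r - 107 * g - 21 * b) >> 8) + 128
    return tuple(clamp8(v) for v in (y, cb, cr))

print(rgb_to_ycbcr(200, 100, 50))
print(rgb_to_ycbcr_fixed(200, 100, 50))
```

The two functions can differ by a unit in the last place because the integer coefficients are rounded; whether that matters depends on the application.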
Note that the JPEG algorithm is in fact independent of the color space. The color space concerns how the image is sampled and is not directly related to how the data is compressed. JPEG processes each color component as a separate image, so it can compress data from different color spaces, such as RGB, YCbCr, and CMYK.
The human eye has different sensitivity to the different frequency components that make up an image, which is determined by the physiology of human vision. The retina contains on the order of 100 million light-sensitive rod cells but only several million color-sensitive cone cells. Because rods far outnumber cones, the eye is more sensitive to brightness than to color.
When an image is split into its luminance and chrominance components, the luminance map clearly carries richer detail. Once JPEG has converted the image to YCbCr, it can treat the data differently according to its importance. This is why JPEG generally uses this color space.
2. DCT (discrete cosine transform)
As mentioned before, the human eye is not sensitive to high-frequency information in an image, so if we can filter out the high-frequency part of the image, we can compress it. The question is how to convert an image from the spatial domain to the frequency domain. Readers who have studied digital communications will have met the fast Fourier transform, and the discrete cosine transform is a close relative of the Fourier transform. The Fourier transform converts a time-domain signal into a frequency-domain signal. It grew out of Fourier's famous idea that any periodic function can be decomposed into a sum of trigonometric functions. Lagrange questioned the idea, arguing that no combination of trigonometric functions can represent a function with sharp corners. Lagrange was right in the end, but with enough trigonometric terms the result can be approximated as closely as we like. A rectangular square wave, for example, is approximated ever more closely as more sine terms are added.
When the input is a finite set of discrete samples rather than a continuous function, and the samples are extended symmetrically, only the cosine terms of the transform remain; this is the discrete cosine transform. For example, given one-dimensional data [x0, x1, x2, ..., xn-1], the DCT produces n coefficients f0, f1, ..., fn-1.
The original data xi can then be recovered from these coefficients by the inverse discrete cosine transform (IDCT).
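One common way to write this transform pair is the unnormalized DCT-II and its inverse. It is given here because it reproduces the 8-number worked example in this section, where the first coefficient is simply the sum of the data. The index names m and k are notation assumed here, and other texts scale the coefficients differently.

```latex
f_m = \sum_{k=0}^{n-1} x_k \cos\!\left[\frac{\pi}{n}\, m \left(k + \tfrac{1}{2}\right)\right],
\qquad m = 0, 1, \ldots, n-1

x_k = \frac{f_0}{n} + \frac{2}{n} \sum_{m=1}^{n-1} f_m \cos\!\left[\frac{\pi}{n}\, m \left(k + \tfrac{1}{2}\right)\right],
\qquad k = 0, 1, \ldots, n-1
```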
In other words, the DCT decomposes an array into a sum of several arrays. If we regard the array as a one-dimensional matrix, the result can be seen as a sum of matrices, one per coefficient.
For example, take an array of length 8: 50, 55, 67, 80, -10, -5, 20, 30. Its DCT gives 8 coefficients: 287.0, 106.3, 14.2, -110.8, 9.2, 65.7, -8.2, -43.9. Using the inverse formula, the original array can be written as the sum of 8 new arrays, and plotting those arrays shows what is interesting about the DCT.
Seemingly messy data turns into a few neatly varying components after the DCT. The first component is a flat line, so it is called the direct-current component, DC for short; the remaining components are the alternating-current components, AC for short.
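The 8-number example can be checked with a short sketch. It assumes the unnormalized DCT-II (each coefficient is a plain sum of the data weighted by cosines), which is the scaling under which the first coefficient equals the sum of the inputs, 287. The names `dct_1d` and `idct_1d` are hypothetical.

```python
import math

def dct_1d(x):
    """Unnormalized DCT-II: f[m] = sum over k of x[k] * cos(pi/n * m * (k + 1/2))."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi / n * m * (k + 0.5)) for k in range(n))
            for m in range(n)]

def idct_1d(f):
    """Matching inverse: the DC term is divided by n, each AC term is scaled by 2/n."""
    n = len(f)
    return [f[0] / n
            + 2.0 / n * sum(f[m] * math.cos(math.pi / n * m * (k + 0.5))
                            for m in range(1, n))
            for k in range(n)]

data = [50, 55, 67, 80, -10, -5, 20, 30]
coeffs = dct_1d(data)
restored = idct_1d(coeffs)
print([round(c, 1) for c in coeffs])
print([round(v, 6) for v in restored])
```

Rounded to one decimal, the coefficients agree with the values quoted in the text, and the inverse recovers the original eight numbers up to floating-point error, which is the sense in which this step is reversible.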
In the JPEG compression algorithm, the whole image is divided into 8*8 blocks, and a two-dimensional DCT is applied to each block.
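The two-dimensional transform for an 8*8 block is usually written as below. This is the standard form used in descriptions of baseline JPEG; the symbols f (pixel block), F (coefficient block), and C (normalization factor) are notation assumed here.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
\cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16},
\qquad
C(w) = \begin{cases} \dfrac{1}{\sqrt{2}} & w = 0 \\ 1 & w > 0 \end{cases}
```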
A very extreme example shows how powerful the DCT is.
After the DCT, the energy of the whole block is concentrated in the DC component in the upper-left corner. Now consider a more general example.
There the DCT splits the matrix into two parts, the DC component and the AC components. Everything up to this point is reversible and the image is still lossless, but this step lays the groundwork for the compression that follows, which is why the DCT is the core of the JPEG compression algorithm.
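The energy-compaction effect can be reproduced with a direct, unoptimized sketch of the 8*8 transform. It assumes the usual JPEG normalization (a factor of 1/4 and 1/sqrt(2) for index 0), and the test block, in which every sample is 100, is a hypothetical extreme case rather than the article's original figure. The level shift that real encoders apply before the DCT is left out.

```python
import math

def dct_8x8(block):
    """Two-dimensional DCT of an 8x8 block, computed directly from the definition."""
    def c(w):
        return 1 / math.sqrt(2) if w == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

flat = [[100] * 8 for _ in range(8)]   # hypothetical block: every pixel is 100
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 3))          # the DC coefficient
print(max(abs(coeffs[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0)))
```

For a flat block the DC coefficient is 8 times the pixel value and every AC coefficient is zero up to rounding error. Production encoders use fast factorizations instead of these four nested loops.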
3. Data quantization
After the first two steps, the whole image has been decomposed into many 8*8 matrices, each with three components: Y, Cb, and Cr. Take one Y-component matrix as an example.
The question now is how to store these floating-point numbers in less space while accepting some loss of precision. The answer is quantization. In a game, for instance, the direction a character faces is usually not stored as a floating-point number from 0 to 2π; instead the circle is divided into 16 intervals and the direction is stored as an integer, which needs only 4 bits. JPEG quantizes each coefficient by dividing it by the matching entry of a quantization matrix and rounding to the nearest integer: B(i,j) = round(G(i,j) / Q(i,j)).
Here G is the matrix to be processed and Q is the quantization coefficient matrix. The JPEG standard provides two standard quantization matrices, one for luminance data and one for chrominance data.
After quantization, the matrix consists mostly of small integers.
A large part of the matrix is now 0, which is very favorable for compression. In practice the matrix can also be multiplied by a coefficient that represents the compression rate, producing more or fewer zeros and thereby controlling the compression quality; this coefficient is a real number between 0 and 1. In general, the DCT together with quantization acts as a low-pass filter on the image. The luminance component is quantized finely and the chrominance components coarsely.
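A minimal sketch of the quantization step follows. The table is the example luminance matrix from Annex K of the JPEG standard; the three input coefficients are hypothetical sample values chosen for illustration, and the helper names `quantize` and `dequantize` are assumptions. Rounding is done explicitly because Python's built-in `round` rounds ties to even.

```python
import math

# Example luminance quantization table (JPEG standard, Annex K)
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(g, q):
    """B(i,j) = round(G(i,j) / Q(i,j)), rounding to the nearest integer."""
    return [[int(math.floor(g[i][j] / q[i][j] + 0.5)) for j in range(8)]
            for i in range(8)]

def dequantize(b, q):
    """Decoder side: multiply back. The rounding error is the information lost."""
    return [[b[i][j] * q[i][j] for j in range(8)] for i in range(8)]

g = [[0.0] * 8 for _ in range(8)]
g[0][0], g[0][1], g[7][7] = -415.38, -30.19, 1.68   # hypothetical DCT coefficients
b = quantize(g, LUMA_Q)
print(b[0][0], b[0][1], b[7][7])
print(dequantize(b, LUMA_Q)[0][0])
```

The small high-frequency coefficient collapses to 0 because its table entry is large, which is where the long runs of zeros come from.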
4. Reordering the DCT results
After quantization, the 8*8 block is still a two-dimensional matrix. How can the DCT result be rearranged to raise the compression ratio further? Looking at the quantized matrix, most of the information is concentrated in the upper-left corner, so the coefficients are read out in zigzag order, starting at the upper-left corner and sweeping along the diagonals toward the lower-right corner.
The result becomes: 26, -3, 0, -3, -3, -6, 2, -4, 1, 4, 1, 1, 5, 1, 2, -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1, followed by 38 zeros. The runs of consecutive zeros grow longer, so the data compresses better.
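The zigzag scan can be sketched by sorting the cell coordinates by anti-diagonal and alternating the direction on each diagonal. This is one compact way to generate the order, not necessarily how an encoder does it (real implementations usually use a precomputed 64-entry table); `zigzag_order` and `zigzag` are hypothetical names.

```python
def zigzag_order(n=8):
    """Return the (row, column) pairs of an n x n block in zigzag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Cells on the same anti-diagonal share i + j. Odd diagonals run downward
    # (row increasing), even diagonals run upward (column increasing).
    return sorted(cells,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def zigzag(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# A block whose entries are their own row-major index makes the order visible.
index_block = [[i * 8 + j for j in range(8)] for i in range(8)]
scan = zigzag(index_block)
print(scan[:10])
```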
5. DC coding based on differential pulse-code modulation
Looking at the DC and AC components of an 8*8 block after the DCT, the DC component is clearly larger than the AC components, and the DC coefficients of neighboring 8*8 blocks differ only slightly. This is because the energy of an image is concentrated mainly in the low-frequency components and most pictures are continuous, so the energy changes smoothly from one block to the next. We therefore use differential pulse-code modulation (DPCM) to encode the difference between the quantized DC coefficients of adjacent blocks.
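A minimal sketch of DPCM on the DC coefficients: each block stores only the difference from the previous block's DC value, with the predictor starting at 0. The sample DC values and the function names are hypothetical.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous one (first predictor is 0)."""
    prev = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    prev = 0
    values = []
    for d in diffs:
        prev += d
        values.append(prev)
    return values

dc = [35, 37, 34, 34]          # hypothetical DC values of four neighboring blocks
diffs = dpcm_encode(dc)
print(diffs)
print(dpcm_decode(diffs))
```

Because neighboring blocks have similar DC values, the differences are small numbers that need few bits.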
6. RLE coding
Run-length encoding (RLE) is a lossless compression encoding. Take 5555557777733322221111111 as an example. The characteristic of this data is that the same content repeats many times, so the string can be recorded in a simplified way: (5,6) (7,5) (3,3) (2,4) (1,7) is its run-length code, which is much shorter than the original string. Now look at another sequence: 57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..., 0. It can be expressed as: (0,57); (0,45); (4,23); (1,-30); (0,-16); (2,1); EOB. The first number of each pair is the count of zeros before the value. To simplify later processing this count must fit in 4 bits, that is, it can only be 0~15, which is a feature of the run-length code used here. JPEG uses the high 4 bits of one byte for the number of consecutive zeros and the low 4 bits for the number of bits needed to encode the next nonzero coefficient, followed by the value of the quantized AC coefficient. The pairs (0,0) and (15,0) are special: (0,0) marks the end of the block (EOB), and (15,0) stands for 16 consecutive zeros.
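The second example can be reproduced with a short sketch. It follows the rules stated above: each pair is (number of preceding zeros, value), a run of 16 zeros is emitted as (15, 0), and trailing zeros are replaced by the end-of-block pair (0, 0). The function name `rle_zeros` is hypothetical, and the input is padded with zeros to 64 entries as an assumption about the elided part of the sequence.

```python
def rle_zeros(coeffs):
    """Encode a coefficient list as (zero_run, value) pairs with ZRL and EOB."""
    pairs = []
    run = 0
    for v in coeffs:
        if v == 0:
            run += 1
            continue
        while run > 15:            # a run length must fit in 4 bits
            pairs.append((15, 0))  # (15, 0) stands for 16 consecutive zeros
            run -= 16
        pairs.append((run, v))
        run = 0
    if run > 0:                    # only zeros remain: end of block
        pairs.append((0, 0))
    return pairs

sequence = [57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1] + [0] * 51
pairs = rle_zeros(sequence)
print(pairs)
```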
7. Canonical Huffman coding
After DPCM and RLE encoding, canonical Huffman coding compresses the data further. An example shows how the data is compressed after the zigzag scan.
The first value is the DC component, to which DPCM applies. Assuming the DC value of the previous block is 0, the value 35 is itself the difference. The AC components of the original data are then RLE encoded.
Next we deal with the value on the right side of each pair. JPEG provides a standard table that maps a value to the number of bits needed to encode it.
With that table, each value is replaced by its bit length followed by its bit pattern.
The two numbers in front of each bit pattern, the zero-run length and the bit length, are then merged into a single byte.
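The standard table can be expressed as a rule rather than a lookup. This sketch assumes the usual JPEG convention: the size is the number of bits in the magnitude, a positive value is written in plain binary, and a negative value is written as the value plus 2^size - 1 (so its leading bit is 0). The function names are hypothetical.

```python
def size_and_bits(value):
    """Return (bit length, bit string) for a coefficient or DC difference."""
    size = abs(value).bit_length()
    if size == 0:
        return 0, ""               # zero needs no extra bits
    if value < 0:
        value += (1 << size) - 1   # map negatives into the lower half of the range
    return size, format(value, "0{}b".format(size))

def run_size_byte(run, size):
    """Pack the zero-run length (high 4 bits) and the bit length (low 4 bits)."""
    return (run << 4) | size

print(size_and_bits(35))
print(size_and_bits(-30))
print(hex(run_size_byte(4, size_and_bits(23)[0])))
```

The packed byte is the symbol that is later Huffman coded; the bit string is appended to the stream unchanged.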
Next comes Huffman coding, which uses one Huffman table for the DC component
and another Huffman table for the AC components.
After Huffman coding, the data becomes a compact bit stream.
In the end, 10 bytes are enough to store the original 64 bytes of data, and that completes the JPEG compression algorithm.
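In a canonical Huffman code, the codewords follow from just two lists: how many codes there are of each length, and the symbols in order. The sketch below assigns consecutive binary numbers within a length and appends a zero bit when moving to the next length. The counts and symbols used are the example DC luminance table from Annex K of the JPEG standard; the function name `canonical_codes` is hypothetical.

```python
def canonical_codes(counts, symbols):
    """Build {symbol: bit string} from per-length code counts (lengths 1..16)."""
    codes = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[symbols[index]] = format(code, "0{}b".format(length))
            code += 1
            index += 1
        code <<= 1                 # moving to the next length appends a zero bit
    return codes

# Example DC luminance table: number of codes of length 1..16, then the symbols
DC_LUMA_COUNTS = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
DC_LUMA_SYMBOLS = list(range(12))

dc_codes = canonical_codes(DC_LUMA_COUNTS, DC_LUMA_SYMBOLS)
for symbol in DC_LUMA_SYMBOLS:
    print(symbol, dc_codes[symbol])
```

This is why a JPEG file only needs to store the counts and the symbol list rather than the codewords themselves.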
8. Summary of the JPEG compression process
To conclude, here is the whole process of compressing a picture with JPEG:
- Divide the whole picture into 8*8 blocks
- Apply the DCT to each 8*8 block
- Quantize the DCT coefficients
- Reorder the quantized coefficients in zigzag order
- Encode the DC components with DPCM and the AC components with RLE
- Huffman-encode the result
III. JPEG storage format
Note that what we have discussed so far is the JPEG compression algorithm. The standard only specifies how a picture is compressed into a byte stream and how that stream is decoded back into an image. The file storage format defined by the JPEG standard is JIF, but it has certain shortcomings and is rarely used. Other file storage formats appeared over time, such as JFIF and Exif, but they all build on JIF.
With the whole JPEG compression algorithm described, the general storage format is already fairly clear: at a minimum we need fields that store the quantization tables, the Huffman tables, and the compressed data. Let's look at how the JPEG format is laid out.
JPEG uses 0xFF to introduce a marker. When 0xFF is encountered, the byte that follows decides what to do:
- If it is 0x00, the 0xFF is part of the image data stream and must be decoded;
- If it is 0xD0~0xD7, it is an RSTn marker and is skipped: the two bytes (the current 0xFF and the 0xDn right after it) are not decoded as data, and the decoder state is adjusted according to the rules for RST markers;
- If it is 0xFF, ignore the current 0xFF and examine the next 0xFF;
- If it is one of the known header markers, decode the corresponding segment;
- If it is any other value, ignore the current 0xFF and keep the value right after it for decoding;
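The first three rules in the list above can be sketched as a small filter over the entropy-coded bytes. This is a simplified illustration: it counts restart markers instead of resetting any decoder state, and it simply stops at any other marker (such as EOI) rather than decoding a segment. The function name `unstuff` is hypothetical.

```python
def unstuff(scan):
    """Return (data bytes with 0xFF00 stuffing removed, number of RSTn markers seen)."""
    out = bytearray()
    restarts = 0
    i = 0
    while i < len(scan):
        b = scan[i]
        if b != 0xFF:
            out.append(b)
            i += 1
            continue
        nxt = scan[i + 1] if i + 1 < len(scan) else None
        if nxt == 0x00:                              # stuffed byte: 0xFF is real data
            out.append(0xFF)
            i += 2
        elif nxt is not None and 0xD0 <= nxt <= 0xD7:  # RSTn: skip both bytes
            restarts += 1
            i += 2
        elif nxt == 0xFF:                            # fill byte: look at the next 0xFF
            i += 1
        else:                                        # another marker (or truncated input)
            break
    return bytes(out), restarts

sample = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0, 0x56, 0xFF, 0xD9])
data, restart_count = unstuff(sample)
print(data.hex(), restart_count)
```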
The header markers and their meanings are as follows:
SOI, Start of Image: the beginning of the picture; the marker code is the fixed value 0xFFD8, 2 bytes;
APP0, Application 0: application-reserved marker 0; the marker code is the fixed value 0xFFE0, 2 bytes. The segment contains 9 fields:
(1) Data length: 2 bytes, the total length of fields (1)-(9); that is, it excludes the marker code but includes this field;
(2) Identifier: 5 bytes, fixed value 0x4A46494600, the string "JFIF" followed by a zero byte;
(3) Version: 2 bytes, typically 0x0102, meaning JFIF version 1.02; other values indicate other versions;
(4) Density unit for the x and y directions: 1 byte, with only three possible values, 0: no unit; 1: dots per inch; 2: dots per centimeter;
(5) Pixel density in the x direction: 2 bytes, value range not specified;
(6) Pixel density in the y direction: 2 bytes, value range not specified;
(7) Thumbnail width in pixels: 1 byte, value range not specified;
(8) Thumbnail height in pixels: 1 byte, value range not specified;
(9) Thumbnail RGB bitmap: its length is a multiple of 3, holding a 24-bit RGB bitmap; if there is no thumbnail (the more common case), fields (7) and (8) are both 0;
APPn, Application n: application-reserved marker n (n = 1 to 15); the marker code is 2 bytes, 0xFFE1 to 0xFFEF. The segment contains 2 fields:
(1) Data length: 2 bytes, the total length of fields (1) and (2); that is, it excludes the marker code but includes this field;
(2) Detailed information: (data length - 2) bytes, with variable content;
DQT, Define Quantization Table: the marker code is the fixed value 0xFFDB. The segment contains 2 fields:
(1) Data length: 2 bytes, the total length of field (1) and all occurrences of field (2); that is, it excludes the marker code but includes this field;
(2) Quantization table: (data length - 2) bytes, consisting of:
(a) Precision and quantization table ID: 1 byte; the high 4 bits give the precision, with only two possible values, 0: 8 bits, 1: 16 bits; the low 4 bits give the quantization table ID, in the range 0 to 3;
(b) Table entries: 64 * (precision value + 1) bytes; for example, a table with 8-bit precision has 64 * (0 + 1) = 64 bytes of entries;
In this segment, field (2) can be repeated to define several quantization tables, but it can appear at most 4 times;
SOF0, Start of Frame 0: the start of the frame (baseline DCT); the marker code is the fixed value 0xFFC0. The segment contains 6 fields:
(1) Data length: 2 bytes, the total length of fields (1)-(6); that is, it excludes the marker code but includes this field;
(2) Precision: 1 byte, the number of bits per data sample, usually 8;
(3) Image height: 2 bytes, the height of the image in pixels; it must be greater than 0 if DNL is not supported;
(4) Image width: 2 bytes, the width of the image in pixels; it must be greater than 0 if DNL is not supported;
(5) Number of color components: 1 byte; because JPEG files normally use the YCbCr color space, this is usually 3 (a grayscale image uses 1);
(6) Color component information: (number of color components * 3) bytes, usually 9 bytes, laid out as follows:
(a) Color component ID: 1 byte;
(b) Horizontal/vertical sampling factors: 1 byte; the high 4 bits give the horizontal sampling factor and the low 4 bits give the vertical sampling factor;
(c) Quantization table: 1 byte, the ID of the quantization table used by this component;
In this segment, field (6) is repeated once per color component, normally 3 times;
DHT, Define Huffman Table: the marker code is 0xFFC4. The segment contains 2 fields:
(1) Data length: 2 bytes, the total length of fields (1) and (2); that is, it excludes the marker code but includes this field;
(2) Huffman table: (data length - 2) bytes, consisting of:
(a) Table class and table ID: 1 byte; the high 4 bits give the table class, with only two possible values, 0: DC, 1: AC; the low 4 bits give the Huffman table ID. Note that DC tables and AC tables are numbered separately;
(b) Number of codewords of each bit length: 16 bytes, one count for each length from 1 to 16 bits;
(c) Encoded content: the symbol values, one byte each; the number of bytes equals the sum of the 16 counts in (b);
In this segment, field (2) can be repeated; typically 4 tables are needed.
DRI, Define Restart Interval: defines the interval at which the accumulated differential coding is reset; the marker code is the fixed value 0xFFDD.
The segment contains 2 fields:
(1) Data length: 2 bytes, with the fixed value 0x0004, the total length of fields (1) and (2); that is, it excludes the marker code but includes this field;
(2) Restart interval in MCU blocks: 2 bytes; if the value is n, an RST marker appears after every n MCU blocks. The first marker is RST0, the second is RST1, and so on up to RST7, after which the sequence repeats from RST0. If this segment is absent or the interval is 0, there is no restart interval and there are no RST markers;
SOS, Start of Scan: the marker code is 0xFFDA. The segment contains 4 fields:
(1) Data length: 2 bytes, the total length of fields (1)-(4);
(2) Number of color components: 1 byte, with only 3 possible values, 1: grayscale; 3: YCbCr or YIQ; 4: CMYK;
(3) Color component information, consisting of:
(a) Color component ID: 1 byte;
(b) DC/AC Huffman table ID: 1 byte; the high 4 bits give the ID of the Huffman table used for the DC component, and the low 4 bits give the ID of the Huffman table used for the AC components;
(4) Compressed image data parameters:
(a) Spectral selection start: 1 byte, fixed value 0x00;
(b) Spectral selection end: 1 byte, fixed value 0x3F;
(c) Successive approximation: 1 byte, fixed value 0x00;
In this segment, field (3) is repeated once per color component. The actual image data follows directly after the segment and runs until the EOI marker;
EOI, End of Image: the end of the picture; the marker code is 0xFFD9;
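The segment layout described above (a 0xFF byte, a marker code, then a 2-byte big-endian length that counts itself but not the marker) is enough to walk the header of a file. The sketch below lists the markers and payload sizes and stops at SOS, where the entropy-coded data begins. The function name `list_segments` is hypothetical, and the sample bytes are a hand-built SOI + APP0 + EOI sequence, not a decodable image.

```python
import struct

# Markers that carry no length field: SOI, EOI, and RST0..RST7
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))

def list_segments(data):
    """Return [(marker code, payload length)] for the header segments of a JPEG stream."""
    segments = []
    i = 0
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError("expected a marker at offset %d" % i)
        marker = data[i + 1]
        if marker == 0xFF:            # fill byte before the real marker
            i += 1
            continue
        i += 2
        if marker in STANDALONE:
            segments.append((marker, 0))
            if marker == 0xD9:        # EOI: nothing follows
                break
            continue
        (length,) = struct.unpack(">H", data[i:i + 2])
        segments.append((marker, length - 2))
        i += length                   # the length covers itself plus the payload
        if marker == 0xDA:            # SOS: compressed image data follows
            break
    return segments

app0_payload = (b"JFIF\x00" + b"\x01\x02" + b"\x00"
                + struct.pack(">HH", 1, 1) + b"\x00\x00")
sample = (b"\xFF\xD8" + b"\xFF\xE0"
          + struct.pack(">H", len(app0_payload) + 2) + app0_payload
          + b"\xFF\xD9")
segments = list_segments(sample)
print([(hex(m), n) for m, n in segments])
```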
IV. GPU optimization of JPEG compression
CUDA is NVIDIA's platform for GPU programming. In CUDA, the PC is called the host and the graphics card is called the device. CUDA provides the __global__ qualifier for defining kernel functions that run on the GPU, along with functions such as cudaMalloc, cudaFree, and cudaMemcpy for allocating and releasing device memory and for transferring data between host memory and video memory. The <<<N,M>>> syntax specifies that N thread blocks are launched, with M threads in each block executing the kernel function.
Take a look at the following example:
    #include <stdio.h>
    #include <cuda_runtime.h>

    // __global__ tells the compiler that this function is called by the CPU and executed on the GPU
    __global__ void add(const int *dev_a, const int *dev_b, int *dev_c)
    {
        int i = threadIdx.x;
        dev_c[i] = dev_a[i] + dev_b[i];
    }

    int main(void)
    {
        // Allocate host memory and initialize it
        int host_a[512], host_b[512], host_c[512];
        for (int i = 0; i < 512; i++)
        {
            host_a[i] = i;
            host_b[i] = i << 1;
        }

        // Define a cudaError_t; the default is cudaSuccess (0)
        cudaError_t err = cudaSuccess;

        // Allocate GPU memory, stopping at the first failure
        int *dev_a, *dev_b, *dev_c;
        err = cudaMalloc((void **)&dev_a, sizeof(int) * 512);
        if (err == cudaSuccess) err = cudaMalloc((void **)&dev_b, sizeof(int) * 512);
        if (err == cudaSuccess) err = cudaMalloc((void **)&dev_c, sizeof(int) * 512);
        if (err != cudaSuccess)
        {
            printf("cudaMalloc on the GPU failed\n");
            return 1;
        }
        printf("SUCCESS\n");

        // Copy the data to be computed to the GPU with cudaMemcpy
        cudaMemcpy(dev_a, host_a, sizeof(host_a), cudaMemcpyHostToDevice);
        cudaMemcpy(dev_b, host_b, sizeof(host_b), cudaMemcpyHostToDevice);

        // Call the kernel to execute on the GPU. The data set is small, so one block of 512 threads is used
        add<<<1, 512>>>(dev_a, dev_b, dev_c);
        cudaMemcpy(host_c, dev_c, sizeof(host_c), cudaMemcpyDeviceToHost);
        for (int i = 0; i < 512; i++)
            printf("host_a[%d] + host_b[%d] = %d + %d = %d\n", i, i, host_a[i], host_b[i], host_c[i]);
        cudaFree(dev_a); // Release GPU memory
        cudaFree(dev_b); // Release GPU memory
        cudaFree(dev_c); // Release GPU memory
        return 0;
    }
Fortunately, CUDA already wraps the GPU multithreading optimizations into library functions, so we only need to call the functions in the NPP library directly.
In testing, the GPU-optimized version was about 10 times faster than the image library in Golang and about 5 times faster than the resize library.
--------------------------------------------------------------------------------------------------------------- ----------------------
Resources:
- https://www.ibm.com/developerworks/cn/linux/l-cn-jpeg/index.html
- "GPU High-Performance Programming: CUDA in Practice"
Note: this article was compiled from a number of online sources. Thanks to the experts who shared their work!
JPEG format compression algorithm