JPEG is short for the Joint Photographic Experts Group, a committee jointly established by the CCITT (International Telegraph and Telephone Consultative Committee) and ISO in 1986 to develop coding standards for still digital images.
The group developed a standardized compression and coding method, the JPEG algorithm, for continuous-tone, multilevel grayscale, still images. The JPEG algorithm has been adopted as an international standard and is widely used. Beyond still-image coding, it is also applied to intra-frame compression of TV image sequences. Still-image files compressed with the JPEG algorithm are called JPEG files, with extensions *.jpg, *.jpe, and *.jpeg.
1. Basic System Structure of the JPEG Encoder and Decoder
1.1 JPEG file format Overview
JPEG files can use several data storage layouts; the most common is the JPEG File Interchange Format (JFIF). In any layout, a JPEG file divides into two kinds of content: markers and compressed data. A marker code consists of two bytes: the first is always 0xFF, and the second takes different values according to the marker's meaning. Any number of meaningless 0xFF fill bytes may precede a marker, so a run of consecutive 0xFF bytes can be read as a single 0xFF that begins a marker code. After each complete two-byte marker comes the data segment associated with it, which records the corresponding information about the file.
Common markers include SOI, APP0, DQT, SOF0, DHT, DRI, SOS, and EOI. Note that SOI and the rest are marker names; in the file, each appears as its marker code. For example, the code for the SOI marker is 0xFFD8, so whenever the bytes 0xFFD8 appear in a JPEG file, they signal an SOI marker.
1.2 Basic Process of the JPEG Codec
Basic JPEG System Structure
II. JPEG Encoding Process
2.1 Convert RGB format to YUV format
RGB introduction:
When computers store images, the most common approach is to save color information as RGB (red, green, blue) components. For example, an uncompressed 24-bit BMP image uses RGB space: each pixel is 24 bits, with 8 bits per component storing a color intensity (0-255); pure red, for instance, is stored as 0xFF0000.
YUV introduction:
YUV is a color encoding method used by European television systems and widely adopted in China's radio and television systems. "Y" denotes luminance (luma), i.e., the grayscale value, while "U" and "V" denote chrominance (chroma). Color television uses the YUV space so that the luminance signal Y solves the compatibility problem between color and black-and-white sets: a black-and-white TV can still receive a color TV signal.
The conversion formulas between YUV and RGB are as follows (RGB values in 0-255):
Y =  0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.436B
V =  0.615R - 0.515G - 0.100B
R = Y + 1.14V
G = Y - 0.39U - 0.58V
B = Y + 2.03U
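As a rough illustration, the formulas above can be written as a pair of Python helper functions (a sketch with invented names, not part of any JPEG library; note that the rounded inverse coefficients make the round trip slightly lossy):

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to YUV with the
    coefficients given above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse transform; results may need clamping to [0, 255]."""
    r = y + 1.14 * v
    g = y - 0.39 * u - 0.58 * v
    b = y + 2.03 * u
    return r, g, b
```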
2.2 Divide the Image into 8x8 Blocks
After the original image is converted to YUV, it is subsampled in a chosen format; common formats include 4:4:4, 4:2:2, and 4:2:0.
After sampling, each component of the image is divided into 8x8 pixel blocks, which are grouped into minimum coded units (MCUs).
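The block-splitting step can be sketched in Python for a single channel as follows (the function name is invented, and edge padding by replicating the last row/column is one common choice among several when the dimensions are not multiples of 8):

```python
def split_into_blocks(pixels, width, height, n=8):
    """Split a single-channel image (list of rows) into n x n blocks,
    padding the right/bottom edges by repeating the last column/row."""
    blocks = []
    for by in range(0, height, n):
        for bx in range(0, width, n):
            block = [[pixels[min(by + y, height - 1)][min(bx + x, width - 1)]
                      for x in range(n)] for y in range(n)]
            blocks.append(block)
    return blocks
```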
2.3 Discrete Cosine Transform (DCT)
The discrete cosine transform (DCT) is a transform coding method commonly used for bit-rate compression. The Fourier transform of any continuous real even function contains only cosine terms, so the cosine transform has the same physical interpretation as the Fourier transform. DCT first divides the whole image into N x N pixel blocks, then transforms the blocks one by one. Because the high-frequency content of most images is small, the coefficients corresponding to high frequencies are often zero; moreover, the human eye is not sensitive to distortion of high-frequency components, so these coefficients can be quantized more coarsely.
As a result, the bit rate needed to transmit the transform coefficients is much smaller than that needed to transmit the image pixels directly. At the receiving end, the sample values are recovered by the inverse discrete cosine transform; there is some distortion, but it is acceptable to the human eye. The two-dimensional forward and inverse discrete cosine transforms are:
F(u,v) = (2/N) C(u) C(v) Σx Σy f(x,y) cos[(2x+1)uπ/(2N)] cos[(2y+1)vπ/(2N)]
f(x,y) = (2/N) Σu Σv C(u) C(v) F(u,v) cos[(2x+1)uπ/(2N)] cos[(2y+1)vπ/(2N)]
where C(0) = 1/√2 and C(k) = 1 for k > 0.
Here N is the number of rows and columns in each image block; usually N = 8. Larger N is somewhat more effective but greatly increases complexity. An 8x8 block of data becomes 8x8 transform coefficients after the DCT, and these coefficients have a clear physical meaning. For example, when u = 0 and v = 0, F(0,0) is proportional to the average of the original 64 sample values, i.e., the DC component. As u and v increase, the corresponding coefficients represent progressively higher horizontal and vertical spatial-frequency components. Considering only one row of data (8 pixels) in the horizontal direction:
It can be seen that the image signal is decomposed into a DC component and cosine components from low to high frequency; each DCT coefficient simply gives the share of the original image signal carried by that component. Clearly, the restored image signal can be expressed in matrix form as F(n) = C(n) * E(n), where E(n) is the basis, C(n) the DCT coefficients, and F(n) the image signal.
To account for variation in the vertical direction as well, a two-dimensional basis is needed: one that reflects changes in both horizontal and vertical spatial frequency, matching the 8x8 pixel block. The basis then consists of 64 small images, usually called basis images. They are called basis images because, in the inverse discrete cosine transform, any image block can be expressed as a combination of the 64 basis images weighted by coefficients of different magnitudes. Since each basis image corresponds to a single coefficient in the transform domain, any block can likewise be viewed as a combination of the 64 basis images with different weights. This has the same physical meaning as decomposing an arbitrary signal into a fundamental and harmonics of different amplitudes.
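With N = 8 and the normalization above, the forward transform can be sketched directly (an unoptimized O(N^4) reference implementation, not how production codecs compute it; with this normalization a constant block of value k yields a DC coefficient of 8k):

```python
import math

def dct_2d(block, n=8):
    """Forward 2-D DCT-II of an n x n block:
    F(u,v) = (2/n) C(u) C(v) * sum_x sum_y f(x,y)
             * cos((2x+1)u*pi/(2n)) * cos((2y+1)v*pi/(2n)),
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = (2.0 / n) * c(u) * c(v) * s
    return out
```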
2.4 Quantization
Quantization discretizes the amplitude of a signal; after quantization, the discrete signal becomes a digital signal.
The human visual system (HVS) is more sensitive to low-frequency signals, so the low-frequency part of the signal is quantized with a relatively small step and the high-frequency part with a relatively large step. To a certain extent, this yields a relatively clear image together with a higher compression ratio.
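A sketch of the quantization step in Python (the uniform table in the test is a placeholder, not the example luminance table from the standard, which grows toward the bottom-right exactly as described above):

```python
def quantize(coeffs, qtable):
    """Divide each DCT coefficient by its quantization step and round.
    A real qtable holds small steps for low frequencies (top-left)
    and large steps for high frequencies (bottom-right)."""
    return [[round(coeffs[i][j] / qtable[i][j]) for j in range(8)]
            for i in range(8)]
```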
2.5 Zig-Zag Scan
The quantized data are read out in zig-zag order, for example:
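The zig-zag order can be generated rather than hard-coded; a Python sketch (function names invented):

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the zig-zag scan:
    walk the anti-diagonals, alternating direction on each one."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(diag if d % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    """Flatten an n x n block into the 1-D zig-zag sequence."""
    return [block[i][j] for i, j in zigzag_order(len(block))]
```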
2.6 Run-Length Encoding (RLE) of the AC Coefficients
Run-length encoding means that one code simultaneously represents a value and the number of zeros preceding it. This takes full advantage of the zig-zag readout, which produces many runs of consecutive zeros, especially toward the end: if the remaining coefficients are all zero, then after the last non-zero value is read an end-of-block code (EOB) is emitted and output stops, saving a substantial number of bits.
For example, reading the matrix in the figure in zig-zag order and run-length coding it yields a sequence of (value, run) pairs ending in EOB (the concrete values appeared in the original figure). In this way a 4x4 matrix can be expressed with only a handful of symbols!
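A minimal Python sketch of this run-length step (function name invented; the real JPEG format also caps runs at 15 with a special ZRL code, which this sketch omits):

```python
def run_length_encode(ac):
    """Encode zig-zag-ordered AC coefficients as (run, value) pairs,
    where run counts the zeros preceding each non-zero value.
    A trailing run of zeros is replaced by the end-of-block marker."""
    pairs, run = [], 0
    for v in ac:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:                      # all-zero tail -> EOB
        pairs.append("EOB")
    return pairs
```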
2.7 Entropy Coding
The entropy coding commonly used is a variable-length code, namely Huffman coding.
Huffman coding assigns short binary codewords to symbols with high probability and long codewords to symbols with low probability, yielding the code with the shortest average codeword length.
The steps are: (1) Sort the source symbols in order of decreasing probability, so that codeword lengths can later be assigned in reverse order. (2) Combine the probabilities of the two least likely symbols into a single probability. (3) Treat this combined probability as a new symbol and repeat until only two probabilities remain. (4) Then, working back through the merges, assign the codes: at each branch the two paths each receive one binary digit, 0 for the larger probability and 1 for the smaller.
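The merging procedure above can be sketched with a priority queue in Python (a generic Huffman-code constructor for illustration, not the canonical-table construction JPEG files actually store):

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code from {symbol: probability} by repeatedly
    merging the two least likely nodes, as in the steps above."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)          # tie-breaker so dicts are never compared
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability -> bit 1
        p2, _, c2 = heapq.heappop(heap)   # next smallest -> bit 0
        merged = {s: "1" + code for s, code in c1.items()}
        merged.update({s: "0" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]
```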
Coding of the AC and DC Coefficients
1. Huffman coding of the AC coefficients
Each non-zero AC coefficient left after zig-zag scanning and run-length coding is represented by two symbols: symbol A is the pair (runlength, size) and symbol B is (amplitude).
Runlength is the number of consecutive zero AC coefficients preceding the non-zero one;
Size is the number of bits needed to encode amplitude;
Amplitude is the amplitude of the AC coefficient.
In practice, JPEG represents symbol A with an 8-bit value RS = RRRRSSSS: for a non-zero AC coefficient, the high four bits hold runlength and the low four bits hold size. RS = (00000000) denotes EOB.
Symbol B is coded as a variable-length integer (VLI); the VLI bits of symbol B are appended after the Huffman code of symbol A, forming the final coded output for A and B.
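A sketch of how symbols A and B might be produced in Python (the helper names are invented; negative amplitudes use the standard one's-complement VLI convention, and the ZRL handling for runs longer than 15 is omitted):

```python
def vli(value):
    """Variable-length-integer code: size is the bit length of |value|;
    positive amplitudes are written as-is, negative ones as the
    one's complement of |value| in `size` bits."""
    if value == 0:
        return 0, ""
    size = abs(value).bit_length()
    if value > 0:
        bits = format(value, "b")
    else:
        bits = format((~abs(value)) & ((1 << size) - 1), "0{}b".format(size))
    return size, bits

def encode_ac_symbol(run, value):
    """Symbol A packed as RS = RRRRSSSS (high nibble: run of zeros,
    low nibble: size), followed by symbol B (the VLI amplitude bits).
    RS would then be Huffman-coded; that step is omitted here."""
    size, bits = vli(value)
    rs = (run << 4) | size
    return rs, bits
```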
2. Huffman coding of the DC coefficient
The DC coefficient is handled much like a non-zero AC coefficient, except that what is coded is the difference (DIFF) between the DC coefficients of two adjacent blocks: symbol A is (size), symbol B is (amplitude).
Size is the number of bits needed to encode amplitude;
Amplitude is the amplitude of the DIFF value.
In the JPEG standard, symbol A is coded with the corresponding Huffman table, symbol B is converted to a VLI, and the VLI bits of symbol B are appended to the Huffman code of symbol A, completing the coding of DIFF.
The JPEG standard defines no default Huffman table: depending on the application, one can use a general-purpose table or compute one for a specific image by gathering its statistics before compression.
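The differential DC step itself is simple; a sketch (function name invented):

```python
def encode_dc(dc_values):
    """Differential DC coding: each block stores only the difference
    from the previous block's DC value (first block: difference from 0)."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out
```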
III. Main Process of JPEG Decoding
3.1 Read File Information
Following the JPEG file's data layout, read the information needed for decoding item by item in preparation for the later stages. A reasonable approach is to design a struct for each marker and store the information the marker carries. The image width and height, the quantization tables, the Huffman tables, and the horizontal/vertical sampling factors are among the most important items. Some notes on reading the file follow.
1. Read the general structure of the file
The general layout of a JFIF-format JPEG file (*.jpg) is:
SOI (0xFFD8), APP0 (0xFFE0), [APPn (0xFFEn)] (optional),
DQT (0xFFDB), SOF0 (0xFFC0), DHT (0xFFC4), SOS (0xFFDA),
compressed data, EOI (0xFFD9).
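A naive Python sketch that locates these markers in a byte stream (function name invented; it skips 0xFF00 byte-stuffing and fill bytes, but it would report restart markers inside the scan data as unknown codes, so it is only a header-scanning illustration):

```python
def find_markers(data):
    """Scan a JPEG byte stream for structural markers. A marker is
    0xFF followed by a byte that is neither 0x00 (stuffing) nor
    0xFF (fill); fill runs before a marker are skipped."""
    names = {0xD8: "SOI", 0xE0: "APP0", 0xDB: "DQT", 0xC0: "SOF0",
             0xC4: "DHT", 0xDD: "DRI", 0xDA: "SOS", 0xD9: "EOI"}
    found, i = [], 0
    while i < len(data) - 1:
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            found.append(names.get(data[i + 1], hex(data[i + 1])))
            i += 2
        else:
            i += 1
    return found
```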
2. Read the tables' data;
3. Build the Huffman trees.
After preparing all the image information, you can decode the image data.
Decoding the AC and DC Coefficients
1. Decoding the AC coefficients
Look up RS in the Huffman table, then split it into runlength and size. Since symbol B was coded as a VLI, amplitude can be recovered from the next size bits of the stream. In this way the values of both symbols A and B are recovered.
2. Decoding the DC coefficient
Similarly, first look up the Huffman table to obtain size, read size bits to obtain DIFF, and add DIFF to the DC coefficient of the previous 8x8 block to obtain this block's DC coefficient.
3.2 Decoding the Color Components (Y, U, V) in an MCU
The image data stream is composed of MCUs, and each MCU is composed of the data units of the color components. The stream stores information at the bit level. Moreover, because the data were transformed from the spatial domain to the frequency domain by the forward discrete cosine transform (FDCT) during encoding, each color component unit consists of two parts: one DC component and 63 AC components.
The color component units are compressed with run-length (RLE) and Huffman coding. The bit stream for each coefficient consists of two parts, the Huffman code followed by the value bits (unless the coded value is zero, which carries no value bits). Decoding is essentially a walk through the Huffman tree.
3.3 Differential Coding of the DC Coefficients
All color component units are grouped by color component (Y, Cr, Cb). Within each component, the DC coefficients of adjacent color component units are coded differentially. That is, the value obtained by decoding is only the current unit's actual DC coefficient minus that of the previous unit, so the current DC coefficient must be corrected using the actual (not the decoded) DC value of the previous color component unit:
DCn = DCn-1 + DIFF
DIFF is the difference correction, i.e., the value decoded directly from the stream. If the current color component unit is the first one, however, its decoded DC value is already the true DC coefficient.
The DC coefficients of the three color components are differentially coded independently, so three separate DC predictors must be maintained while decoding an image.
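Maintaining those predictors during decoding can be sketched as follows (function name and component labels are invented for illustration):

```python
def decode_dc_stream(diffs, component_ids):
    """Undo differential DC coding with one independent predictor per
    color component, as described above. `diffs` are the decoded DIFF
    values in stream order; `component_ids` says which component each
    data unit belongs to."""
    pred = {}        # one running predictor per component, starting at 0
    out = []
    for comp, diff in zip(component_ids, diffs):
        pred[comp] = pred.get(comp, 0) + diff
        out.append(pred[comp])
    return out
```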
3.4 Dequantization
Dequantization is straightforward: multiply each of the 64 values in an 8x8 color component unit by the value at the same position in the corresponding quantization table. Every color component unit in the image must be dequantized this way.
3.5 Inverse Zig-Zag Ordering
3.6 Inverse Discrete Cosine Transform
As mentioned above, the data in the file were transformed from the spatial domain to the frequency domain by the forward discrete cosine transform (FDCT) during encoding, so the decoder's inverse discrete cosine transform (IDCT) converts the frequency-domain values in each color component matrix back to the spatial domain. If the frequency-domain matrix is 8x8, the matrix after the IDCT is still 8x8.
3.7 Convert YCrCb to RGB
To display an image on screen, its colors must be expressed in RGB, so the decoder must convert from YCrCb back to RGB.
In addition, because the discrete cosine transform requires a domain symmetric about zero, the value range of each component was shifted from [0, 255] to [-128, 127] during encoding, so 128 must be added back to each component during decoding. The formulas (note that Cr contributes to red and Cb to blue) are:
R = Y + 1.402 * Cr + 128;
G = Y - 0.34414 * Cb - 0.71414 * Cr + 128;
B = Y + 1.772 * Cb + 128;
A remaining problem is that the computed R, G, and B values may fall outside the valid range, so each must be range-checked: values greater than 255 are clamped to 255, and values less than 0 are clamped to 0.
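Putting the conversion and the clamping together in Python (function names invented for illustration):

```python
def _clamp(v):
    """Clamp a channel value to the displayable range [0, 255]."""
    return max(0, min(255, int(round(v))))

def ycbcr_to_rgb(y, cb, cr):
    """Decoder-side color conversion: apply the formulas above,
    undoing the -128 level shift, then clamp each channel."""
    r = y + 1.402 * cr + 128
    g = y - 0.34414 * cb - 0.71414 * cr + 128
    b = y + 1.772 * cb + 128
    return _clamp(r), _clamp(g), _clamp(b)
```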
At this point, the decoding of each MCU is complete.