MPEG adopts the discrete cosine transform (DCT), a compression algorithm proposed by Ahmed (an eminent mathematician) in the 1970s, to reduce the spatial redundancy of video signals.
The DCT converts motion-compensation errors or blocks of original image information into coefficients that represent different frequency components. This has two advantages. First, signals usually concentrate most of their energy in a small range of the frequency domain, so the unimportant components can be described with only a few bits. Second, the frequency-domain decomposition mirrors the processing performed by the human visual system, which allows the subsequent quantization step to match its sensitivity.
I describe this point in detail in my tutorial, so let me quote it directly:
The spectral lines of a video signal lie in the range of 0 to 6 MHz. Most of a video image consists of low-frequency spectral lines; only the signal at image edges, which occupy a very small proportion of the image area, contains high-frequency spectral lines. Therefore, when a video signal is processed digitally, bits can be allocated according to the spectrum: a large number of bits go to the low-frequency regions, which carry a large amount of information, and a small number of bits go to the high-frequency regions, which carry little information. The bit rate is thereby compressed without a perceptible loss of image quality. However, effective encoding is possible only when the entropy is low. Whether a string of data can be encoded effectively depends on the probability with which each value occurs. If the probabilities differ widely, the entropy is low and the string can be encoded efficiently. If the probabilities differ little, the entropy is high and efficient coding is not possible. A video signal is digitized by an A/D converter, which converts the video level at a specified sampling frequency. The video signal amplitude of each pixel changes periodically over time. The average information content per pixel is the entropy. Because each video level occurs with almost the same probability, the entropy of a video signal is very high. Entropy is the parameter that determines the bit-rate compression ratio: the compression ratio of a video image depends on the entropy of the video signal. In most cases the video signal has high entropy, so to encode it efficiently the high entropy must first be turned into low entropy. How can this be done? That requires analyzing the characteristics of the video spectrum.
In most cases, the amplitude of the video spectrum decreases as the frequency increases. The low-frequency spectrum takes levels from 0 up to the maximum with almost equal probability. In contrast, the high-frequency spectrum usually produces low levels and only rarely high levels. Clearly, the low-frequency spectrum has a higher entropy and the high-frequency spectrum has a lower entropy. On this basis, the low-frequency and high-frequency components of the video can be processed separately, so that the high-frequency components achieve a high compression ratio.
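To make the link between probability spread and entropy concrete, here is a minimal Python sketch. The two probability lists are invented for illustration only: one stands in for a source whose levels are almost equally likely, the other for a source dominated by low levels. The function name `shannon_entropy` is likewise just a label chosen for this example.

```python
import math

def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits per symbol, of a probability list."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical source with 8 levels that are all equally likely
# (the situation described for the low-frequency spectrum).
uniform = [1 / 8] * 8

# Hypothetical source where level 0 dominates and high levels are rare
# (the situation described for the high-frequency spectrum).
skewed = [0.90, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.005]

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)

print(f"equal probabilities:  {h_uniform:.3f} bits per symbol")
print(f"skewed probabilities: {h_skewed:.3f} bits per symbol")
```

The equally likely source needs the full 3 bits per symbol, while the skewed source needs well under 1 bit on average, which is why the text says that a large probability difference allows efficient coding.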
As the quotation above shows, bit-rate compression rests on two algorithms: transform coding and entropy coding. The former reduces the entropy, and the latter converts the data into an efficient code that uses fewer bits. In the MPEG standard, the transform coding is the DCT. Although the transform does not itself compress the bit rate, the frequency coefficients it produces are very helpful for bit-rate compression. In practice, the whole process of compressing a digital video signal has four main stages: block sampling, DCT, quantization, and encoding. First, the original image is divided in the spatial domain into sampling blocks of N (horizontal) × n (vertical) pixels. Blocks of 4x4, 4x8, 8x8, 8x16, or 16x16 can be selected as needed. These sampled pixel blocks hold the gray value of each pixel in the original image frame (here ranging from 139 to 163) and are sent to the DCT encoder in sequence, which converts each sampling block from the spatial domain into a block of DCT coefficients in the frequency domain. The DCT is carried out within each sampling block separately. Each sample in a block is a digital value giving the video signal amplitude of the corresponding pixel in a field.
The DCT and its inverse, which is used for decompression, are as follows.
When u = v = 0, if the coefficient after the forward discrete cosine transform (DCT) is F(0, 0) = 1, then the inverse discrete cosine transform (IDCT) gives f(x, y) = 1/8, a constant. F(0, 0) is therefore called the direct-current (DC) coefficient. When u and v are not both 0, if the coefficient after the forward transform is F(u, v) ≠ 0, then f(x, y) is not a constant, and F(u, v) is called an alternating-current (AC) coefficient.
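The DC/AC distinction can be checked numerically. The sketch below assumes the usual 8x8 normalization, f(x, y) = (1/4) ΣΣ C(u) C(v) F(u, v) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16] with C(0) = 1/√2 and C(k) = 1 otherwise, which is the one consistent with the value 1/8 quoted in the text. The helper names `idct_8x8` and `block_with` are made up for this example.

```python
import math

N = 8  # block size assumed for this sketch

def c(k):
    """Normalization factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def idct_8x8(F):
    """Inverse 2-D DCT of an 8x8 coefficient block F[u][v], giving f[x][y]."""
    f = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * F[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            f[x][y] = total / 4
    return f

def block_with(u, v, value=1.0):
    """Coefficient block that is zero everywhere except F[u][v] = value."""
    F = [[0.0] * N for _ in range(N)]
    F[u][v] = value
    return F

dc_only = idct_8x8(block_with(0, 0))  # only F(0, 0) = 1
ac_only = idct_8x8(block_with(0, 1))  # only F(0, 1) = 1

print("DC only, first row:", [round(s, 4) for s in dc_only[0]])
print("AC only, first row:", [round(s, 4) for s in ac_only[0]])
```

With only F(0, 0) set, every reconstructed sample is 0.125 (that is, 1/8). With a single AC coefficient set, the samples vary across the block, so the result is no longer a constant.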
This article discusses only the basic algorithm of still-image compression. The purpose of image compression is to represent an image with a smaller amount of data, which saves storage cost, transmission time, and expense.
The JPEG compression algorithm compresses an image with some distortion, but the distortion is too small for the naked eye to recognize. This is why JPEG achieves such a satisfactory compression ratio.
The following describes the basic JPEG compression method.
I. JPEG compression process
JPEG compression is implemented in four steps:
1. color mode conversion and sampling;
2. DCT transformation;
3. quantization;
4. encoding.
II. 1. Color mode conversion and sampling
The RGB color system is the most commonly used way of representing color, but JPEG uses the YCbCr color system. To process a full-color image with basic JPEG compression, the RGB image data must first be converted to YCbCr data. Y is the luminance, and Cb and Cr carry the chrominance (color and saturation) information. The following formulas perform the conversion.
Y = 0.2990R + 0.5870G + 0.1140B
Cb = -0.1687R - 0.3313G + 0.5000B + 128
Cr = 0.5000R - 0.4187G - 0.0813B + 128
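As a small illustration, the three formulas can be applied to a single pixel in Python. The function names and the clamping helper are choices made for this sketch, and 8-bit input values (0 to 255) are assumed.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) with the formulas above."""
    y = 0.2990 * r + 0.5870 * g + 0.1140 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5000 * b + 128
    cr = 0.5000 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def clamp_round(value):
    """Round to an integer and clamp to 0..255 (results can slightly exceed 255)."""
    return max(0, min(255, round(value)))

white = rgb_to_ycbcr(255, 255, 255)
red = rgb_to_ycbcr(255, 0, 0)

print("white:", [clamp_round(component) for component in white])
print("red:  ", [clamp_round(component) for component in red])
```

A neutral color such as white gives Cb and Cr of about 128, the midpoint, which is why the +128 offset appears in the Cb and Cr formulas. Pure red gives a Cr of 255.5 before clamping, so a clamping step of some kind is needed when storing the result in 8 bits.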
Because the human eye is more sensitive to low-frequency data than to high-frequency data, and in fact more sensitive to changes in brightness than to changes in color, the data of the Y component is the more important. Since the data of the Cb and Cr components is relatively unimportant, only part of it needs to be kept, which increases the compression ratio. JPEG usually uses two sampling methods, YUV411 and YUV422; the names indicate the data sampling ratio of Y, Cb, and Cr.
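The idea of keeping only part of the chrominance data can be sketched as follows. This is a deliberately simple illustration that keeps one sample from each cell and discards the rest; real encoders may average the samples instead, and the step sizes used here are illustrative rather than a definition of YUV411 or YUV422. The function name `subsample` and the sample values are invented for the example.

```python
def subsample(plane, h_step, v_step):
    """Keep the top-left sample of every h_step x v_step cell of a 2-D plane."""
    return [row[::h_step] for row in plane[::v_step]]

# Hypothetical 4x4 Cb plane.
cb_plane = [
    [100, 101, 102, 103],
    [104, 105, 106, 107],
    [108, 109, 110, 111],
    [112, 113, 114, 115],
]

half_width = subsample(cb_plane, 2, 1)     # halve the horizontal resolution
quarter_size = subsample(cb_plane, 2, 2)   # halve both directions

print(half_width)
print(quarter_size)
```

Halving both directions leaves one chrominance sample for every four luminance samples, which cuts the Cb and Cr data to a quarter while the Y plane is left untouched.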
2. DCT Transformation
DCT stands for discrete cosine transform. It converts a block of light-intensity data into frequency data, which shows how the intensity changes. If the high-frequency data is modified and the result is transformed back to the original form, the data differs slightly from the original, but the difference is not easy for the human eye to recognize.
During compression, the raw image data is divided into 8x8 data-unit matrices. For example, the first matrix of luminance values has the following content:
JPEG treats the luminance matrices together with the chrominance Cb matrix and the saturation Cr matrix as one basic unit called an MCU (minimum coded unit). Each MCU can contain at most 10 matrices. For example, if the row and column sampling ratio is 4:, then each MCU contains four luminance matrices, one chrominance matrix, and one saturation matrix.
After the image data is divided into 8x8 matrices, 128 must be subtracted from each value before the values are substituted into the DCT formula. The subtraction is needed because the DCT formula accepts numbers in the range -128 to +127.
DCT conversion formula:
F(u, v) = (1/4) C(u) C(v) Σ(x=0..7) Σ(y=0..7) f(x, y) cos[(2x + 1)uπ / 16] cos[(2y + 1)vπ / 16]
x, y: the coordinates of a value in the image data matrix.
f(x, y): the value at those coordinates in the image data matrix.
u, v: the coordinates of a value in the matrix after the DCT.
F(u, v): the value at those coordinates in the matrix after the DCT.
u = 0: C(u) = 1/1.414 (that is, 1/√2)
u > 0: C(u) = 1 (C(v) is defined in the same way)
The values in the matrix after the DCT are the frequency coefficients. The largest of them is F(0, 0), which is called the DC coefficient. The other 63 frequency coefficients are mostly positive or negative floating-point numbers close to 0 and are called AC coefficients.
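The level shift and the forward transform can be sketched in a few lines of Python. The sketch assumes the common 8x8 normalization F(u, v) = (1/4) C(u) C(v) ΣΣ f(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16] with C(0) = 1/√2 and C(k) = 1 otherwise. The input block is a made-up smooth gradient, and the direct quadruple loop is chosen for readability, not speed.

```python
import math

N = 8  # block size

def c(k):
    """Normalization factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 block of 0..255 samples.

    128 is subtracted from every sample first (the level shift).
    """
    shifted = [[value - 128 for value in row] for row in block]
    F = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (shifted[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            F[u][v] = c(u) * c(v) * total / 4
    return F

# Hypothetical smooth block: brightness rises gently along each row.
block = [[140 + 3 * y for y in range(N)] for _ in range(N)]

coeffs = dct_8x8(block)
dc = coeffs[0][0]
largest_ac = max(abs(coeffs[u][v])
                 for u in range(N) for v in range(N) if (u, v) != (0, 0))

print(f"DC coefficient: {dc:.2f}")
print(f"largest AC magnitude: {largest_ac:.2f}")
```

For this gradient the DC coefficient is 180, eight times the mean of the level-shifted samples, and every AC coefficient is much smaller in magnitude, which matches the description above.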
3. Quantization
After the image data has been converted to frequency coefficients, the coefficients must pass through a quantization step before they enter the encoding stage. Quantization requires two 8x8 quantization matrices: one for the luminance frequency coefficients and one for the chrominance frequency coefficients. Each frequency coefficient is divided by the corresponding value in the quantization matrix, and the quotient is rounded to the nearest integer. That completes quantization.
When the frequency coefficients are quantized, they are converted from floating-point numbers to integers, which facilitates the final encoding. However, after quantization every value is kept only as an approximate integer, so some of the data content is lost. The quantization tables provided by JPEG are as follows:
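The divide-and-round step, and the loss it causes, can be sketched as below. The quantization table here is a made-up one whose step size simply grows with frequency; it is not the table provided by JPEG, and the coefficient values are invented for the example.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(coef / q) for coef, q in zip(coef_row, q_row)]
            for coef_row, q_row in zip(coeffs, table)]

def dequantize(levels, table):
    """Multiply each quantized level by its table entry (the decoder's step)."""
    return [[level * q for level, q in zip(level_row, q_row)]
            for level_row, q_row in zip(levels, table)]

# Hypothetical table: coarser steps for higher frequencies (NOT the JPEG table).
table = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]

# Hypothetical coefficient block: a large DC value and a few small AC values.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 181.0
coeffs[0][1] = -54.7
coeffs[0][3] = -5.6
coeffs[7][7] = 3.0

levels = quantize(coeffs, table)
restored = dequantize(levels, table)

print("quantized first row:", levels[0])
print("restored first row: ", restored[0])
```

The DC value 181.0 comes back as 184 and the coefficient -54.7 as -60, while the two small coefficients become 0. The difference between the original and restored values is the data lost in quantization.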
4. Encoding
Huffman coding is the most common JPEG encoding method, and it is usually carried out one complete MCU at a time. During encoding, the DC coefficient and the 63 AC coefficients of each matrix use different Huffman code tables, and luminance and chrominance also need different tables, so four code tables are required in total to complete JPEG encoding.
DC Encoding
The DC coefficient is encoded with differential pulse code modulation (DPCM): within the same image component, the difference between each DC value and the previous DC value is encoded. DPCM is used for DC mainly because, in a continuous-tone image, the difference is usually smaller than the original value, so far fewer bits are needed to encode the difference than to encode the original value. For example, if the difference is 5, its binary representation is 101. If the difference is -5, it is first changed to the positive integer 5, and its binary representation is then converted to the one's complement. In the one's complement, every bit that is 0 becomes 1 and every bit that is 1 becomes 0, so -5 is represented as 010. The number of bits to keep for the difference 5 is 3. The following table lists the number of bits kept for each range of difference values.
In addition, the Huffman code for that number of bits is placed in front of the difference bits. For example, if the luminance difference is 5 (101), the number of bits is 3, whose Huffman code is 100; the two are joined to give 100101. The following two tables are the DC difference encoding tables for luminance and chrominance. Using these two tables, the Huffman code can be placed in front of the DC difference to complete the DC coding.
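The DC procedure can be sketched in Python as follows. The code table below contains only the single luminance entry quoted in the text (3 bits gives code 100); a real encoder would need the complete table. The function names are chosen for this illustration.

```python
def size_category(value):
    """Number of bits needed to represent |value| (0 for a zero difference)."""
    return abs(value).bit_length()

def amplitude_bits(value):
    """Bits for the difference; negatives use the one's complement of |value|."""
    size = size_category(value)
    if size == 0:
        return ""
    if value < 0:
        # Flip the low `size` bits of |value|.
        value = ~(-value) & ((1 << size) - 1)
    return format(value, f"0{size}b")

# Partial luminance DC table: only the entry quoted in the text is included.
LUMA_DC_CODES = {3: "100"}

def encode_dc(previous_dc, current_dc, codes=LUMA_DC_CODES):
    """Encode the difference between two successive DC values."""
    diff = current_dc - previous_dc
    return codes[size_category(diff)] + amplitude_bits(diff)

print(amplitude_bits(5))    # bits for +5
print(amplitude_bits(-5))   # bits for -5
print(encode_dc(20, 25))    # difference +5
print(encode_dc(25, 20))    # difference -5
```

A difference of +5 produces 100101, as in the example above, and a difference of -5 produces 100010, the same 3-bit prefix followed by the complemented bits.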
AC Encoding
AC coefficients are encoded slightly differently from the DC coefficient. Before encoding, the 63 AC coefficients must first be arranged in zig-zag order.
Once the 63 values are arranged, each AC coefficient is converted into an intermediate symbol of the form RRRR/SSSS, where RRRR is the number of zero-valued AC coefficients preceding the nonzero AC coefficient and SSSS is the number of bits needed for the AC value. The correspondence between ranges of AC values and SSSS is similar to the DC table that relates difference values to numbers of bits.
If a run of consecutive zeros is longer than 15, the symbol 15/0 is used to represent 16 consecutive zeros; 15/0 is called ZRL (zero run length). The symbol 0/0 is called EOB (end of block) and indicates that all remaining AC coefficients are 0. The intermediate symbol is used as an index into the Huffman code table to find the appropriate code, which is then joined to the bits of the AC value.
For example, if an intermediate symbol for luminance is 5/3 and the AC value is 4, the index 5/3 is first looked up in the Huffman code table, giving the code 1111111110011110; the bits 100 (the value 4) are then appended, giving 1111111110011110100 as the Huffman code for [5, 4]. Here [5, 4] means that the AC value 4 is preceded by five zeros.
Because the luminance and chrominance AC code tables are rather long, they are omitted here; see the relevant books for details.
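The zig-zag ordering and the conversion to intermediate symbols can be sketched as follows. This is only an illustration of the symbol-forming step: the Huffman lookup is left out because the AC tables are not reproduced here, and the tuple layout (run, size, value) and the function names are choices made for this example.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, col) for r in range(n) for col in range(n)),
        # Walk the anti-diagonals; alternate the direction on each one.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def ac_symbols(zz_ac):
    """Turn 63 zig-zag-ordered AC values into (run, size, value) symbols.

    (15, 0, None) stands for ZRL and (0, 0, None) for EOB.
    """
    symbols = []
    run = 0
    last_nonzero = max((i for i, v in enumerate(zz_ac) if v != 0), default=-1)
    for value in zz_ac[:last_nonzero + 1]:
        if value == 0:
            run += 1
            continue
        while run > 15:
            symbols.append((15, 0, None))   # ZRL: 16 consecutive zeros
            run -= 16
        symbols.append((run, abs(value).bit_length(), value))
        run = 0
    if last_nonzero < len(zz_ac) - 1:
        symbols.append((0, 0, None))        # EOB: the rest is all zero
    return symbols

order = zigzag_order()
print(order[:6])

example = [0] * 63
example[5] = 4           # five zeros, then the AC value 4
print(ac_symbols(example))

long_run = [0] * 63
long_run[20] = -1        # twenty zeros, then the AC value -1
print(ac_symbols(long_run))
```

The first example yields the symbol 5/3 with the value 4, followed by EOB, which corresponds to the [5, 4] case described above. The second example shows a run of 20 zeros being split into one ZRL symbol plus a run of 4.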
Carrying out the four steps above completes the JPEG compression of an image.
JPEG Image Compression Algorithm:
The input image is divided into 8x8 or 16x16 blocks. A two-dimensional DCT (discrete cosine transform) is then performed on each block, and the transformed coefficients are quantized, encoded, and transmitted.
To decode a JPEG file, the decoder decodes and dequantizes the DCT coefficients, computes the two-dimensional inverse DCT of each block, and finally joins the resulting blocks into a complete image. After the DCT, coefficients close to 0 can be removed without seriously affecting image reconstruction.
A characteristic of the DCT is that most of the energy of the transformed image is concentrated in the upper-left corner, because the upper left holds the low-frequency data of the original image and the lower right holds its high-frequency data, and image energy is usually concentrated in the low-frequency part.