original link: http://media.pkusz.edu.cn/achievements/?p=40
AVS2 adopts the traditional hybrid coding framework; the overall coding process comprises intra prediction, inter prediction, transform and quantization, loop filtering, and entropy coding modules. It has the following technical characteristics:
Figure 1 AVS2 Coding Framework
1. Coding Structure Division
To meet the compression-efficiency requirements of HD and Ultra HD video, AVS2 adopts a more flexible quadtree-based block partitioning structure: the maximum coding unit (CU) is 64x64 and the minimum CU is 8x8. AVS2 also uses flexible prediction units (PU) and transform units (TU). The image to be encoded is first divided into fixed-size largest coding units (LCU), each of which is then recursively split by a quadtree into a series of CUs. Each CU contains one luma coding block and two corresponding chroma coding blocks (block-unit sizes in the following sections refer to the luma coding block).
Fig. 2 The relationship between the original image, LCUs, and CUs, and the quadtree partition structure.
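The LCU-to-CU recursion described above can be sketched as a simple recursive split, with the encoder's rate-distortion decision abstracted into a callback (a minimal illustration, not the normative process):

```python
def split_cu(x, y, size, decide_split, min_size=8, cus=None):
    """Recursively split a CU into four sub-CUs (quadtree).

    decide_split(x, y, size) is a stand-in for the encoder's
    rate-distortion decision; it returns True when the CU at
    (x, y) should be split further.
    """
    if cus is None:
        cus = []
    if size > min_size and decide_split(x, y, size):
        half = size // 2
        for dy in (0, half):          # visit the four quadrants
            for dx in (0, half):
                split_cu(x + dx, y + dy, half, decide_split, min_size, cus)
    else:
        cus.append((x, y, size))      # leaf CU
    return cus

# Example: starting from a 64x64 LCU, split every CU larger than 32x32.
leaves = split_cu(0, 0, 64, lambda x, y, s: s > 32)
```

With this toy decision rule the 64x64 LCU yields four 32x32 leaf CUs; a real encoder would recurse down to 8x8 wherever that pays off in rate-distortion terms.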
The prediction unit (PU) is the basic unit of intra and inter prediction, and its size cannot exceed that of the CU it belongs to. On top of the square prediction blocks of the previous-generation standards, AVS2 adds non-square intra prediction blocks, and for inter prediction it adds four asymmetric partitioning modes alongside the symmetric ones.
Fig. 3 Intra and inter prediction unit partitioning modes
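As an illustration of the inter partitioning modes, the sketch below enumerates the PU shapes for each symmetric and asymmetric partition of a CU; the mode names (2NxN, 2NxnU, etc.) follow common usage and are assumptions here, not normative identifiers:

```python
def inter_pu_partitions(cu):
    """Return the PU shapes (width, height) for each inter partition
    mode of a cu x cu coding unit: the symmetric modes plus the four
    asymmetric modes added by AVS2 (1/4 : 3/4 splits)."""
    n, q = cu // 2, cu // 4
    return {
        "2Nx2N": [(cu, cu)],                 # one PU covering the CU
        "2NxN":  [(cu, n), (cu, n)],         # symmetric horizontal split
        "Nx2N":  [(n, cu), (n, cu)],         # symmetric vertical split
        "2NxnU": [(cu, q), (cu, cu - q)],    # asymmetric: 1/4 top, 3/4 bottom
        "2NxnD": [(cu, cu - q), (cu, q)],    # asymmetric: 3/4 top, 1/4 bottom
        "nLx2N": [(q, cu), (cu - q, cu)],    # asymmetric: 1/4 left, 3/4 right
        "nRx2N": [(cu - q, cu), (q, cu)],    # asymmetric: 3/4 left, 1/4 right
    }

parts = inter_pu_partitions(64)
```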
In addition to the CU and PU, AVS2 defines the transform unit (TU) for transforming and quantizing the prediction residual. The TU is the basic unit of transform and quantization and, like the PU, is defined within a CU. In intra mode the TU is bound to the PU and has the same size. In inter mode the TU may use either a large-block or a small-block partition: the large-block partition transforms the whole CU as a single TU, while the small-block partition splits the CU into four TUs whose shapes follow the corresponding PU partition; if the current CU is divided into square PUs, the corresponding TUs use the square partition, and if it is divided into non-square PUs, the TUs use the matching non-square partition.
2. Intra-frame Predictive Coding
Intra prediction removes spatial redundancy within the image to be encoded. AVS2 supports 33 intra prediction modes for luma, including DC mode, Plane mode, Bilinear mode, and 30 angular modes, providing richer and finer-grained intra prediction than AVS1 and H.264/AVC. To improve accuracy, AVS2 adopts 1/32-pixel precision, interpolating sub-pixel reference samples with 4-tap linear filters. Chroma blocks have 5 modes: DC, horizontal, vertical, Bilinear, and the new luma-derived mode (Derived Mode, DM).
Fig. 4 Intra prediction modes for luma blocks
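The 1/32-pixel reference interpolation can be illustrated with a simplified 2-tap linear filter; the standard's actual filters are 4-tap with table-defined coefficients, which are omitted here:

```python
def interp_1_32(ref, pos32):
    """Interpolate a reference sample at 1/32-pel precision.

    ref   : list of integer reference samples
    pos32 : position in 1/32-pel units (pos32 = 32 * integer + fraction)

    This sketch uses 2-tap linear interpolation with round-to-nearest;
    AVS2's normative 4-tap filter coefficients are not reproduced here.
    """
    i, frac = pos32 // 32, pos32 % 32
    a, b = ref[i], ref[i + 1]
    return (a * (32 - frac) + b * frac + 16) >> 5  # +16 rounds before >>5

samples = [100, 132, 164, 196]
val = interp_1_32(samples, 40)  # 1 full pel + 8/32, between 132 and 164
```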
3. Inter-frame Predictive Coding
In contrast to intra prediction, inter prediction removes temporal redundancy. Compared with the previous-generation AVS1 and H.264/AVC standards, AVS2's inter prediction has been strengthened and extended in its prediction modes.
Traditional inter prediction has only P frames and B frames: a P frame is forward-predicted, its prediction units referencing prediction blocks in a single reference frame, while a B frame is bi-directionally predicted, its prediction units referencing prediction blocks forward and/or backward. On this basis, AVS2 adds the forward multi-hypothesis F frame, and designs scene frames (G frames and GB frames) and the reference-scene S frame for specific applications such as video surveillance and sitcoms.
For B frames, in addition to the traditional forward, backward, bi-directional, and skip/direct modes, AVS2 introduces a distinctive symmetric mode: only the forward motion vector is coded, and the backward motion vector is derived from it.
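A minimal sketch of the symmetric-mode derivation, assuming simple integer scaling by temporal distance (the standard uses a fixed-point scaling rule):

```python
def derive_backward_mv(mv_fw, dist_fw, dist_bw):
    """Derive the backward MV of a B-frame symmetric-mode PU from its
    coded forward MV: sign-inverted and scaled by the ratio of the
    temporal distances to the backward and forward references."""
    return (-mv_fw[0] * dist_bw // dist_fw,
            -mv_fw[1] * dist_bw // dist_fw)

# Forward MV (8, -4); both references 2 picture intervals away.
mv_bw = derive_backward_mv((8, -4), dist_fw=2, dist_bw=2)
```

Only `(8, -4)` is entropy-coded; the decoder reproduces the backward vector by the same derivation, saving the bits of a second MV.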
For F frames, a prediction unit can reference two forward reference blocks, which subsumes the multi-reference prediction of P frames. AVS2 distinguishes two kinds of double-hypothesis prediction: temporal and spatial. In temporal double-hypothesis prediction, the weighted average of two prediction blocks serves as the predictor of the current block, but only one motion vector difference (MVD) and one reference index are coded; the other MVD and reference index are derived by linear scaling according to temporal distance. Spatial double-hypothesis prediction, also called directional multi-hypothesis prediction (DMH), fuses two prediction points located around the initial prediction point, with the initial prediction point lying on the line joining the two.
Fig. 5 Temporal double-hypothesis prediction
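The temporal double hypothesis can be sketched as follows: the second motion vector is derived by temporal-distance scaling rather than signalled, and the two prediction blocks are averaged. The distances and rounding rule here are illustrative assumptions:

```python
def scale_mv(mv, dist_src, dist_dst):
    """Scale an MV by the ratio of temporal distances (integer sketch;
    the standard uses a fixed-point scaling rule)."""
    return (mv[0] * dist_dst // dist_src, mv[1] * dist_dst // dist_src)

def temporal_double_hypothesis(pred1, pred2):
    """Combine the two forward prediction blocks of an F-frame PU by
    rounded averaging, as in temporal double-hypothesis prediction."""
    return [(a + b + 1) >> 1 for a, b in zip(pred1, pred2)]

# The signalled MV points to a reference 2 intervals away; the second
# MV is derived for a reference 4 intervals away.
mv2 = scale_mv((6, -3), dist_src=2, dist_dst=4)
pred = temporal_double_hypothesis([100, 104], [108, 100])
```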
Fig. 6 Spatial double-hypothesis prediction (DMH)
4. Motion Vector Prediction
Motion vector prediction exploits the correlation between adjacent blocks: based on the motion information of already-coded neighboring blocks, the motion vector of the current block is predicted, and only the difference (MVD) between the motion vector (MV) and the predicted motion vector (MVP) is coded, reducing the bits spent on motion vectors and saving rate.
In AVS2, 4 motion vector prediction methods are used for different inter prediction modes: mean prediction, spatial prediction, temporal prediction, and spatio-temporal hybrid prediction. To further save rate, when the difference between the MVP and the motion-estimated MV exceeds a threshold, the MV and MVD are coded at 1/2-pixel precision; otherwise 1/4-pixel precision is used.
| Prediction method | Description |
| --- | --- |
| Mean prediction | Uses the mean of the motion vectors of coded adjacent blocks as the predictor |
| Spatial prediction | Uses the motion vector of a coded adjacent block as the predictor |
| Temporal prediction | Uses the scaled motion vector of the co-located block in the temporal reference as the predictor |
| Spatio-temporal hybrid prediction | Combines the motion vector of the temporally co-located block with the motion vectors of coded neighboring blocks to form the predictor |
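Two of these prediction methods, and the MVD that is actually coded, can be sketched as follows (the integer arithmetic is illustrative, not the normative rounding):

```python
def mvp_mean(neighbors):
    """Mean prediction: average the MVs of coded neighboring blocks."""
    n = len(neighbors)
    return (sum(x for x, _ in neighbors) // n,
            sum(y for _, y in neighbors) // n)

def mvp_temporal(col_mv, dist_col, dist_cur):
    """Temporal prediction: scale the co-located block's MV by the
    ratio of temporal distances."""
    return (col_mv[0] * dist_cur // dist_col,
            col_mv[1] * dist_cur // dist_col)

def mvd(mv, mvp):
    """Only the difference MV - MVP is entropy-coded."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])

p = mvp_mean([(4, 8), (8, 4), (0, 0)])  # predictor from three neighbors
d = mvd((6, 2), p)                      # small residual to code
```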
5. Transform
The aim of the transform is to remove spatial correlation, concentrating the energy of the spatial signal into a small number of low-frequency coefficients in the frequency domain, which are then coded. Transform coding in AVS2 mainly uses the integer DCT. Transform blocks of size 4x4, 8x8, 16x16, and 32x32 are transformed directly with the integer DCT, while 64x64 transform blocks use a logical transform: a wavelet transform first, followed by the integer DCT. After the DCT, AVS2 applies a secondary 4x4 transform to the 4x4 low-frequency coefficients, further reducing the correlation between coefficients and concentrating the energy.
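The energy-compaction effect of the DCT can be demonstrated with a small floating-point DCT-II; real codecs, AVS2 included, use scaled integer approximations of these basis functions:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (floating point)."""
    m = []
    for k in range(n):
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        m.append([c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n)])
    return m

def transform_2d(block, m):
    """Separable 2-D transform: M * block * M^T."""
    n = len(m)
    tmp = [[sum(m[k][i] * block[i][j] for i in range(n))
            for j in range(n)] for k in range(n)]
    return [[sum(tmp[k][i] * m[l][i] for i in range(n))
             for l in range(n)] for k in range(n)]

# A flat 4x4 block: all of its energy collapses into the DC coefficient.
flat = [[10] * 4 for _ in range(4)]
coeffs = transform_2d(flat, dct_matrix(4))
```

For the flat block every coefficient except the DC term is (numerically) zero, which is exactly the concentration of energy that makes the subsequent coefficient coding cheap.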
Figure 7 The 4x4 secondary transform
6. Entropy Coding
Entropy coding in AVS2 first divides the transform coefficients into 4x4 coefficient groups (CG), which are then zig-zag scanned and coded with context-based binary arithmetic coding. Coefficient coding first codes the position of the CG containing the last non-zero coefficient, then codes each CG in turn until all CG coefficients are coded, so that zero coefficients are more concentrated during coding.
7. Loop Filter
To remove the undesirable blocking artifacts, ringing artifacts, chroma shift, and blurring, the AVS2 loop filter comprises three stages: the deblocking filter, sample adaptive offset, and the adaptive loop filter. Deblocking removes the blocking artifacts caused by transform and quantization; its basic filtering unit is the 8x8 block, vertical boundaries are filtered before horizontal ones, and the filtering method for each boundary is chosen according to its filtering strength.
After deblocking, sample adaptive offset compensation further reduces distortion. There are two compensation modes: edge offset and band offset. Edge offset distinguishes four filtering directions: vertical, horizontal, and the two diagonals. Band offset adds a different offset to each band, chosen according to the amplitude of the pixel's reconstructed value.
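Edge offset can be sketched with the familiar local-extremum classification along one direction; the category mapping below is an illustrative assumption rather than AVS2's normative table:

```python
def eo_category(left, cur, right):
    """Classify a pixel for edge offset along one of the four
    directions by comparing it with its two neighbors: valleys and
    concave edges get categories 1-2, convex edges and peaks 3-4,
    and category 0 is left untouched."""
    sign = lambda d: (d > 0) - (d < 0)
    s = sign(cur - left) + sign(cur - right)
    return {-2: 1, -1: 2, 1: 3, 2: 4}.get(s, 0)

def apply_eo(row, offsets):
    """Add the signalled per-category offset to interior pixels."""
    out = row[:]
    for i in range(1, len(row) - 1):
        out[i] += offsets[eo_category(row[i - 1], row[i], row[i + 1])]
    return out

# The middle pixel is a local valley (category 1), so it is raised by
# the category-1 offset, damping the ringing around an edge.
smoothed = apply_eo([10, 5, 10], [0, 2, 1, -1, -2])
```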
After deblocking and sample offset compensation, AVS2 adds an adaptive loop filter: a centrally symmetric Wiener filter shaped as a 7x7 cross plus a 3x3 square. The least-squares filter coefficients are computed from the original lossless image and the reconstructed image, and filtering the reconstructed image reduces compression distortion and improves reference image quality.
8. Scene Coding
In special scenes such as surveillance video and sitcoms, a large part of the redundancy in the video lies in the background. AVS2 therefore designs a background-model-based coding tool to improve compression efficiency for these scenes. When the tool is off, an I frame is referenced only by images before the next random access point. When it is on, AVS2 selects a frame in the video as the scene frame (G frame), which can serve as a long-term reference for subsequent images. In addition, to prevent the coding of background frames from causing a bitrate spike and a large transmission delay, AVS2 uses a block-update-based background frame technique: in each frame, no more than a certain proportion of LCUs are selected as background refresh blocks, and the background reference frame is refreshed after the frame is coded.
Fig. 8 Overview of background-model-based scene frames
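The capped per-frame background refresh can be sketched as follows; the 5% cap and the round-robin selection policy are illustrative assumptions, not the normative rule:

```python
def pick_refresh_lcus(num_lcus, frame_idx, max_ratio=0.05):
    """Pick which LCUs refresh the background model in this frame,
    capped at max_ratio of the frame so the background is built up
    gradually instead of in one expensive burst."""
    per_frame = max(1, int(num_lcus * max_ratio))
    start = (frame_idx * per_frame) % num_lcus
    return [(start + k) % num_lcus for k in range(per_frame)]

# A frame of 100 LCUs with a 5% cap refreshes 5 LCUs per frame,
# rotating through the picture over successive frames.
batch0 = pick_refresh_lcus(100, 0)
batch1 = pick_refresh_lcus(100, 1)
```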
AVS2 makes many improvements and innovations on traditional coding techniques, such as adding F frames and background frames to inter prediction, greatly improving video compression efficiency. Scene coding in particular, with its background-model-based approach, substantially improves compression efficiency for video surveillance and similar content, roughly doubling that of the contemporaneous international standard. This concludes the technical overview of the AVS2 standard; the next "Standard Commentary" column will describe the technologies in AVS2 in detail, module by module.
(This article is an original work; please cite the source when reprinting.)