Keywords: pixel block prediction, motion compensation, source encoding
I. Pixel Block Prediction
The basic prediction technology in the H.264/AVC standard is block-based rather than object-based. Its encoder uses a hybrid coding scheme to improve coding efficiency, combining advanced prediction techniques with efficient entropy coding. In motion prediction, blocks of different sizes are used, and the prediction modes are organized in a tree structure. Other key features are the multi-reference-frame prediction method and the generalized B-frame concept. H.264 pixel block prediction coding includes intra-frame block prediction and inter-frame block prediction, and intra-frame block prediction plays an important role in H.264. In image signal compression coding, the luminance signal and the chrominance (color-difference) signal are processed separately, so prediction can be divided into luminance signal prediction and chrominance signal prediction. The two prediction methods are discussed below.
1. Intra-Frame Prediction
The prediction coding method adopted in H.264/AVC differs from that of the MPEG-4 video coding standard (ISO/IEC 14496-2): intra-frame prediction is performed in the spatial domain, before the transform. In H.264/AVC intra prediction, the sample values of the current block are always predicted from the sample values of neighboring blocks. This can propagate image errors when a neighboring macroblock was inter-coded and suffers from motion compensation errors. Therefore, a constrained intra coding mode is also provided, in which only neighboring intra-coded macroblocks may be used as reference macroblocks.
H.264 adopts a new intra (INTRA) prediction mode based on the correlation between adjacent pixels. Prediction is performed from the pixels to the left of and above the current block (already encoded and reconstructed pixels), and only the difference between the actual value and the predicted value is encoded, so that intra-coded block information can be expressed with a small number of bits. In the H.264 standard, the luma block has nine 4×4 and four 16×16 intra prediction modes; the four 8×8 chroma modes are the same as the four 16×16 luma modes. For each 4×4 block (except edge blocks, which receive special handling), each pixel can be predicted as a differently weighted sum of the 17 closest previously encoded pixels (some weights can be zero), namely the pixels above and to the left of the block in which the pixel is located. Clearly, this intra prediction algorithm works not in the time domain but in the spatial domain: it removes the spatial redundancy between adjacent blocks and achieves more effective compression. As shown in Figure 1, a, b, ..., p in the 4×4 block are the 16 pixels to be predicted, while A, B, ... are the previously encoded pixels. Depending on the selected prediction reference, there are nine different luma prediction modes, while chroma intra prediction has only one.
The 4×4 intra prediction method is used to encode image detail. The basic idea is to calculate and compare the luminance difference (the gradient value) of the pixels in a block along different directions, and to select the direction with the smallest prediction error as the best prediction direction. Figure 1 shows an example of the prediction modes: the 16 sample values (a to p) are predicted from the neighboring, previously encoded sample values using the various prediction modes. There are eight directional prediction modes, plus a ninth mode that predicts each pixel as the average of the neighboring samples.
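To make the mode-selection idea concrete, the following is a minimal C sketch of 4×4 intra prediction restricted to the vertical, horizontal, and DC modes, with the best mode chosen by the smallest sum of absolute differences. The function names and the simplified neighbor handling are illustrative only; the standard defines nine modes and additional availability and rounding rules.

```c
#include <stdlib.h>

/* Illustrative sketch of 4x4 intra prediction (vertical, horizontal, DC only).
 * top[0..3] and left[0..3] are previously reconstructed neighboring samples;
 * availability checks and the remaining directional modes are omitted. */
enum { I4_VERTICAL = 0, I4_HORIZONTAL = 1, I4_DC = 2 };

static void predict_4x4(int mode, const int top[4], const int left[4], int pred[4][4])
{
    int dc = 0, i, j;
    for (i = 0; i < 4; i++) dc += top[i] + left[i];
    dc = (dc + 4) >> 3;                      /* mean of the 8 neighbors */

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            pred[i][j] = (mode == I4_VERTICAL)   ? top[j]
                       : (mode == I4_HORIZONTAL) ? left[i]
                       :                           dc;
}

/* Pick the mode with the smallest sum of absolute differences (SAD);
 * only the residual orig - pred is then coded. */
static int best_4x4_mode(const int orig[4][4], const int top[4], const int left[4])
{
    int best = I4_DC, best_sad = 1 << 30, mode;
    for (mode = I4_VERTICAL; mode <= I4_DC; mode++) {
        int pred[4][4], sad = 0, i, j;
        predict_4x4(mode, top, left, pred);
        for (i = 0; i < 4; i++)
            for (j = 0; j < 4; j++)
                sad += abs(orig[i][j] - pred[i][j]);
        if (sad < best_sad) { best_sad = sad; best = mode; }
    }
    return best;
}
```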
2. Encoding Process of the 4×4 Intra Prediction Mode
The decoder must be informed of the prediction mode chosen for each 4×4 block, which could require many bits. However, the intra modes of neighboring 4×4 blocks are highly correlated. For example, if the previously encoded 4×4 blocks A and B in Figure 2 were both predicted using mode 2, the best mode for block C (the current block) is probably also mode 2. For each current block C, the encoder and decoder both compute a most_probable_mode: if A and B are both coded in 4×4 intra mode and both lie in the current slice, most_probable_mode is the minimum of the prediction modes of A and B; otherwise, most_probable_mode is set to mode 2 (DC prediction).
For each 4×4 block, the encoder sends a flag, use_most_probable_mode. If the flag is 1, the most_probable_mode is used. If the flag is 0, another parameter, remaining_mode_selector, is sent to indicate a different mode: if remaining_mode_selector is smaller than most_probable_mode, the prediction mode is set to remaining_mode_selector; otherwise, it is set to remaining_mode_selector + 1. In this way, remaining_mode_selector needs only eight values (0 to 7) to represent the current intra mode (0 to 8).
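A minimal sketch of the signalling logic just described, assuming the left and upper neighbor modes are already known; the names are illustrative, not the exact syntax elements of the standard.

```c
/* mode_A and mode_B are the modes of the left and upper 4x4 blocks; a value
 * of -1 marks a neighbor that is unavailable or not intra-coded. */
#define DC_PRED 2

static int most_probable_mode(int mode_A, int mode_B)
{
    if (mode_A < 0 || mode_B < 0)
        return DC_PRED;                        /* fall back to DC prediction */
    return mode_A < mode_B ? mode_A : mode_B;  /* minimum of the two modes   */
}

/* Decoder-side reconstruction of the intra mode (0..8). */
static int decode_intra_mode(int use_most_probable_mode,
                             int remaining_mode_selector,
                             int mpm)
{
    if (use_most_probable_mode)
        return mpm;
    return (remaining_mode_selector < mpm) ? remaining_mode_selector
                                           : remaining_mode_selector + 1;
}
```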
3. 16×16 Intra Prediction of the Luminance Signal
The 16×16 prediction method operates on 16×16 blocks and is used to encode the relatively uniform parts of the image. As an alternative to the 4×4 luma modes, the entire 16×16 luma component of a macroblock can be predicted at once. There are only four prediction modes: vertical prediction, horizontal prediction, DC prediction, and plane prediction.
Mode 0 (vertical prediction): extrapolation from the upper samples (H);
Mode 1 (horizontal prediction): extrapolation from the left samples (V);
Mode 2 (DC prediction): the mean of the upper and left samples (H + V);
Mode 3 (plane prediction): a linear function of the upper and left samples.
The linear "plane" can be set to interpolation by the sample values H and V on the top and left, which is better in the smooth brightness area.
4. Intra-Frame Prediction of the Chrominance Signal
Each 8×8 chroma component of a macroblock is predicted from the previously encoded and reconstructed chroma samples above and to its left. Because chrominance is usually smooth across the image, the prediction method is similar to the 16×16 intra prediction of the luminance signal. There are likewise four intra prediction modes: vertical prediction (mode 0), horizontal prediction (mode 1), DC prediction (mode 2), and plane prediction (mode 3).
5. Inter-Frame Prediction
Inter-frame prediction uses previously encoded frames as reference images to predict the current image: motion-vector-compensated samples of the reference image serve as the prediction for the current image's samples. The H.264/AVC standard uses the block-based motion compensation that has been used since the H.261 standard. The biggest differences from the earlier standards are that ① it supports prediction with multiple block partition sizes, and ② motion vectors can be accurate to 1/4 pixel.
The H.264/AVC standard also adopts the multi-frame prediction method used in H.263. The main idea is to extend motion estimation over several reference frames along the time axis: at the macroblock level, one or more previous video frames can be selected as reference frames. Multi-frame prediction used for motion compensation significantly improves the prediction gain in most cases.
Next we discuss the inter-frame prediction methods used in the two different types of slice. Before that, we first introduce tree-structured motion compensation, which describes how a macroblock is partitioned into blocks.
Inter-frame prediction is used to reduce the temporal correlation between images: accurate prediction of the next frame is achieved by using multiple reference frames and smaller motion prediction regions, thereby reducing the amount of data to be transmitted. Each luminance macroblock is divided into regions of different shapes that serve as motion description regions. As shown in Figure 4, there are four partitionings: 16×16, 16×8, 8×16, and 8×8. When the 8×8 mode is used, each 8×8 region can be further divided into 8×4, 4×8, and 4×4 sub-regions. Each region has its own motion vector, and every motion vector and every region selection must be encoded and transmitted. Therefore, when a large region is selected, the amount of data needed to represent the motion vectors and the selected regions decreases, but the residual after motion compensation increases. When a small region is selected, the residual decreases and the prediction is more accurate, but the amount of data needed to signal the motion vectors and regions increases. Large regions suit the homogeneous parts of a frame, while small regions suit its detailed parts.
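As a rough illustration of this trade-off, the sketch below counts how many motion vectors a macroblock carries for each partitioning; the type and function names are hypothetical.

```c
/* Each partition carries its own motion vector, so finer partitions mean
 * more vectors (more signalling) but usually a smaller residual. */
typedef enum { P_16x16, P_16x8, P_8x16, P_8x8 } mb_partition;
typedef enum { S_8x8, S_8x4, S_4x8, S_4x4 } sub_partition;

static int mv_count_for_mb(mb_partition p, const sub_partition sub[4])
{
    static const int sub_mvs[] = { 1, 2, 2, 4 };   /* per 8x8 sub-macroblock */
    switch (p) {
    case P_16x16: return 1;
    case P_16x8:
    case P_8x16:  return 2;
    case P_8x8: {
        int i, n = 0;
        for (i = 0; i < 4; i++) n += sub_mvs[sub[i]];
        return n;                                  /* between 4 and 16 vectors */
    }
    }
    return 0;
}
```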
In H.264, the accuracy of motion prediction has also been improved. For images in QCIF format (176×144 pixels), the accuracy is 1/4 pixel; for images in CIF format (352×288 pixels), it is 1/8 pixel. The 1/4-pixel interpolation uses a 6-tap filter for horizontal and vertical filtering to obtain the half-pixel samples, followed by linear interpolation; the 1/8-pixel interpolation uses an 8-tap filter directly for horizontal and vertical filtering.
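A minimal sketch of the half-pel step, assuming the well-known 6-tap filter (1, -5, 20, 20, -5, 1) followed by linear interpolation for quarter-pel positions; two-dimensional filtering and picture-border handling are omitted.

```c
/* ref points at the integer-pel sample to the left of the desired half-pel
 * position; the filter is applied horizontally here. */
static int half_pel_horizontal(const unsigned char *ref)
{
    int v = ref[-2] - 5 * ref[-1] + 20 * ref[0]
          + 20 * ref[1] - 5 * ref[2] + ref[3];
    v = (v + 16) >> 5;                       /* normalise the filter gain */
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return v;
}

/* Quarter-pel samples are obtained by linear interpolation (averaging)
 * of neighboring integer- and half-pel samples. */
static int quarter_pel(int a, int b)
{
    return (a + b + 1) >> 1;
}
```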
In the intra-frame coding mode, H.264 performs directional prediction on the spatial samples rather than on the transform coefficients (compare the H.263+ advanced intra coding mode). In addition, like Annex N of H.263+, H.264 supports a reference picture selection mode: when encoding subsequent images, one of several previous reference frames (more than one) in the coding buffer can be chosen for motion estimation.
In addition to I frames, P frames, and B frames, H.264 also introduces a new picture type, the SP frame. An SP frame is also a predictively coded frame, but the image used to predict it can be changed as needed. SP frames can be used for operations such as channel-rate adaptation, video bit-stream switching, and random stream access, and they are widely used in video communication and in streaming-media transmission over time-varying wireless channels.
II. Motion Compensation
1. Tree-Structured Motion Compensation
H.264 adopts macroblock partitioning and sub-partitioning of different sizes and shapes. The 16×16 luminance block of a macroblock can be partitioned into 16×16, 16×8, 8×16, or 8×8 blocks. If 8×8 is chosen, each 8×8 block can be further sub-partitioned into 8×8, 8×4, 4×8, or 4×4 blocks, as shown in Figure 5. This partitioning and sub-partitioning means each macroblock can contain blocks of many different sizes; motion compensation using blocks of various sizes is called tree-structured motion compensation. Every luminance block produced by macroblock partitioning and sub-partitioning has its own independent motion vector. The chrominance blocks in a macroblock are partitioned in the same way as the luminance block, but because of the chroma subsampling a chroma block is half the size of the corresponding luma block, so when a chroma block uses a luma motion vector, each component of the vector must be divided by 2. Second, H.264 can reach a motion accuracy of 1/4 pixel, obtained by interpolating the integer-pixel luminance values: the interpolation first obtains half-pixel accuracy with a 6-tap filter, and then uses a linear filter to reach 1/4-pixel accuracy. Because of the subsampling, the chroma motion accuracy is 1/8 pixel, obtained by linear interpolation. H.264 uses the motion vectors of already encoded blocks to predict the motion vectors of blocks not yet encoded, so only the difference between the actual motion vector and the predicted value needs to be encoded and transmitted.
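The sketch below illustrates motion vector prediction from neighboring blocks (a median of three neighbors is assumed here) and the halving of the vector for chroma; it is a simplification of the derivation rules in the standard.

```c
typedef struct { int x, y; } mv;

static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (b > c) b = c;
    return a > b ? a : b;
}

/* Predict the current block's vector from already-coded neighbors;
 * only the difference mvd = mv - pred is then coded and transmitted. */
static mv predict_mv(mv left, mv above, mv above_right)
{
    mv p;
    p.x = median3(left.x, above.x, above_right.x);
    p.y = median3(left.y, above.y, above_right.y);
    return p;
}

/* Chroma blocks are half-size, so each vector component is divided by 2. */
static mv chroma_mv_from_luma(mv luma)
{
    mv c = { luma.x / 2, luma.y / 2 };
    return c;
}
```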
Furthermore, H.264 can use multiple reference images (up to five frames each in the forward and backward directions) for motion prediction, so that periodic motion, occluded translational motion, and video that repeatedly switches between two scenes can all be predicted very well. With multiple reference images, H.264 improves coding efficiency and achieves better error recovery, but it also requires additional delay and storage capacity.
Finally, H.264 uses forward and backward motion prediction in B pictures, consistent with previous standards. However, B pictures can themselves also be used, with weighting, as reference pictures for other pictures.
The block size for motion compensation in H.264/AVC is no longer limited to the macroblock: motion vectors can be obtained for macroblocks or for their sub-blocks.
Each motion vector is encoded and transmitted, and the chosen partitioning must also be coded in the data stream. If a large partition (16×16, 16×8, or 8×16) is chosen, only a few bits are needed to describe the motion vector and partition mode, but the sample differences after motion compensation may be relatively large. If a small partition (8×4 or 4×4) is chosen, the sample differences after motion compensation may be small, but many bits are needed to transmit the motion vectors and the partition mode. The choice of partitioning therefore has a significant effect on the compression result. In general, large partitions are used for the uniform parts of a frame, while small partitions are used for detailed parts of the image. Each chroma block is partitioned in the same way as the luminance; because the chroma resolution in a macroblock is half the luminance resolution, a chroma block is half the size of the corresponding luma block both horizontally and vertically, and its vertical and horizontal motion vectors are likewise half those of the luma block.
2. Inter-Frame Prediction in P-Type Slices
In the past, the highest motion estimation accuracy in general video compression was half-pixel (half-pel); for example, the basic compensation technique in the 14496-2 standard (the MPEG-4 video coding part) uses half-pixel accuracy, the interpolation is simple bilinear interpolation, and the quality of the compensation is relatively low. In contrast, the motion estimation accuracy in the H.264/AVC standard is 1/4 pixel (quarter-pel).
In the H.264/AVC standard, multi-frame motion-compensated prediction can be used for P-type slice encoding; that is, more than one previously encoded frame can be used as a reference frame for motion compensation of the current frame.
Multi-frame prediction requires both the decoder and the encoder to store multiple frames in a buffer as reference frames. The decoder uses the memory management control operation parameters in the bit stream to replicate the same multi-frame buffer as the encoder. At the same time, for each motion-compensated 16×16, 16×8, 8×16, or 8×8 block and its sub-blocks, a reference index parameter must be transmitted to identify the position of the block's or sub-block's reference frame in the buffer. The P-type prediction modes correspond to the macroblock partitions described earlier; in P-type slices, the prediction modes listed in the table also include the intra prediction modes. In addition, the P_8x8 prediction mode corresponds to the 8×8 partition and can use sub-block-based prediction modes.
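A minimal sketch of such a multi-frame reference buffer, assuming a simple fixed-size list indexed by the transmitted reference index; the actual memory management commands (sliding window, MMCO) are not modeled.

```c
#define MAX_REF_FRAMES 5

typedef struct {
    unsigned char *pixels;    /* reconstructed frame data */
    int            frame_num;
} ref_picture;

typedef struct {
    ref_picture list0[MAX_REF_FRAMES];  /* forward reference list */
    int         count;
} ref_buffer;

/* Both encoder and decoder keep the same list; the reference index sent per
 * partition selects which stored frame is used for motion compensation. */
static const ref_picture *get_reference(const ref_buffer *buf, int reference_index)
{
    if (reference_index < 0 || reference_index >= buf->count)
        return 0;                       /* invalid index in this sketch */
    return &buf->list0[reference_index];
}
```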
For the different prediction modes in P frames, see the interpret_mb_mode_p function in the JM reference software. This function stores the partition mode of the current macroblock and its prediction mode in the currmb data structure, so that the readmotioninfofromnal function can obtain the motion vector of the current block and the readcbpandcoeffsfromnal function can obtain the prediction residual.
3. Inter-Frame Prediction in B-Type Slices
Compared with previous image coding standards, the H.264/AVC standard generalizes the concept of the B-type slice. Its biggest feature is that B frames composed of B-type slices can be used as reference frames for other images. The most essential difference between B and P is that the prediction value of a macroblock or sub-block in a B slice is obtained as a weighted average of two differently motion-compensated values. A B-type slice uses two different sets of reference images: list0 (the forward reference image set) and list1 (the backward reference image set).
B slices support four different macroblock prediction modes: ① direct mode, in which no additional information such as motion vectors needs to be transmitted; ② unidirectional prediction mode (inter mode), in which only one set of macroblock prediction information is sent; ③ multi-hypothesis prediction mode, in which two sets of macroblock prediction information must be transmitted; ④ intra mode. The direct prediction mode and the multi-hypothesis prediction mode are described below.
(1) Direct Mode
The direct mode uses bidirectional prediction and transmits only the prediction residual. In this mode, the forward and backward motion vectors (mv0, mv1) are calculated from the motion vector (MVC) of the co-located macroblock identified in the backward reference image (rl1). A macroblock using the direct prediction mode has the same partitioning as the co-located macroblock.
mv0 is the forward motion vector and mv1 is the backward motion vector; MVC is the motion vector of the co-located block in the backward reference image. In the earlier concept of B frames, TDD is the time difference between the frame preceding and the frame following the current frame, and TDB is the time difference between the current B frame and the preceding frame. With multi-frame prediction this changes: TDB is the time difference between the current frame and the forward reference frame rl0, while TDD is the time difference between the forward reference frame rl0 and the backward reference frame rl1. In the H.264/AVC standard, the direct prediction mode applies distance-based weighting to the prediction signals, an improvement over the simple averaging used in previous standards. This technique is most useful for fades, which are common in music videos and at the ends of films, where the scene gradually fades to a dark screen; previous standards had no good way to compress such material. If such a sequence is encoded in the PBBB pattern, simple averaging makes the prediction of the first and third B frames worse than that of the neighboring frames and the second B frame, whereas weighting according to the relative distance between frames brings a large improvement.
C denotes the sample values in the macroblock or block of the current B frame, CP denotes the sample values in the macroblock or block of the forward reference image, and CS denotes the sample values in the macroblock or block of the backward reference image used for prediction.
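Putting the two ideas together, the following sketch derives the direct-mode vectors from the co-located vector and blends the forward and backward samples by temporal distance; floating point is used for clarity, whereas the standard uses fixed-point scaling factors.

```c
/* mvc is the motion vector of the co-located block in the backward reference
 * rl1, tdb is the temporal distance from the current B frame to the forward
 * reference rl0, and tdd is the distance from rl0 to rl1. */
typedef struct { double x, y; } vec;

static void direct_mode_vectors(vec mvc, double tdb, double tdd,
                                vec *mv0, vec *mv1)
{
    mv0->x = (tdb / tdd) * mvc.x;             /* forward vector  */
    mv0->y = (tdb / tdd) * mvc.y;
    mv1->x = ((tdb - tdd) / tdd) * mvc.x;     /* backward vector */
    mv1->y = ((tdb - tdd) / tdd) * mvc.y;
}

/* Distance-weighted blending of the forward (cp) and backward (cs)
 * prediction samples; the closer reference gets the larger weight. */
static double direct_mode_sample(double cp, double cs, double tdb, double tdd)
{
    return ((tdd - tdb) * cp + tdb * cs) / tdd;
}
```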
(2) Multi-Hypothesis Prediction (Multihypothesis Mode)
The multi-hypothesis prediction mode superimposes the prediction values of two macroblocks obtained with two motion vectors; each predicted block is called a hypothesis. The final prediction block is obtained by averaging the predictions of the two hypotheses. Multi-hypothesis prediction differs from bidirectional prediction: bidirectional prediction allows only a linear combination of one forward and one backward prediction (see Figure 8), whereas multi-hypothesis prediction removes this restriction and can form the final prediction from a pair of predictions in the same direction, i.e., (forward, forward) or (backward, backward).
When the first hypothesis comes from a forward reference image and the second hypothesis comes from a backward reference image, the multi-hypothesis prediction mode reduces to bidirectional prediction.
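A minimal sketch of the averaging step, assuming the two hypotheses have already been motion-compensated into flat sample arrays.

```c
/* h0 and h1 are the two motion-compensated hypothesis blocks; they may both
 * come from forward references, both from backward references, or one of
 * each (the bidirectional case). The final prediction is their average. */
static void average_hypotheses(const int *h0, const int *h1, int *pred, int n)
{
    int i;
    for (i = 0; i < n; i++)
        pred[i] = (h0[i] + h1[i] + 1) >> 1;   /* rounded average */
}
```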