Part of H.264 paper records

Source: Internet
Author: User

<4x4 fast intra-frame prediction algorithms in H.264>

Author: Wang Qiwen, Huang Dongjun

Research on intra-frame prediction has already produced many results. The fast three-step method [1] uses the correlation between adjacent prediction directions to evaluate only a selected subset of the directional intra prediction modes, trading off computation against rate-distortion performance; however, it does not save much encoding time. The algorithm in [2] uses an edge direction histogram: it takes the direction in which the sub-block edges vary and selects the optimal prediction direction from a small pre-selected set of the most likely directions. Its compression performance is poor when the video scene is complex. [3] improves on the algorithm of [2], but the gain in prediction accuracy is not significant. [4] proposes a fast adaptive-threshold algorithm: a threshold on the rate-distortion cost of the current block is set from the correlation with adjacent blocks in order to decide early whether the current block is an intra block; however, the dynamic variability of video images makes the threshold difficult to predict. In view of this, this paper proposes a new method for measuring the prediction direction.

In H.264, RDO is used to traverse all intra-frame prediction modes and select the optimal one. The prediction mode of an intra-predicted block is not entropy coded directly; it is coded relative to the most likely mode. The most likely mode is the most-probable-mode [4], which is derived from the left block of the current block (A in Figure 3) and the top block (B in Figure 3). If the prediction mode of the current block is the most-probable-mode, only one bit is needed to represent it, which is the minimum number of bits.
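The signalling described above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the H.264 rule that the most probable mode is the smaller of the two neighbouring mode numbers, with DC (mode 2) used when a neighbour is unavailable, and it assumes the usual one-bit flag plus three-bit remainder. The function names are hypothetical.

```python
DC_MODE = 2  # Intra4x4 DC prediction


def most_probable_mode(mode_a, mode_b):
    """mode_a / mode_b: Intra4x4 modes of the left / top block, or None if unavailable."""
    if mode_a is None or mode_b is None:
        return DC_MODE
    return min(mode_a, mode_b)


def signal_mode(mode, mpm):
    """Return (flag, remainder): flag 1 means 'use the most probable mode'."""
    if mode == mpm:
        return 1, None
    # The remaining 8 modes are renumbered 0..7 by skipping the MPM.
    remainder = mode if mode < mpm else mode - 1
    return 0, remainder


def mode_bits(mode, mpm):
    """Bits spent on the mode: 1 flag bit, plus 3 remainder bits if it is not the MPM."""
    return 1 if mode == mpm else 4
```

For example, with a left mode of 0 and a top mode of 1 the most probable mode is 0, so coding mode 0 costs one bit and any other mode costs four.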

The algorithm in this paper makes full use of the directional characteristics of the 4x4 prediction modes, as well as the correlation between adjacent blocks and between adjacent prediction modes. It first measures the four strongly directional prediction modes: vertical (mode 0), horizontal (mode 1), diagonal down-left (mode 3), and diagonal down-right (mode 4). It then selects a candidate prediction mode from the measurement results and determines whether the current block is directional. If it is, the prediction modes adjacent to the candidate mode are added as candidates, based on the correlation between adjacent prediction modes; otherwise, mode 2 (DC) is selected as the candidate mode. Finally, the prediction modes of the adjacent blocks are added as candidates, based on the correlation between adjacent blocks.
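A rough sketch of this candidate selection is given below. The notes do not say how the four directions are measured or how directionality is tested, so both are assumptions here: `direction_costs` is a hypothetical per-mode measure (smaller means a better match) and a block counts as directional when the best measure beats the second best by more than a hypothetical `threshold`. Only the angular ordering of the eight directional Intra4x4 modes comes from the standard.

```python
# Directional Intra4x4 modes in angular order; mode 2 (DC) has no direction.
ANGULAR_ORDER = [8, 1, 6, 4, 5, 0, 7, 3]
DC_MODE = 2


def adjacent_modes(mode):
    """Modes whose prediction direction is next to that of `mode`."""
    i = ANGULAR_ORDER.index(mode)
    result = []
    if i > 0:
        result.append(ANGULAR_ORDER[i - 1])
    if i < len(ANGULAR_ORDER) - 1:
        result.append(ANGULAR_ORDER[i + 1])
    return result


def candidate_modes(direction_costs, mode_a, mode_b, threshold):
    """direction_costs: {0: c0, 1: c1, 3: c3, 4: c4} (hypothetical measure).
    mode_a / mode_b: modes of the left / top block, or None if unavailable."""
    best = min(direction_costs, key=direction_costs.get)
    ordered = sorted(direction_costs.values())
    has_direction = ordered[1] - ordered[0] > threshold  # assumed test
    if has_direction:
        candidates = [best] + adjacent_modes(best)
    else:
        candidates = [DC_MODE]
    # Finally add the modes of the neighbouring blocks.
    for m in (mode_a, mode_b):
        if m is not None and m not in candidates:
            candidates.append(m)
    return candidates
```

With a clearly vertical block the candidates become modes 0, 5 and 7 plus the neighbours' modes, so far fewer than nine modes go through the full cost calculation.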

References

[1] Cheng Chao-chung, Chang Tian-sheuan. Fast three step intra prediction algorithm for 4×4 blocks in H.264 [C] // Proc. of IEEE International Symposium on Circuits and Systems. Kobe, Japan: [s.n.], 2005.

[2] Pan Feng, Lin Xiao. Fast mode decision for intra prediction [J]. IEEE Trans. on Circuits and Systems for Video Technology, 2005, 15(7): 813-822.

[3] Li Shiping, Jiang Gangyi, Yu Mei. New method for intra-frame prediction mode selection [J]. Journal of Electronic Science, 2006, 34(1): 141-146.

[4] Kim B G. Fast selective intra-mode search algorithm based on adaptive thresholding scheme for H.264/AVC encoding [J]. IEEE Trans. on Circuits and Systems for Video Technology, 2008, 18(1): 127-133.

[5] Lim Keng-pang. Text description of joint model reference encoding methods and decoding concealment methods [Z]. 2005.

 

<New Method for intra-Frame Prediction>

Author: Li Shiping, Jiang Gangyi, Yu Mei

 

<Fast intra-Frame Prediction Algorithm for H.264 Based on SAD and SATD>

Author: Xie Cuilan, Zheng Yiling

Determining the intra-frame prediction mode in H.264

H.264 intra-frame coding adopts a rate-distortion optimization (RDO) criterion that combines luma and chroma. Under the RDO criterion, all prediction modes are traversed and the one with the smallest rate-distortion cost, RDcost, is taken as the optimal prediction mode. The rate-distortion cost is:

RDcost = SSD + λ × Rate

Here SSD is the sum of squared differences between the current block and the reconstructed block; λ is a Lagrange multiplier that is a function of the quantization parameter QP; Rate is the number of bits after entropy coding.
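The cost formula can be written out as a small sketch. The notes only say that λ is a function of QP; the specific form used below, 0.85 × 2^((QP − 12) / 3), is the one commonly used for mode decision in the JM reference software and is an assumption here, as are the function names.

```python
def ssd(original, reconstructed):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for o, r in zip(row_o, row_r))


def mode_lambda(qp):
    """Assumed JM-style Lagrange multiplier for mode decision."""
    return 0.85 * 2 ** ((qp - 12) / 3.0)


def rd_cost(original, reconstructed, rate_bits, qp):
    """RDcost = SSD + lambda(QP) * Rate."""
    return ssd(original, reconstructed) + mode_lambda(qp) * rate_bits
```

Because λ grows with QP, a mode that spends more bits is penalized more heavily at coarse quantization, which is why the chosen mode can change with QP even for the same block.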

In H.264, the optimal intra-frame prediction mode is determined as follows:

(1) Determine the optimal prediction mode in Intra_16×16 mode:

1) predict the macroblock to be encoded with each of the four prediction modes;

2) apply the Hadamard transform to the prediction residual macroblock and calculate the encoding cost;

3) select the mode with the smallest encoding cost as the optimal prediction mode in Intra_16×16 mode.

(2) Calculate, with chroma coding included, the rate-distortion cost RDcostI16 of the macroblock to be encoded in Intra_16×16 mode.

(3) Determine the optimal prediction mode in Intra_4×4 mode.

Divide the macroblock to be encoded into sixteen 4×4 blocks and determine the optimal prediction mode for each 4×4 block:

1) predict the 4×4 block with each of the nine prediction modes;

2) for the prediction residual block, perform DCT/quantization and inverse quantization/inverse DCT, and calculate the coding cost RDcost;

3) select the mode with the smallest RDcost as the optimal prediction mode of the current 4×4 block.

(4) Calculate the rate-distortion cost RDcostI4 of the macroblock to be encoded in Intra_4×4 mode.

(5) Determine the optimal intra-frame prediction mode.

Compare RDcostI16 and RDcostI4 of the macroblock to be encoded and select the coding method with the smaller RDcost as the final luma coding method of the macroblock. The prediction mode corresponding to this method is the optimal intra-frame prediction mode.

As the above shows, each macroblock must be encoded in two ways, requiring a total of 3,328 predictions, 144 decoding/reconstruction operations, and 68 4×4 Hadamard transforms, so the complexity is quite high.
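One way to arrive at these three figures is sketched below. The breakdown is an interpretation rather than something stated in the notes: it assumes predictions are counted per pixel, that every 4×4 block is reconstructed once per mode, and that each Intra_16×16 mode needs sixteen residual Hadamard transforms plus one more on the DC coefficients.

```python
MB_SIZE = 16     # macroblock is 16x16 luma pixels
BLOCK_SIZE = 4   # Intra_4x4 block size
MODES_16X16 = 4
MODES_4X4 = 9

blocks_per_mb = (MB_SIZE // BLOCK_SIZE) ** 2  # 16 blocks of 4x4

# Pixel predictions: every pixel is predicted once per mode.
pixel_predictions = (MODES_16X16 * MB_SIZE * MB_SIZE
                     + MODES_4X4 * blocks_per_mb * BLOCK_SIZE * BLOCK_SIZE)

# Reconstructions: each 4x4 block is decoded once per 4x4 mode.
reconstructions = MODES_4X4 * blocks_per_mb

# 4x4 Hadamard transforms: 16 residual blocks + 1 DC block per 16x16 mode.
hadamard_4x4 = MODES_16X16 * (blocks_per_mb + 1)

print(pixel_predictions, reconstructions, hadamard_4x4)
```

Under these assumptions the totals are 1,024 + 2,304 = 3,328 predictions, 9 × 16 = 144 reconstructions, and 4 × 17 = 68 Hadamard transforms.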

Fast Algorithm Flow

The algorithm proceeds as follows:

(1) Determine the optimal prediction mode in Intra_16×16 mode. The steps are the same as in H.264.

(2) Calculate, with chroma coding included, the rate-distortion cost RDcostI16 of the macroblock to be encoded in Intra_16×16 mode.

(3) Calculate the SAD of the macroblock to be encoded under the Intra_16×16 prediction mode. If SAD < T1 (T1 = 500 when QP ≤ 20; T1 = 1000 otherwise), jump to (6).

(4) Determine the optimal prediction mode in Intra_4×4 mode.

Divide the macroblock to be encoded into sixteen 4×4 blocks and determine the optimal prediction mode for each 4×4 block:

1) predict the 4×4 block with each of the nine prediction modes;

2) apply the Hadamard transform to the prediction residual block of each mode and calculate the sum of absolute transformed differences, SATD;

3) calculate the average SATD value, SATDaverage;

4) select as candidates the prediction modes that satisfy SATD ≤ SATDaverage;

5) for the prediction residual block of each candidate mode, perform DCT/quantization and inverse quantization/inverse DCT, and calculate the coding cost RDcost;

6) compare the RDcost of all candidate modes and select the one with the smallest RDcost as the optimal prediction mode of the current 4×4 block.

(5) Calculate the rate-distortion cost RDcostI4 of the macroblock to be encoded in Intra_4×4 mode.

(6) Determine the optimal intra-frame prediction mode. The steps are the same as in H.264.
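The two shortcuts in this flow, the SAD early exit in step (3) and the SATD pruning in step (4), can be sketched as follows. This is a minimal illustration under stated assumptions: the SATD is left unnormalized (reference encoders typically scale it), the blocks are plain 4×4 lists of integers, and all function names are hypothetical.

```python
# 4x4 Hadamard matrix (symmetric, so H4 equals its own transpose).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(original, prediction):
    """Sum of absolute values of the Hadamard-transformed residual."""
    diff = [[original[i][j] - prediction[i][j] for j in range(4)]
            for i in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def select_candidates(satd_by_mode):
    """Keep only the modes whose SATD does not exceed the average SATD."""
    average = sum(satd_by_mode.values()) / len(satd_by_mode)
    return sorted(m for m, s in satd_by_mode.items() if s <= average)


def skip_intra4x4(sad_16x16, qp):
    """Step (3): skip the Intra_4x4 search when the 16x16 SAD is below T1."""
    t1 = 500 if qp <= 20 else 1000
    return sad_16x16 < t1
```

Only the modes returned by `select_candidates` go on to the expensive transform, quantization, and reconstruction of step 5), which is where the time saving comes from.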

 

 

<H.264 intra-Frame Prediction and inter-Frame Prediction> (Master's thesis)

Author: Wu Jing

Research status:

Research on inter-frame prediction currently focuses on three directions: fast motion estimation, fast reference frame selection, and fast mode selection. Motion estimation in particular, the most time-consuming and complex part of H.264 encoding, has attracted a great deal of work, and many fast algorithms have been proposed that save considerable encoding time at a small cost in coding efficiency. These algorithms fall roughly into two types. The first optimizes the motion estimation search itself, for example the hexagon-based search (HEXBS), the enhanced predictive zonal search (EPZS), the asymmetric cross multi-hexagon hybrid search (UMHexagonS), and ARPS-3. Among them, UMHexagonS achieves good SNR performance while saving up to 90% of the computation compared with the fast full search previously used in H.264, and it has been adopted by the JVT. The second terminates the motion estimation early, for example the variable block size best motion detection (VBBMD) algorithm proposed by Libo Yang et al.

For intra-frame prediction, the high computational complexity is addressed mainly in two ways: simplifying the cost function and narrowing the range of candidate modes. Fast algorithms proposed so far include the following. Pan Feng et al. use the direction in which the sub-block edges vary to select the most likely prediction directions from a pre-selected set: based on an edge direction histogram, the less likely prediction modes are excluded in advance to reduce complexity.

Meng Bojun et al. proposed the EIP algorithm, which uses a cost function and multiple thresholds to speed up the encoding of 4×4 sub-blocks. The main drawback of these algorithms, however, is that they are complicated and difficult to implement. It is therefore of theoretical significance and practical value to study an intra-frame prediction mode selection algorithm that is more effective and easier to implement.

Video Image quality measurement criteria

The difference between the reconstructed image and the original image is generally measured by the MSE and the SNR. These measures cannot distinguish a large gray-level difference in a few pixels from a small gray-level difference in many pixels, and treating every pixel in the image equally does not reflect the visual characteristics of the human eye. Since images are ultimately meant to be viewed by people, subjective quality evaluation should also be fully considered in image quality assessment.
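The limitation described above is easy to reproduce. The sketch below computes the MSE and, as the commonly used variant of SNR for 8-bit video, the peak signal-to-noise ratio (PSNR); using PSNR with a peak value of 255 is an assumption, since the notes only say "SNR".

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized 2-D images."""
    total = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def psnr(original, reconstructed, peak=255):
    """PSNR in dB; infinite when the images are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak * peak / error)


reference = [[100] * 4 for _ in range(4)]
# One pixel off by 4 ...
one_large_error = [row[:] for row in reference]
one_large_error[0][0] += 4
# ... versus all sixteen pixels off by 1.
many_small_errors = [[v + 1 for v in row] for row in reference]
```

Both distorted blocks have an MSE of exactly 1 and therefore the same PSNR, although a viewer may well perceive a single conspicuous pixel error differently from a uniform brightness shift.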

Layered Structure in H.264

How to make video coding more network-friendly and widen the range of applications of video coding standards has become a focus of attention.

For this reason, H.264 introduces a layered design in its system structure. The coding system is divided into a video coding layer (VCL) and a network abstraction layer (NAL), as shown in Figure 2.2. The VCL is responsible for efficient encoding and decoding of digital video and provides a video bitstream with high quality, high compression ratio, robustness, and scalability.

This part is also the core of the H.264 video coding standard. However, the bitstream is not in general suited to every transmission network and protocol, so the H.264 standard defines the NAL outside the VCL. The NAL is responsible for correctly and appropriately mapping the coded video data produced by the VCL onto different transmission networks.

When the video bitstream produced by the VCL is to be transmitted over a particular network, the NAL encapsulates it according to the characteristics of that network and its transmission protocol. In this way, H.264 can flexibly provide different encapsulations for different transmission networks, which improves its network adaptability. The NAL makes H.264 friendly to existing networks and also adaptable to future ones.
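As a concrete illustration of the NAL encapsulation, each NAL unit starts with a one-byte header. The notes do not describe this header; the layout below (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) and the listed type values are taken from the H.264 specification, and the helper name is hypothetical.

```python
# A few common nal_unit_type values from the H.264 specification.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
}


def parse_nal_header(byte):
    """Split the first byte of a NAL unit into its three fields."""
    nal_unit_type = byte & 0x1F
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,
        "nal_ref_idc": (byte >> 5) & 0x3,
        "nal_unit_type": nal_unit_type,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "other"),
    }
```

For example, `parse_nal_header(0x65)` reports an IDR slice with `nal_ref_idc` 3. A transport layer can use these fields to decide how to packetize or prioritize a unit without touching the VCL payload.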

Inter-Frame Prediction in H.264

To remove the temporal redundancy of video sequences more efficiently, H.264 inter-frame prediction adopts the following new techniques:

(1) Variable block size for prediction.

Compared with predicting with 16×16 blocks only, using blocks of different sizes and shapes can reduce the bit rate by more than 15%.

(2) Fine prediction accuracy.

Motion estimation uses the temporal correlation of video images to generate the corresponding motion vector (MV). Motion compensation uses the motion vector calculated by motion estimation to shift the macroblock in the reference frame to the corresponding horizontal and vertical position, producing a prediction of the image being compressed.

H.264 supports motion estimation with 1/4-pixel accuracy for the luma component and 1/8-pixel accuracy for the chroma component. Such fine prediction accuracy can reduce the bit rate by more than 20% compared with integer-pixel accuracy.

(3) Multiple reference frame prediction.

This technique improves the performance of motion estimation and the error recovery capability of the H.264 decoder. Compared with using only one reference frame, using five reference frames can reduce the bit rate by 5%-10%.

In the H.264 standard, motion estimation and motion compensation account for the largest share of the total computation time. The key to motion-compensated predictive coding is how to estimate the motion vector from the current frame and the reference frame.

The H.264 inter-frame mode selection algorithm proceeds as follows:

(1) Perform motion estimation and calculate the RD cost for the macroblock modes 16×16, 16×8, and 8×16;

(2) For one 8×8 block, perform motion estimation and calculate the RD cost in the 8×8, 8×4, 4×8, and 4×4 sub-modes, and select the sub-mode with the smallest RD cost as the best mode for that 8×8 block;

(3) Repeat step (2) for the remaining three 8×8 blocks to obtain the RD cost of the 8×8 mode;

(4) Calculate the predicted motion vector and the RD cost for Skip mode;

(5) Select the mode with the smallest RD cost among 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4, and Skip as the inter-frame coding mode of the macroblock.
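The five steps above amount to a minimum search over RD costs, which can be sketched as below. The costs are assumed to have been computed already by motion estimation; the function name, the dictionary layout, and the label "P8x8" for the combination of four independently chosen 8×8 sub-modes are all hypothetical.

```python
def best_inter_mode(cost_large, cost_sub8x8, cost_skip):
    """cost_large: RD costs for '16x16', '16x8', '8x16'.
    cost_sub8x8: list of four dicts, one per 8x8 block, with RD costs
                 for '8x8', '8x4', '4x8', '4x4'.
    cost_skip: RD cost of Skip mode."""
    # Steps (2)-(3): best sub-mode for each 8x8 block, then their total.
    sub_choice = [min(costs, key=costs.get) for costs in cost_sub8x8]
    p8x8_cost = sum(costs[mode]
                    for costs, mode in zip(cost_sub8x8, sub_choice))

    # Step (5): overall minimum over all macroblock-level candidates.
    candidates = dict(cost_large)
    candidates["P8x8"] = p8x8_cost
    candidates["SKIP"] = cost_skip
    best = min(candidates, key=candidates.get)
    return best, candidates[best], sub_choice
```

The sketch makes the cost of exhaustive selection visible: every entry in these dictionaries stands for a full motion search, which is what fast mode selection algorithms try to avoid.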

Hybrid Asymmetric Cross Multi-Hexagon Search Algorithm

The hybrid asymmetric cross multi-hexagon search algorithm (UMHexagonS) is among the best fast search algorithms at present and is used in JM, the official H.264 reference software. The UMHexagonS search consists of four steps: motion vector prediction, asymmetric cross search, uneven multi-hexagon search, and extended hexagon search.

Figure 3.10 shows the search process for a search window of 16, with the start point at (0, 0).

(1) Perform motion vector prediction according to the H.264 standard algorithm to determine the start position of the search.

(2) Asymmetric cross search. In natural video sequences, horizontal motion is generally larger than vertical motion, so the optimal motion vector can be located approximately by an asymmetric cross-shaped search. "Asymmetric" means that, for the cross centered on the search start point, the horizontal search range is twice the vertical range: the horizontal length of the cross is the width of the search window, the vertical length is half the height of the search window, and the step between search points is 2. The best match found becomes the center of the next search.

(3) Uneven multi-hexagon search. This step has two parts. First, a full search is performed over the 5×5 region centered on the current search center, as shown by the dots in Figures 3.10 and 3.11. Then a 16-point hexagon search pattern is used for multiple hexagon searches. The hexagon pattern covers a larger search area, and it has more search points on its left and right sides than on its top and bottom. This 16-point hexagon is applied layer by layer from the inside outward, and the best matching block position found becomes the center of the next search.

(4) Extended hexagon search. The accuracy of the motion vector obtained from the multi-hexagon search in the previous step depends on its distance from the search center. When the motion vector lies in an outer concentric hexagon far from the search center, its accuracy is low, so a center-biased search pattern is needed for refinement. A hexagon template search is generally used: search with a hexagon of radius 2 until the best point is the center of the hexagon, then continue with a small hexagon of radius 1 to find the best matching position. Compared with full search, UMHexagonS reduces the integer-pixel motion search computation by 90%, while the SNR drops by less than 0.1 dB and the bit rate remains unchanged. It is a good fast integer-pixel search algorithm.
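Two of the four steps, the asymmetric cross pattern of step (2) and the extended hexagon refinement of step (4), can be sketched as follows. This is an illustration under stated assumptions, not the JM implementation: `search_range` is taken as the maximum horizontal displacement, the radius-2 and radius-1 patterns use the commonly cited point layouts, `cost` is any hypothetical function that maps a candidate position to a matching error, and the 5×5 full search and 16-point multi-hexagon step are omitted.

```python
LARGE_HEXAGON = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_PATTERN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def asymmetric_cross_points(center, search_range):
    """Cross-shaped candidates: step 2, horizontal reach twice the vertical."""
    cx, cy = center
    points = [(cx, cy)]
    for d in range(2, search_range + 1, 2):
        points += [(cx - d, cy), (cx + d, cy)]
    for d in range(2, search_range // 2 + 1, 2):
        points += [(cx, cy - d), (cx, cy + d)]
    return points


def descend(center, pattern, cost):
    """Move to the best pattern point until the center itself is best."""
    best, best_cost = center, cost(center)
    while True:
        cx, cy = best
        candidate = min(((cx + dx, cy + dy) for dx, dy in pattern), key=cost)
        if cost(candidate) >= best_cost:
            return best
        best, best_cost = candidate, cost(candidate)


def extended_hexagon_search(center, cost):
    """Radius-2 hexagon until the center wins, then radius-1 refinement."""
    coarse = descend(center, LARGE_HEXAGON, cost)
    return descend(coarse, SMALL_PATTERN, cost)
```

With a search range of 16 the cross contains 25 candidate points, compared with 33 × 33 = 1,089 positions for a full search of the same window, which gives a feel for where the reported savings come from.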

Intra-frame encoding

Intra-frame encoding is mainly used in the following situations:

(1) The first frame (IDR) of the video sequence. Because no frame has been encoded before the first frame of the sequence to serve as a reference frame, intra-frame encoding must be used.

(2) I frames. Because I frames are encoded independently, without reference to the information of other frames, they must use intra-frame encoding.

(3) Some macroblocks in P and B frames. In existing video coding standards (except MPEG-4 Part 2), the input image is first divided spatially into macroblocks, and the macroblock is then the basic unit of encoding. Every macroblock in a P or B frame is encoded in two ways, inter-frame and intra-frame, and the cost of each is calculated using the rate-distortion optimization criterion. If the intra-frame cost of a macroblock is less than its inter-frame cost, the macroblock is intra coded. This happens mainly when the motion in the sequence is so intense that no reference block in the reference frame matches the macroblock to be encoded, which makes inter-frame coding expensive. Likewise, when the frame to be encoded is the first frame after a scene change, temporal correlation is lost, so even though it is a P or B frame it still has to be intra coded.

(4) Error recovery. Decoding a macroblock encoded in inter-frame mode requires the reference block information of the current macroblock. If a channel problem causes a transmission error so that the decoder cannot obtain the reference block information, or obtains erroneous information, the current macroblock cannot be decoded correctly in inter-frame mode. Areas affected by such an error can be recovered by intra-frame coding. At the encoder, the distortion of the macroblock to be encoded is estimated using the source algorithm or the buffer algorithm to decide whether the macroblock should be intra coded. This error recovery technique is called intra-frame update and is another important application of intra-frame coding.

Because the spatial correlation of video sequences is much weaker than the temporal correlation, and in order to ensure a high compression ratio and good reconstruction quality, H.264 provides a total of 17 intra-frame prediction modes, designed according to how pixel values vary in different directions (the local texture features of the image). These modes can accurately predict the texture of images with different local characteristics. Statistically, the correlation between adjacent pixels decreases exponentially with distance, so the effect of the block size varies with how much the scene fluctuates. In scenes with large changes and several different objects, relatively small blocks allow finer prediction of different textures and provide sufficient accuracy: 4×4 blocks can accurately predict texture in many different directions. For smooth background areas, 16×16 blocks work better; because background texture is generally smooth with little variation, the standard provides only four prediction modes for them. Since the human visual system is less sensitive to changes in chroma than in luma, chroma prediction uses only 8×8 blocks, with four prediction modes.

 

Intra-Frame Prediction Method

H.264 uses a spatial-domain intra-frame prediction algorithm. Compared with the DCT-domain intra prediction of the H.263+ advanced intra coding mode, the average SNR is increased by 4.37 dB [7].

In H.264 intra-frame coding, the luma signal of each macroblock must go through the nine 4×4 prediction modes and the four 16×16 prediction modes, after which mode selection yields one optimal prediction mode that gives the best coding performance and efficiency. H.264 provides two methods for mode selection:

(1) Rate-distortion optimization (RDO): for each prediction mode, prediction, integer transform, quantization, and variable-length coding are performed, followed by inverse quantization and inverse transform; the bit rate and reconstructed image quality of the modes are compared and the best one is selected.

(2) Cost calculation. The cost is calculated as follows:

(A) construct a 4×4 prediction block P according to a prediction mode;

(B) calculate the sum of absolute differences SAD4×4 between the original block and the prediction block P;

(C) compute Cost4×4 = SAD4×4 + 4Rλ(QP) (4-1)

λ(QP) is an exponential function of the quantization parameter QP. If the current mode is the most probable prediction mode, R is 0; in the other eight cases, R is 1. SAD is the sum of the absolute differences between the predicted values and the pixel values of the image.
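The cost in (4-1) can be sketched directly. The value of λ is passed in as a parameter because the notes only state that it is an exponential function of QP; the function names are hypothetical.

```python
def sad(original, prediction):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(o - p)
               for row_o, row_p in zip(original, prediction)
               for o, p in zip(row_o, row_p))


def intra4x4_cost(original, prediction, mode, most_probable_mode, lam):
    """Cost = SAD + 4 * R * lambda, with R = 0 only for the most probable mode."""
    r = 0 if mode == most_probable_mode else 1
    return sad(original, prediction) + 4 * r * lam
```

The 4·R·λ term acts as a bias toward the most probable mode: a mode that is not the most probable one has to lower the SAD by more than 4λ to be selected.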

To compare the cost values of the modes more accurately, H.264 can also apply a Hadamard transform to these differences, converting them to the frequency domain, and then take the sum of absolute values. The Hadamard transform is used instead of the DCT mainly because it is relatively simple and close to the DCT. After the cost values of the macroblock have been calculated, the mode with the smallest cost is selected as the best prediction mode. For luma mode selection, the optimal mode among the nine 4×4 prediction modes is first selected by the minimum-cost criterion; the optimal mode among the four 16×16 prediction modes is then selected in the same way; finally, the cost of the macroblock after 16×16 prediction (cost16) is compared with the total cost obtained by summing the costs of the sixteen 4×4 predictions, and the prediction with the smaller cost is selected for the macroblock. If cost values are equal, the prediction modes are prioritized: when the same cost value appears among the nine 4×4 modes or among the four 16×16 modes, the mode with the smaller number is selected as the best prediction mode.

In practice, an H.264 encoder usually uses RDO to find the best mode, calculating the rate-distortion cost RDcost of each mode. Figure 4.6 shows how the RDcost of a macroblock is calculated in a given mode. The rate-distortion optimization model effectively improves rate-distortion performance. However, calculating the rate-distortion cost requires both the number of bits for the mode and the distortion between the reconstructed and original images. Each mode must therefore be predicted, transformed, quantized, and entropy coded to obtain the number of bits; the quantized coefficients must then be inverse quantized, inverse transformed, and reconstructed to obtain the distortion between the reconstructed and original images. The computation is therefore very complex. Without the rate-distortion optimization model, only prediction and the SAD or SSD (sum of squared differences) of the prediction residual are needed. Using RDO therefore greatly increases the computational complexity.

Fast intra-frame prediction Mode Selection Algorithm

There are two ways to reduce the complexity of intra-frame prediction: simplifying the cost function and narrowing the range of candidate prediction modes. This article mainly discusses the second. Methods of this type use features of the current block and its surrounding pixels to exclude less likely prediction modes in advance, or to terminate the cost calculation of less likely modes early, thereby reducing the complexity of intra-frame prediction. Because H.264 intra-frame coding exploits intra-frame redundancy, where spatial correlation is strong, fast algorithms based on the spatial domain are by far the most numerous.
