HEVC Algorithm and Architecture: Inter-Frame Prediction in Predictive Coding

Source: Internet
Author: User
Tags: coding standards


Inter-frame prediction in predictive coding (inter-picture prediction)

Inter-frame prediction exploits the temporal correlation of video: pixels of previously encoded, temporally neighboring pictures are used to predict the pixels of the current picture, effectively removing temporal redundancy. Because video sequences usually exhibit strong temporal correlation, the prediction residual is close to 0; the residual signal is then transformed, quantized, scanned, and entropy coded as the input of the subsequent modules, achieving efficient compression of the video signal.

One, the principle of inter-frame predictive coding

Current mainstream video coding standards use block-based motion compensation in their inter-frame prediction sections. The basic principle is: for each pixel block of the current picture, find a best-matching block in a previously encoded picture; this process is called motion estimation, ME (motion estimation). The picture used for prediction is called the reference picture, the displacement between the current block and its reference block is called the motion vector, MV (motion vector), and the difference between the current block and the reference block is called the prediction residual. The residual signal is transformed, quantized, scanned, and entropy coded as the input of the subsequent modules, which effectively compresses the video signal. Using the motion vector MV, the matching block in the reference picture (a previous frame, or earlier or later frames) is displaced accordingly to form the prediction of the current block in the current frame; this process is called motion compensation, MC (motion compensation).

It is important to note that the motion vectors obtained by motion estimation are not only used for motion compensation but are also transmitted to the decoder. Using them, the decoder can produce exactly the same prediction picture as the encoder, enabling correct decoding of the image.
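The relationship between the MV, the prediction block, and the residual can be illustrated with a minimal sketch (the block coordinates, block size, and sample values below are invented for the example; real codecs operate on CTU/PU partitions):

```python
# Minimal sketch of motion compensation: copy the reference block displaced
# by the motion vector as the prediction, then form the residual.
# All coordinates and sample values below are invented for illustration.

def motion_compensate(ref, by, bx, mv, n):
    """Return the n x n prediction block for the block at (by, bx),
    displaced by the motion vector mv = (dy, dx)."""
    dy, dx = mv
    return [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]

def residual(cur_blk, pred_blk):
    """Prediction residual = current block minus prediction block."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(cur_blk, pred_blk)]

ref = [[8 * i + j for j in range(8)] for i in range(8)]  # toy reference frame
cur_blk = [[27, 28], [35, 36]]       # happens to equal the ref block at (3, 3)
pred = motion_compensate(ref, 2, 2, (1, 1), 2)
print(pred)                          # [[27, 28], [35, 36]]
print(residual(cur_blk, pred))       # [[0, 0], [0, 0]] -- a perfect match
```

When the match is perfect, the residual is all zeros and costs almost nothing to encode; this is exactly why a good motion search pays off.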




In fact, intra-frame and inter-frame prediction have much in common. The difference is that the reference pixels of intra-frame prediction come from already-encoded pixels of the current frame, while the reference pixels of inter-frame prediction come from previously encoded frames (earlier or later frames in display order). Just as the inter-frame encoder must transmit the motion vector MV so that the decoder can obtain exactly the same prediction block, the intra-frame encoder must transmit the actual intra prediction mode information so that the decoder can derive exactly the same intra prediction block. Motion vectors and intra prediction modes are therefore of comparable importance, and both are represented by specific syntax elements.

Two, the key technologies of inter-frame predictive coding

In the inter-frame predictive coding process, the most important operations are motion estimation, MV prediction, multi-reference-picture prediction, and weighted prediction, which are analyzed one by one below.

1, motion estimation

Motion estimation (ME) is the process of extracting the motion information of the current picture. In motion estimation, the common motion representations include: pixel-based motion representation, region-based motion representation, and block-based motion representation.

(1), pixel-based motion representation: assigns a motion vector to each pixel directly. This method is the most general, but it requires estimating a very large number of unknowns, and the solution usually does not reflect the actual motion of objects in the scene. In addition, an MV must be transmitted for every pixel, so the amount of data is prohibitive.

(2), region-based motion representation: divides the picture into multiple regions, each ideally covering exactly one complete moving object. This method assumes that all pixels in a region share the same motion, which suits scenes containing several moving objects. However, moving objects often have irregular shapes, so a large amount of information is needed to describe each region, and accurate segmentation requires heavy computation. Region-based representation is therefore seldom used in practice.

(3), block-based motion representation: the picture is divided into pixel blocks of various sizes; as long as the block size is appropriate, the motion within each block can be regarded as uniform, and the motion parameters of each block can be estimated independently. This method balances the accuracy and the complexity of motion estimation, striking a tradeoff between the two, so it is the core technique of the international video coding standards.

Block-based motion estimation raises three core problems that require special attention: the matching criterion, the search algorithm, and sub-pixel precision motion estimation.

1.1, motion estimation criteria

The goal of motion estimation is to find the best-matching block in the reference picture for the current block, so a criterion is needed to measure how well two blocks match. Common matching criteria include the minimum mean square error MSE (Mean Square Error), the minimum mean absolute difference MAD (Mean Absolute Difference), and the maximum matching-pixel count MPC (Matching-Pixel Count).

To simplify computation, the sum of absolute differences SAD (Sum of Absolute Differences) is generally used instead of MAD. In addition, the sum of absolute transformed differences SATD (Sum of Absolute Transformed Differences), computed in the transform domain, is also an excellent matching criterion.
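These criteria are straightforward to compute; here is an illustrative sketch of SAD, MAD, and a 2x2 Hadamard-based SATD (real encoders work on larger blocks with SIMD code, and the sample values below are arbitrary):

```python
# Illustrative block-matching criteria: SAD, MAD, and a 2x2 Hadamard SATD.

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def mad(a, b):
    """Mean absolute difference: SAD normalised by the number of pixels."""
    return sad(a, b) / (len(a) * len(a[0]))

def satd2x2(a, b):
    """SATD for 2x2 blocks: Hadamard-transform the difference, sum |coeffs|."""
    d = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    t00 = d[0][0] + d[0][1] + d[1][0] + d[1][1]
    t01 = d[0][0] - d[0][1] + d[1][0] - d[1][1]
    t10 = d[0][0] + d[0][1] - d[1][0] - d[1][1]
    t11 = d[0][0] - d[0][1] - d[1][0] + d[1][1]
    return abs(t00) + abs(t01) + abs(t10) + abs(t11)

cur = [[10, 12], [14, 16]]
ref = [[11, 12], [13, 18]]
print(sad(cur, ref))      # 1 + 0 + 1 + 2 = 4
print(mad(cur, ref))      # 4 / 4 = 1.0
print(satd2x2(cur, ref))  # 8
```

SATD costs one extra transform per candidate but correlates better with the bits actually spent on the residual, which is why encoders often use it for the final refinement stages.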

1.2, search algorithm

In some applications, video encoding and transmission have strict real-time requirements, while motion estimation usually has high computational complexity, so a high-performance, low-complexity motion search algorithm is particularly important.

Commonly used search algorithms include the full search algorithm, the two-dimensional logarithmic search algorithm, the three-step search algorithm, and so on. The full search algorithm computes the matching error of the two blocks at every possible position in the search window, so the MV corresponding to the minimum matching error is guaranteed to be the global optimum.

However, the full search algorithm is extremely complex and cannot satisfy real-time encoding. All other algorithms are collectively called fast search algorithms. Fast search algorithms have the advantage of speed, but their search process easily falls into a local optimum, failing to find the global optimum. To avoid this phenomenon, each step of the search must examine more candidate points; related algorithms include the UMHexagonS algorithm in JM and the TZSearch algorithm in HM.
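A minimal full search can be sketched as follows (the block size, window radius, and frame contents are arbitrary example values; fast algorithms such as TZSearch replace the exhaustive double loop with a coarse-to-fine pattern of probe points):

```python
# Exhaustive (full) search: test every displacement in a +/- r window and
# keep the one with the smallest SAD. Globally optimal, but slow.

def sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, y, x, n):
    return [row[x:x + n] for row in frame[y:y + n]]

def full_search(cur, ref, by, bx, n, r):
    """Return (dy, dx, cost) of the best match for the n x n block at (by, bx)."""
    cur_blk = get_block(cur, by, bx, n)
    best = (0, 0, float("inf"))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n:
                cost = sad(cur_blk, get_block(ref, y, x, n))
                if cost < best[2]:
                    best = (dy, dx, cost)
    return best

ref = [[8 * i + j for j in range(8)] for i in range(8)]              # toy frame
cur = [[8 * (i + 1) + (j + 1) for j in range(8)] for i in range(8)]  # shifted copy
print(full_search(cur, ref, 2, 2, 2, 2))  # (1, 1, 0): content moved by (1, 1)
```

The double loop visits (2r + 1)^2 candidates per block, which is exactly the cost that fast search algorithms attack.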

1.3, Sub-pixel precision motion estimation

Because the motion of objects in nature is continuous, the motion between two adjacent pictures does not necessarily occur in whole-pixel units; it may occur in half-pixel, quarter-pixel, or even 1/8-pixel units. If motion estimation is performed only at integer-pixel precision, matching becomes inaccurate, the amplitude of the motion-compensated residual grows, and coding efficiency suffers.

To solve this problem, the precision of motion estimation is raised to the sub-pixel level, which is achieved by interpolating the pixels of the reference picture. Quarter-pixel precision brings a clear coding-efficiency gain over half-pixel precision, but 1/8-pixel precision improves on quarter-pixel precision only at high bit rates, while its motion estimation is considerably more complex. As a result, existing standards such as H.264/AVC and HEVC use quarter-pixel precision for motion estimation.
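As a simplified illustration, half-pel positions along one row can be generated by a rounded 2-tap average (HEVC actually uses 7- and 8-tap interpolation filters for luma; the bilinear average below is a deliberate simplification for the sketch):

```python
# Half-pel interpolation of one row by a rounded 2-tap average. Real HEVC
# luma interpolation uses longer filters; this is only an illustration.

def half_pel_row(row):
    """Interleave rounded-average half-pel samples between integer samples."""
    out = []
    for a, b in zip(row, row[1:]):
        out.append(a)
        out.append((a + b + 1) // 2)   # half-pel sample with rounding
    out.append(row[-1])
    return out

print(half_pel_row([10, 20, 40]))  # [10, 15, 20, 30, 40]
```

Applying the same idea to columns, and then once more to the half-pel grid, yields the quarter-pel grid on which the motion search is refined.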

2, MV prediction technology

In most video content, a moving object may cover multiple motion-compensated blocks, so the motion vectors of spatially adjacent blocks are strongly correlated. If the MV of the current block is predicted from adjacent encoded blocks and only the difference is coded, the number of bits required to encode the MV is significantly reduced. At the same time, because object motion is continuous, the MV at the same position in a temporally adjacent picture is also correlated with the current MV; both spatial and temporal MV prediction are therefore used.

In HEVC, to make full use of the MVs of spatially and temporally adjacent blocks when predicting the MV of the current block, and thereby save MV coding bits, two new MV prediction techniques are introduced: merge and AMVP (Advanced Motion Vector Prediction).

Both merge and AMVP use the spatial-domain and temporal-domain MVs of neighboring PUs to build a candidate MV list for the current PU. They differ in two respects:

(1), in merge mode, the MV of the current PU is taken directly from a spatially or temporally neighboring PU, and no MVD (motion vector difference) is transmitted; AMVP, by contrast, is an MV prediction technique in which the encoder transmits the MVD between the actual MV and the predicted MV.

(2), the lengths of the candidate MV lists differ, and so do the ways the candidate MV lists are constructed.

2.1, Merge technology

Merge mode builds an MV candidate list for the current PU; the list holds 5 candidate MVs and their corresponding reference pictures. The encoder traverses these 5 candidates, computes the rate-distortion cost of each, and selects the one with the lowest cost as the optimal MV of merge mode. If the decoder builds the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV within the candidate list, which yields significant savings in the number of bits used to encode the motion information. The merge MV candidate list contains both spatial and temporal candidates, and for B slices it also includes combined list candidates.
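The selection step above can be sketched as a rate-distortion minimisation over the candidate list (the candidate MVs, the distortion function, and the cost model below are invented for illustration; the real HEVC candidate derivation rules are far more involved):

```python
# Hypothetical merge-mode decision: pick the list index whose candidate MV
# minimises D + lambda * R, where R models the cost of signalling the index.

def merge_decision(candidates, distortion, lam, index_bits):
    """Return (index, mv) of the rate-distortion-optimal merge candidate."""
    best_idx, best_cost = 0, float("inf")
    for idx, mv in enumerate(candidates):
        cost = distortion(mv) + lam * index_bits(idx)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx, candidates[best_idx]

cands = [(0, 0), (2, 2), (1, 1)]                    # toy 3-entry candidate list
best = merge_decision(cands,
                      lambda mv: abs(mv[0] - 1) + abs(mv[1] - 1),  # toy distortion
                      0.5,                          # lambda
                      lambda i: i)                  # later indices cost more bits
print(best)  # (2, (1, 1)): zero distortion outweighs the higher index cost
```

Because only the winning index is transmitted, the motion information of a merged PU costs just a few bits.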

2.2, AMVP (Advanced Motion Vector Prediction) technology

AMVP uses the correlation of motion vectors in the spatial and temporal domains to build a candidate predictor MV list for the current PU. The encoder selects the best predictor from this list and differentially encodes the actual MV against it; the decoder builds the same list and, using only the motion vector difference (MVD) and the index of the predictor in the list, computes the MV of the current PU.

Similar to merge mode, the AMVP candidate MV list also contains both spatial and temporal candidates; unlike merge, the AMVP list length is only 2.
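The signalling difference from merge mode can be sketched as follows (the predictor list contents and the actual MV are invented example values):

```python
# AMVP-style signalling sketch: the encoder picks the predictor that makes
# the MVD cheapest and transmits (index, MVD); the decoder adds them back.

def encode_mv(actual, predictors):
    """Choose the predictor minimising |MVD|; return what is transmitted."""
    costs = [abs(actual[0] - p[0]) + abs(actual[1] - p[1]) for p in predictors]
    idx = costs.index(min(costs))
    mvd = (actual[0] - predictors[idx][0], actual[1] - predictors[idx][1])
    return idx, mvd

def decode_mv(idx, mvd, predictors):
    """Reconstruct the MV from the predictor index and the MVD."""
    p = predictors[idx]
    return (p[0] + mvd[0], p[1] + mvd[1])

preds = [(4, 0), (-2, 3)]          # AMVP list length is 2 in HEVC
idx, mvd = encode_mv((5, 1), preds)
print(idx, mvd)                    # 0 (1, 1): predictor (4, 0) is closer
print(decode_mv(idx, mvd, preds))  # (5, 1): the decoder recovers the MV
```

A small MVD is cheap to entropy code, so the better the predictor list, the fewer bits the motion information costs.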

3, Multi-reference-picture and weighted prediction

For certain scenes, such as objects that change periodically, multiple reference pictures can greatly improve prediction accuracy. Early video coding standards supported only a single reference picture; H.263+ began to support multi-reference-picture prediction, and H.264/AVC supports up to 16 reference pictures. As the number of reference pictures increases, coding performance also increases, but with diminishing returns, so to balance coding efficiency against encoding time, 4 to 6 reference pictures are generally used.

In addition, the weighted prediction technique is used: the prediction pixel is obtained by multiplying pixels from one reference picture (for P slices) or two reference pictures (for B slices) by weighting coefficients. HEVC adopts the weighted prediction technique of H.264/AVC and refines it further.
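Weighted bi-prediction can be sketched as follows (the weights, offset, and shift are arbitrary example values; in the real codec they are signalled per reference picture):

```python
# Sketch of explicit weighted bi-prediction: combine two reference blocks
# with integer weights, a rounding term, a right shift, and an offset.

def weighted_bipred(ref0, ref1, w0, w1, offset, shift):
    """pred[i][j] = ((w0*ref0 + w1*ref1 + rounding) >> shift) + offset."""
    rnd = 1 << (shift - 1)
    return [
        [((w0 * a + w1 * b + rnd) >> shift) + offset for a, b in zip(r0, r1)]
        for r0, r1 in zip(ref0, ref1)
    ]

r0 = [[100, 104], [96, 100]]
r1 = [[110, 106], [104, 100]]
# Equal weights give a plain rounded average: w0 = w1 = 1, shift = 1
print(weighted_bipred(r0, r1, 1, 1, 0, 1))  # [[105, 105], [100, 100]]
```

Unequal weights are useful for fades and cross-dissolves, where the current picture resembles a brightness-scaled blend of its references rather than a plain average.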

