H.264 baseline and main profile

Source: Internet
Author: User

The H.264 codec framework does not differ significantly from earlier standards such as H.261, H.263 and MPEG-1/2/4. It is likewise based on a hybrid coding scheme: motion vectors describe the motion content of each frame in an image sequence; motion estimation and compensation are performed against previously decoded frames, or intra-frame prediction is used instead; and the resulting prediction residual is transformed, quantized, and entropy coded. The performance gain of the new standard therefore comes from improving each of these stages and from applying new algorithms within them.

The new standard puts considerable effort into improving the error resilience of image transmission and redefines how a picture is partitioned. During encoding, each frame is divided into one or more slices, and each slice can be decoded independently of the others. A slice consists of macroblocks, the basic unit of the picture. Each macroblock contains one 16x16 luma block and two 8x8 chroma blocks. To further improve robustness, the system as a whole is split into a video coding layer (VCL) and a network abstraction layer (NAL). The video coding layer describes the video content carried by the data to be transmitted. The network abstraction layer adapts that data to different applications, such as video conferencing, transmission over H.32x systems, or RTP/UDP/IP communication.
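The partitioning described above can be illustrated with a small sketch. It assumes 4:2:0 sampling (which is what one 16x16 luma block plus two 8x8 chroma blocks implies) and slices made of consecutive macroblocks in raster order; the function names and the CIF frame size in the usage note are illustrative choices, not part of the standard.

```python
MB_SIZE = 16      # a luma macroblock covers 16x16 samples
CHROMA_SIZE = 8   # each of the two chroma blocks covers 8x8 samples


def samples_per_macroblock():
    """One 16x16 luma block plus two 8x8 chroma blocks."""
    return MB_SIZE * MB_SIZE + 2 * CHROMA_SIZE * CHROMA_SIZE


def macroblocks_in_frame(width, height):
    """Number of macroblocks needed to cover a frame (partial blocks round up)."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high


def split_into_slices(num_mbs, mbs_per_slice):
    """Partition macroblock indices (raster order) into consecutive slices."""
    return [range(start, min(start + mbs_per_slice, num_mbs))
            for start in range(0, num_mbs, mbs_per_slice)]
```

For a 352x288 (CIF) frame, `macroblocks_in_frame(352, 288)` gives 396 macroblocks, and `split_into_slices(396, 22)` yields one slice per macroblock row.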

The H.264 standard defines three profiles: baseline, main, and extended (X). Each represents a set of algorithms and technical limits for a class of applications. The baseline profile gathers the low-complexity, low-latency features and mainly targets interactive applications; with harsh transmission environments in mind, it also includes error-resilience tools, and most of its content is included in the higher profiles. The main profile is intended for applications that need higher coding efficiency, such as video broadcast. The X profile is designed mainly for streaming media and includes all of the error-resilience tools together with the flexible bitstream access and switching technologies.
(1) Main technical features of the baseline profile. The baseline decoder operates only on I slices and P slices.

Inter-frame prediction. To predict and compensate the motion content of the image more precisely, the new standard allows a macroblock to be partitioned into 16x16, 16x8, 8x16 or 8x8 blocks, and each 8x8 block to be split further into 8x4, 4x8 or 4x4 sub-blocks. Motion estimation is accurate to quarter-pixel positions, which are interpolated using a 6-tap filter. Motion vectors are predicted from adjacent blocks, and only the difference is encoded and transmitted. H.264 also supports multi-reference-frame prediction, with up to 16 reference frames available for motion estimation. Using multiple reference frames greatly improves the error resilience of image transmission because it limits the propagation of errors in space and time.

Intra-frame prediction. For all slice types, H.264 supports two kinds of intra coding: 4x4 and 16x16. In 4x4 mode, each 4x4 luma block has eight directional prediction modes plus a DC prediction mode. In 16x16 mode, each 16x16 luma block has four intra prediction modes. The 8x8 chroma samples of a macroblock use prediction modes that are almost the same as those for 16x16 luma. To keep slices independently decodable, intra prediction cannot cross a slice boundary.

Transform and quantization. Unlike previous standards, which apply a DCT to the prediction residual, H.264 uses a simple integer transform. It offers almost the same compression performance as the DCT and has several advantages: its core uses only additions, subtractions and shifts, which avoids the loss of precision. Quantization of the transform coefficients uses a quantizer with 52 step sizes, whereas H.263 has only 31. The step size grows by about 12% from one quantization parameter value to the next, doubling every six values.
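As a rough illustration of the transform and quantizer described above, the sketch below applies the 4x4 forward core transform with integer arithmetic and computes the step size for a quantization parameter (QP). The matrix `CF` and the base step sizes are the values commonly tabulated in the H.264 literature; the function names are hypothetical, and the post-scaling that a real encoder folds into quantization is deliberately left out.

```python
# Forward 4x4 core transform matrix. All entries are 1 or 2 in magnitude,
# so every product reduces to an addition, a subtraction or a 1-bit shift.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Commonly tabulated step sizes for QP 0..5; they double every 6 QP values.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(block):
    """Y = CF * X * CF^T on a 4x4 residual block (integers in, integers out)."""
    return matmul(matmul(CF, block), transpose(CF))


def qstep(qp):
    """Quantizer step size for QP in 0..51 (52 values in total)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))
```

A flat residual block such as `[[1] * 4 for _ in range(4)]` transforms to a single non-zero coefficient in the top-left (DC) position, and `qstep(qp + 6)` is always twice `qstep(qp)`, which corresponds to an average growth of about 12% per QP step.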
The wider the range of quantization step sizes, the more flexibly and accurately the encoder can control bit rate and image quality.

Entropy coding. When context-adaptive variable-length coding (CAVLC) is used for the quantized transform coefficients, the variable-length code table for each coefficient is selected according to the values of previously coded coefficients. Because the tables are designed for the corresponding statistical conditions, this performs better than a single variable-length code table. Other data, such as header information, is coded with a single variable-length code table (Exp-Golomb code).

Deblocking filter. The new standard still uses block-based prediction and reconstruction. To remove the blocking artifacts that degrade subjective image quality, H.264 uses a deblocking filter. The main idea is that when the difference across a block boundary is small, the filter smooths it; when the boundary coincides with a pronounced image feature, the filter is not applied. This weakens blocking artifacts without filtering out genuine image features, and at the same subjective quality the bit rate is reduced by 5-10%.

Organization and transmission of image data. In H.264, the macroblocks of a picture can be organized flexibly into multiple slice groups (flexible macroblock ordering, FMO). Slices are mutually independent and can be sent to the decoder in any order (arbitrary slice order, ASO). A slice can also be transmitted more than once in the bitstream (redundant slices, RS), so that a copy is available for recovery when slice data is lost, which makes transmission more robust. The independence between slices also stops errors from propagating spatially, improving the error resilience of the bitstream.
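The Exp-Golomb code mentioned above for header information can be sketched in a few lines. This is a minimal illustration that represents bits as a string of '0' and '1' characters for readability; the function names are hypothetical, and a real codec would pack the bits into bytes.

```python
def ue(code_num):
    """Unsigned Exp-Golomb code word for a non-negative integer."""
    if code_num < 0:
        raise ValueError("unsigned Exp-Golomb needs a non-negative value")
    binary = bin(code_num + 1)[2:]          # binary form of code_num + 1
    return "0" * (len(binary) - 1) + binary  # prefix with len-1 zero bits


def se(value):
    """Signed Exp-Golomb: positive v maps to 2v - 1, non-positive v to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return ue(code_num)


def decode_ue(bits, pos=0):
    """Decode one unsigned code word starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":          # count the leading zero bits
        zeros += 1
    end = pos + 2 * zeros + 1                # word length is 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

Small values get short code words (`ue(0)` is `"1"`, `ue(1)` is `"010"`, `ue(3)` is `"00100"`), and because the number of leading zeros announces the word length, consecutive code words can be decoded from a stream without separators.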
(2) Technical features of the main profile. The main profile includes the coding algorithms of the baseline profile and adds further features, but it does not support FMO, ASO or RS, so a picture cannot be divided into multiple slice groups. Its decoder operates on I, P and B slices. This profile also introduces the concept of adaptive block-size transforms (ABT), used for inter coding. The main idea is to tie the block size used for transform coding of the prediction residual to the block size used for motion compensation, so that the transform operates on the longest possible signal. Because of complexity, however, the maximum transform block size is limited to 8x8. Entropy coding is also more efficient: context-adaptive binary arithmetic coding (CABAC) further improves entropy coding performance. Compared with CAVLC, coding TV signals with CABAC at the same image quality reduces the bit rate by 10-15%.
(3) Encoding issues. How to select the prediction mode and which motion estimation (ME) strategy to use have always been key research topics in video coding. In the H.263 reference software, mode selection is based simply on threshold comparisons. The test software of the new standard uses a Lagrangian rate-distortion optimization strategy, which weighs the distortion produced by each block size and prediction mode against the bit rate needed to transmit it. The selected mode thus achieves optimized rate-distortion performance, at the cost of higher computational complexity. The optimization minimizes the following Lagrangian function: J = SATD + λ · R, where R is the bit rate of the corresponding transmitted data, λ is the Lagrange multiplier, which is closely tied to the quantization parameter, and SATD is the sum of the absolute values of the Hadamard-transformed 4x4 blocks of the prediction difference. The choice of intra and inter macroblock coding modes and of reference frames is made by minimizing this function. Video standards generally specify only the decoding process; mode selection is an encoder-side technique and is therefore not part of the standard.
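The cost function J = SATD + λ · R can be illustrated with a minimal sketch for a single 4x4 block. The Hadamard matrix is standard, but everything else here is an assumption made for illustration: the function names, the candidate dictionary layout, the absence of any normalization of the SATD sum, and the λ values are not taken from the test software.

```python
# 4x4 Hadamard matrix; it is symmetric, so it equals its own transpose.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    """Plain 4x4 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(original, prediction):
    """Sum of absolute values of the Hadamard-transformed difference block."""
    diff = [[original[i][j] - prediction[i][j] for j in range(4)]
            for i in range(4)]
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)


def rd_cost(original, prediction, rate_bits, lam):
    """Lagrangian cost J = SATD + lambda * R for one candidate mode."""
    return satd_4x4(original, prediction) + lam * rate_bits


def choose_mode(original, candidates, lam):
    """Pick the mode with the lowest cost.

    candidates maps a mode name to (prediction block, estimated bits);
    this layout is a hypothetical convenience for the example.
    """
    return min(candidates,
               key=lambda m: rd_cost(original, candidates[m][0],
                                     candidates[m][1], lam))
```

With a large λ the cheaper-to-transmit mode tends to win even if its prediction is worse; with a small λ the more accurate prediction wins. This is the trade-off that ties λ to the quantization parameter.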
