H.264 Coding Technology

Source: Internet
Author: User
Tags: arithmetic coding, standards

The target applications of H.264 cover most current video services, such as cable TV, remote monitoring, interactive media, digital TV, video conferencing, video on demand, and streaming media. To accommodate the different transmission characteristics of these networks, the standard defines two layers: the video coding layer (VCL), responsible for efficient representation of the video content, and the network abstraction layer (NAL), responsible for packaging and transmitting the data in the manner required by the network, as shown in Figure 3.19.

Figure 3.19 The overall framework of the standard

Baseline profile: uses all features except B slices, CABAC, and interlaced coding modes. This profile is mainly intended for low-latency real-time applications.

Main profile: contains all baseline profile features and adds B slices, CABAC, and interlaced coding modes. It is mainly intended for applications that tolerate some delay but require high compression rates and high quality.

Extended profile (Profile X): supports all baseline profile features, but not CABAC or macroblock-based adaptive frame/field coding. This profile is mainly intended for video stream transmission over various networks.

CABAC

CABAC is context-based adaptive binary arithmetic coding. When the parameter entropy_coding_mode is set to 1, an arithmetic coding system is used to encode and decode the syntax elements of H.264.

H.264 offers two entropy coding methods: CAVLC and CABAC. Context-based adaptive binary arithmetic coding (CABAC) fully exploits context information together with the advantages of arithmetic coding, bringing the average code length closer to the information entropy of the image and achieving better coding efficiency. Compared with CAVLC, CABAC improves compression efficiency by approximately 10%.

Specific coding steps:

1 Binarization: CABAC uses binary arithmetic coding, so the data, including transform coefficients and motion vectors, is first converted to binary form. The binarized data is itself a variable-length representation, and this data is then arithmetically coded.

2 Context model selection: the context model is a probability model for the binarized data, selected from a set of available models according to the statistics of recently coded data symbols. The context model stores the probability of each bin being "1" or "0".

3 Arithmetic coding: the arithmetic coder encodes each bin according to the selected context model.

4 Probability update: the selected context model is updated according to the value actually coded; for example, if a bin in the data stream was "1", the count for "1" is incremented.
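The four steps above can be sketched with a toy model. This is an illustrative sketch only, not the H.264 CABAC engine (which uses table-driven probability states and a renormalizing integer coder); the counting context model and the fixed-length binarization are simplifying assumptions.

```python
import math

def binarize(value, bits=8):
    """Step 1: map a non-negative integer to a fixed-length bin string."""
    return [(value >> i) & 1 for i in reversed(range(bits))]

class Context:
    """Steps 2 and 4: an adaptive probability model (counts of 0s and 1s)."""
    def __init__(self):
        self.counts = [1, 1]              # Laplace-smoothed counts
    def p(self, bit):
        return self.counts[bit] / sum(self.counts)
    def update(self, bit):                # step 4: probability correction
        self.counts[bit] += 1

def coding_cost_bits(bins, ctx):
    """Step 3, idealized: the Shannon cost an arithmetic coder would pay."""
    cost = 0.0
    for b in bins:
        cost += -math.log2(ctx.p(b))
        ctx.update(b)
    return cost

# A skewed bin string costs fewer bits than its 8-bit fixed-length form:
print(round(coding_cost_bits(binarize(0b11111110), Context()), 2))  # 6.17
```

As the model adapts toward the observed statistics, frequent bins become cheap, which is the effect the probability-update step is there to produce.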

DCT Transform

After prediction, the residual signal still contains spatial redundancy, which is removed by transform coding followed by quantization and entropy coding. An integer transform similar to a 4x4 discrete cosine transform (DCT) is used instead of the 8x8 floating-point DCT of standards such as MPEG-4. Which transform is applied depends on the type of residual data: the luma DC coefficients of intra-coded macroblocks (valid only in the 16x16 prediction mode) use a 4x4 matrix, the chroma DC coefficients use a 2x2 matrix, and all other residuals are transformed as 4x4 blocks.

Using an integer spatial transform speeds up computation (only additions and shifts are required) and, because no floating-point precision is involved, the inverse transform introduces no mismatch error; furthermore, the multiplications of the scaling matrix are folded into quantization, reducing the total number of multiplications.
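A minimal sketch of this integer core transform follows. Cf is the standard H.264 4x4 forward transform kernel; the post-scaling that H.264 folds into quantization is omitted, so only the add-and-shift core is shown.

```python
# 4x4 integer core transform used in place of a floating-point DCT.
Cf = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def core_transform(block):
    """W = Cf . X . Cf^T — integer arithmetic only (multiplying by 2
    is a left shift, so adds and shifts suffice in practice)."""
    return matmul(matmul(Cf, block), transpose(Cf))

# A flat residual block concentrates into a single DC term:
flat = [[3] * 4 for _ in range(4)]
print(core_transform(flat)[0][0])  # 16 * 3 = 48
```

Because every kernel entry is a small integer, forward and inverse transforms can be made to match bit-exactly on any hardware, which is the drift-avoidance property described above.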

(1) DC coefficient transform of 4x4 luminance component

If the macroblock is coded in 16x16 intra mode, each 4x4 residual block is first transformed as described above; the DC coefficients of the sixteen 4x4 transforms are then collected into a 4x4 block and transformed a second time with the Hadamard transform.
The forward transform is:

    Y_D = (H · W_D · H) / 2

where H is the transform kernel matrix

        | 1  1  1  1 |
    H = | 1  1 -1 -1 |
        | 1 -1 -1  1 |
        | 1 -1  1 -1 |

(2) DC coefficient transformation for 2x2 chroma blocks

After the four 4x4 chroma blocks within each macroblock have been transformed, the DC coefficients of the four blocks form a 2x2 block W_D, which is then processed with a 2x2 Hadamard transform.

The forward transform is:

    W_QD = | 1  1 | · W_D · | 1  1 |
           | 1 -1 |         | 1 -1 |

The inverse transform is:

    W_D = | 1  1 | · W_QD · | 1  1 |
          | 1 -1 |          | 1 -1 |

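Both secondary DC transforms can be sketched with one helper, since the Hadamard kernels are symmetric. H4 and H2 are the standard kernels; the scaling factors (such as the 1/2 for luma DC) that H.264 folds into quantization are omitted here.

```python
# Secondary Hadamard transforms for DC coefficients:
# 4x4 for luma DC (Intra_16x16 mode), 2x2 for chroma DC.
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]
H2 = [[1, 1],
      [1, -1]]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hadamard(h, block):
    """Y = H . X . H (H is symmetric, so no transpose is needed)."""
    return matmul(matmul(h, block), h)

# 2x2 chroma DC example: four equal DC values concentrate into one term.
print(hadamard(H2, [[5, 5], [5, 5]]))  # [[20, 0], [0, 0]]
```

The point of the second-level transform is visible in the example: correlated DC values across neighbouring 4x4 blocks collapse into a single coefficient, improving compression in smooth regions.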
(3) Figure 3.20 shows the transform blocks within a macroblock and their transmission order. When intra 16x16 coding is used, the block labeled -1, transmitted first, holds the 4x4 Hadamard transform of the DC coefficients of the sixteen integer-DCT sub-blocks 0-15. Blocks 16 and 17 hold the 2x2 Hadamard transforms of the chroma DC coefficients. The remaining 24 blocks are ordinary 4x4 integer transforms.
Figure 3.20 Transform blocks in a macroblock and their transmission order

Multiple motion compensation block sizes

Seven motion-compensation block sizes are available: inter16x16, inter16x8, inter8x16, inter8x8, inter8x4, inter4x8, and inter4x4. According to the block size used for motion compensation, macroblock coding modes fall into four kinds: the first three modes compensate with one 16x16 block, two 16x8 blocks, or two 8x16 blocks, respectively. The last mode is P8x8, in which the macroblock is divided into four 8x8 sub-blocks, and each sub-block in turn has four possible sub-modes: motion compensation with one 8x8 block, two 8x4 blocks, two 4x8 blocks, or four 4x4 blocks. As shown in Figure 3.21, the first row shows the four macroblock modes and the second row the four sub-block modes.

Figure 3.21 Macroblock partitioning modes

The choice of block size has a great impact on compression quality: generally, larger blocks work better for slowly changing regions, while smaller blocks should be used for regions containing more detail.

1/4-pixel-accuracy motion estimation

Each block of an inter-coded macroblock is predicted from an equal-sized region in the reference frame; the offset between the two regions is the motion vector. Because image motion does not always align with whole pixels, sub-pixel motion vectors are introduced: for the luminance component, motion vectors have 1/4-pixel resolution. Since sub-pixel sample positions do not exist in the reference frame, they must be generated by interpolating the neighbouring pixels. The sub-pixel interpolation process is shown in Figure 3.22.

Figure 3.22 Sub-pixel sample positions
Half-pixel samples are generated by applying a one-dimensional 6-tap filter horizontally and vertically. Quarter-pixel values are then obtained by averaging an integer-pixel and a half-pixel sample.
For example:
b = round((E - 5F + 20G + 20H - 5I + J)/32)
a = round((G + b)/2)
e = round((b + H)/2)
Because motion vectors have 1/4-pixel precision in the luminance component, they correspond to 1/8-pixel precision in the chroma components, so linear interpolation is used to produce the 1/8-pixel chroma samples.
a = round(((8 - dx)(8 - dy)A + dx(8 - dy)B + (8 - dx)dy C + dx dy D) / 64)
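The interpolation formulas above can be sketched as follows. Sample names (E..J, G, b, A..D) follow the formulas in the text; rounding is written in the add-then-shift integer form commonly used for such filters, which is an implementation assumption rather than a quote from the text.

```python
def six_tap(e, f, g, h, i, j):
    """Half-pel luma sample: round((E - 5F + 20G + 20H - 5I + J)/32)."""
    return (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5

def quarter_pel(g, b):
    """Quarter-pel luma sample: round((G + b)/2)."""
    return (g + b + 1) >> 1

def chroma_eighth_pel(A, B, C, D, dx, dy):
    """Bilinear chroma sample at fractional offset (dx, dy), 0 <= dx, dy <= 7."""
    return ((8 - dx) * (8 - dy) * A + dx * (8 - dy) * B +
            (8 - dx) * dy * C + dx * dy * D + 32) >> 6

# On a flat area, every filtered sample reproduces the pixel value:
print(six_tap(9, 9, 9, 9, 9, 9), quarter_pel(9, 9), chroma_eighth_pel(9, 9, 9, 9, 3, 5))
```

Note that the 6-tap weights (1, -5, 20, 20, -5, 1) sum to 32 and the bilinear weights sum to 64, which is why the divisors 32 and 64 normalize the results.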

Image segmentation

H.264 supports image segmentation with a slice structure. A slice is composed of a number of macroblocks within one picture, and the encoder places no limit on how many: a slice may contain a single macroblock or every macroblock in the picture. However, any given macroblock may belong to only one slice; overlap is not allowed (except with the redundant-slice method).
The main motivation for the slice structure is to let the size of an encoded slice adapt to different MTU sizes; it also enables techniques such as interleaved packetization.

Multi-Reference Frame selection

Multi-reference-frame selection also appeared in some earlier video coding standards, where it was used especially in systems with a feedback channel; it is of little value in high-latency applications. Unlike the P- and B-frames of earlier standards, prediction here may draw on multiple forward and backward reference frames.

Data partitioning

In general, the syntax elements of a macroblock are coded into a single bit string per slice. Data partitioning instead creates several bit strings (partitions) per slice.
H.264 uses three different partition types.

The header information partition contains the macroblock types, quantization parameters, and motion vectors. This information is the most important, because without it the syntax elements of the other partitions are unusable. It is called the Class A partition.

The intra-coded information partition, called the Class B partition, contains intra macroblock types and intra coefficients. For a given slice, the usability of the Class B partition depends on the Class A partition. Unlike inter-coded information, intra-coded information can stop further drift between encoder and decoder, and is therefore more important than inter-coded information.

The inter-coded information partition, called the Class C partition, contains inter macroblock types and inter coefficients. It is usually the largest part of a slice, and the least important: the information it carries does not resynchronize encoder and decoder. The usability of the Class C partition likewise depends on the Class A partition, but is independent of the Class B partition.

When data partitioning is used, the source encoder distributes the syntax elements by type among three separate bit buffers. In addition, the slice size must be adjusted so that the largest partition does not exceed the MTU size; in this respect it is the source encoder, not the NAL, that performs the partitioning.
On the decoder side, all partitions must be available before correct decoding can begin. Nevertheless, if the intra- or inter-coded information is lost, the header information can still be used to improve error recovery: since the header partition carries the macroblock types and motion vectors, the picture can be concealed at fairly high quality, losing only some texture detail.

Parameter sets

A sequence parameter set includes all information related to a sequence of pictures, while a picture parameter set contains information that applies to all slices within one picture. The decoder can store multiple sequence and picture parameter sets; the encoder selects an appropriate picture parameter set, which itself references the sequence parameter set to use.

The creative application of parameter sets greatly improves error resilience. The key to using parameter sets in an error-prone environment is to ensure that they reach the decoder reliably and in time. One option is to transmit them once, out of band, over a reliable control protocol, ensuring they arrive before the first slice that references them comes in over the real-time channel. Another is to transmit them in band with application-layer protection (for example, sending multiple copies of a parameter set to raise the probability that at least one reaches the destination). A third option is to fix a number of parameter sets in advance at both encoder and decoder, with the encoder selecting among them.

Flexible macroblock ordering (FMO)

Flexible macroblock ordering (FMO) can be used in the baseline and extended profiles but is not allowed in the main profile. FMO allows macroblocks to be assigned to slices out of scan order; the specific assignment policy is defined by a macroblock allocation map (MBAmap). Within a slice, macroblocks are still coded in normal scan order.

This feature provides a way to distribute the macroblocks of a picture over several slices. Each slice is an independent coding unit, with no intra- or inter-frame prediction across its boundary, so if data is lost in transmission, the macroblock data that was received can be used to conceal the missing macroblocks.
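A dispersed (checkerboard) MBAmap is one common FMO pattern: neighbouring macroblocks go to different slice groups, so a lost slice can be concealed from received neighbours. The function name and frame dimensions below are illustrative, not taken from the standard.

```python
def checkerboard_mbamap(mb_cols, mb_rows, num_groups=2):
    """Assign each macroblock (in raster order) to a slice group,
    alternating groups in a checkerboard pattern."""
    return [(x + y) % num_groups
            for y in range(mb_rows) for x in range(mb_cols)]

# A 4x2 frame of macroblocks split across two slice groups:
print(checkerboard_mbamap(4, 2))  # [0, 1, 0, 1, 1, 0, 1, 0]
```

If the slice carrying group 1 is lost, every missing macroblock still has its four spatial neighbours in group 0, which is what makes concealment effective.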

Figure 3.23 Flexible macroblock ordering

Slice

A slice is a concept similar to the group of blocks (GOB) in H.263: a slice consists of a series of macroblocks arranged in raster-scan order. In general, each macroblock contains one 16x16 luminance array and, when the video format is not monochrome, two corresponding chroma arrays. If macroblock-adaptive frame/field decoding is not used, each macroblock corresponds to a rectangular spatial region of the picture. For example, Figure 3.24 shows a picture divided into two slices.

Figure 3.24 Slices in a picture

Each slice is an independent coding unit; neither intra- nor inter-frame prediction may cross a slice boundary. Redundant slices allow the encoder to embed, in the same data stream, one or more redundant representations of the macroblocks of a slice. The key difference from transport-layer redundancy techniques such as packet duplication is that a redundant representation can be coded with different parameters: for example, the primary representation may use a relatively low quantization parameter for good image quality, while the redundant representation uses a relatively high quantization parameter to keep its bit count small. When the decoder receives the primary representation correctly, the redundant one is discarded; if the primary representation is lost through packet loss or similar, the information in the redundant slice can be used for recovery. Redundant slices were originally introduced for error-prone wireless environments, but they are equally effective in IP-based environments.

Method of estimating motion by block matching

A motion compensator that fully cancelled all motion would produce very good prediction frames, leaving virtually no signal power in the difference picture. Describing the motion in full detail would require a relatively large amount of data, but only a small amount would then be needed for the difference frame. Admittedly, even state-of-the-art techniques cannot recognize and measure the motion of arbitrary objects in a generic frame source, so we must be content with simplified picture models such as the widely used block-matching technique. Even with such suboptimal motion compensation, the data rate required for the difference image is much smaller than without motion compensation. A further advantage is the model's simplicity, which keeps the number of bits needed to describe the motion small; this partly compensates for the fact that the signal power in the difference picture is not completely minimized.

Motion estimator using block-matching technique

For data compression, the block-matching motion estimator treats each new frame as if it were made up of rigid objects, all the same size as a rectangular block, that translate from the immediately preceding frame; objects are assumed to move only translationally in the 2-D plane. The frame to be transmitted is therefore divided into a series of rectangular pattern blocks, which are processed consecutively. The motion estimator assumes that a pattern block can move by at most some maximum amount in the x and y directions; under this model, each pattern block has a search area in the previous frame within which it may be found. Moving in equal-length steps, the pattern block is placed at successive positions inside the search area, and at each position it is compared with the old picture.
A positional shift is also called a displacement. The displacement that achieves the best similarity, or best match, is taken as the result of the motion search. The corresponding block of the motion-compensated frame is then filled with the contents of the previous-frame block that produced the best match with the searched pattern block. In this way, the motion-compensated frame approximates the current frame as closely as possible.
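The search just described can be sketched as an exhaustive (full-search) matcher using the sum of absolute differences (SAD) as the similarity measure; SAD and all the names below are illustrative choices, not prescribed by the text.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n pattern block at
    (bx, by) in the current frame and the block displaced by (dx, dy)
    in the reference (previous) frame."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def block_match(cur, ref, bx, by, n, search_range):
    """Full search over a +/- search_range window; returns the best (dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # consider only displacements that stay inside the reference frame
            if 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h:
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best[1]:
                    best = ((dx, dy), cost)
    return best[0]

# A 2x2 bright patch shifted right by one pixel between frames:
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
ref[2][2] = ref[2][3] = ref[3][2] = ref[3][3] = 9
cur[2][3] = cur[2][4] = cur[3][3] = cur[3][4] = 9
print(block_match(cur, ref, 3, 2, 2, 2))  # (-1, 0): the match lies one pixel left
```

Full search is the conceptually simplest estimator; practical encoders replace it with fast search patterns, but the displacement-with-best-match principle is the same.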

The x and y components of the displacement are transmitted to the receiver over a side channel so that the receiver can build the motion-compensated frame from the old frame. That this operation is performed on the contents of the previous, already known, picture is an essential advantage of this coding technique.

The data rate for the vectors depends on the size of the search area, that is, on the maximum displacement and on the desired vector precision. Object outlines need not be transmitted, because all "objects" are rectangles of the same size.

VLC coding for P pictures

VLC (variable-length coding) is a statistical coding technique. Its basic idea is to assign codewords with few bits to frequently occurring values and codewords with more bits to infrequent values, so that overall the amount of data is smaller than with uniformly allocated bits. Huffman coding is the classic construction of such variable-length codes.
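As a concrete instance of this idea, H.264 codes many syntax elements with unsigned exponential-Golomb codes, where small (frequent) values get short codewords; a minimal encoder:

```python
def exp_golomb(code_num):
    """Unsigned Exp-Golomb codeword for code_num >= 0: a run of zeros
    whose length equals the bit length of (code_num + 1) minus one,
    followed by the binary form of (code_num + 1)."""
    value = code_num + 1
    prefix_len = value.bit_length() - 1
    return "0" * prefix_len + format(value, "b")

# Codeword length grows with the value, as the VLC principle requires:
for v in range(5):
    print(v, exp_golomb(v))
```

The code is prefix-free and needs no stored tables, which is one reason it suits header-level syntax elements; the output begins 0 → "1", 1 → "010", 2 → "011", 3 → "00100".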

A P picture is coded with motion-compensated prediction from a past intra picture or a past predicted picture, with the macroblock as the basic coding unit. Prediction coding is built on motion estimation, which directly affects the coding efficiency and compression performance of the whole system, so a motion estimation algorithm with high prediction accuracy and low computational cost is desirable.

As with I pictures, each P picture is divided into one or more slices, and each slice into macroblocks. Coding a P picture is much more complicated than coding an I picture, because motion-compensated macroblocks must be constructed. The difference between the motion-compensated macroblock and the current macroblock is transformed by a two-dimensional DCT into an 8x8 coefficient matrix, which is quantized into a set of quantization coefficients; finally, the quantized coefficients are coded with run-length techniques. Tables 3.11 and 3.12 show the macroblock types and VLC codes supported in P and B pictures, respectively.

Table 3.11 Macroblock types and VLC codes in P pictures
