The video tutorial "H.264/AVC Video Codec Technology Explained" is now online at CSDN College. It covers the background, the standard, and the implementation of H.264 in detail, and works through parsing and implementing the standard as a hands-on project. You are welcome to watch.
"What is learned on paper always feels shallow; to truly understand a thing, you must practice it yourself." Only by working through the standard document in code yourself can you gain a sufficiently deep understanding of the ideas and methods of a video compression coding standard.
Link: H.264/AVC Video Codec Technology Explained
The video for this section is free.
I. The H.264 video coding standard
The H.264 video coding standard is another major joint achievement of ITU-T and MPEG, and it has had a great impact on the industry since its publication. Strictly speaking, H.264 belongs to the MPEG-4 family: it is Part 10 of the MPEG-4 document series ISO/IEC 14496, and is therefore also called MPEG-4 AVC. Unlike the rest of MPEG-4, which focuses on flexibility and interactivity, H.264 emphasizes a higher compression ratio and more reliable transmission, and it is widely used in digital TV broadcasting, real-time video communication, network streaming media, and other fields.
II. Overview of H.264 video coding methods
In terms of its overall coding framework, H.264 still uses a structure similar to that of earlier standards: a block-based hybrid coding framework. Its main structure is shown in the following diagram:
During encoding, each frame of an H.264 video is encoded into one or more slices. Each slice contains multiple macroblocks (MB). The macroblock is the basic coding unit of the H.264 standard; it consists of one 16x16 block of luma samples and two 8x8 blocks of chroma samples, plus macroblock header information. When a macroblock is encoded, it is divided into sub-blocks of various sizes for prediction. Intra-frame prediction uses a block size of 16x16 or 4x4, while the blocks used for inter-frame prediction and motion compensation can take 7 different shapes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. Earlier standards could perform motion compensation only on a whole macroblock or on half a macroblock, so this finer partitioning provides higher prediction accuracy and coding efficiency. For transform coding, the transform block applied to the prediction residual is 4x4 or 8x8 (8x8 is supported only in the FRExt extension). Earlier standards supported only 8x8 transform blocks; the H.264 transform design also avoids the mismatch that often occurred between the forward and inverse transforms.
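The macroblock layout described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sample count assumes the 4:2:0 layout implied by one 16x16 luma block plus two 8x8 chroma blocks.

```python
import math

MB_SIZE = 16  # luma samples along each side of a macroblock

# The seven inter-prediction block shapes listed above, as (width, height).
INTER_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover a frame."""
    return math.ceil(width / MB_SIZE), math.ceil(height / MB_SIZE)


def samples_per_macroblock():
    """Luma plus chroma samples in one macroblock: 16x16 + 2 * 8x8."""
    return 16 * 16 + 2 * 8 * 8


def blocks_per_macroblock(part_width, part_height):
    """How many blocks of the given shape tile one 16x16 macroblock."""
    return (MB_SIZE // part_width) * (MB_SIZE // part_height)


cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)       # 120 68 8160
print(samples_per_macroblock())      # 384
print(blocks_per_macroblock(4, 4))   # 16
```

Note that 1080 is not a multiple of 16, so the last macroblock row extends past the visible picture; the sketch simply rounds up.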
The entropy coding methods used in the H.264 standard are mainly context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC); the standard specifies different coding methods for different types of syntax elements. Between them, the two methods offer a trade-off between coding efficiency and computational complexity.
As in earlier standards, H.264 slices come in different types; the most commonly used are I slices, P slices, and B slices. In addition, SI and SP slices are defined in the Extended profile to support bitstream switching.
I slice: intra-coded slice; contains only I macroblocks.
P slice: unidirectional inter-coded slice; may contain P macroblocks and I macroblocks.
B slice: bidirectional inter-coded slice; may contain B macroblocks and I macroblocks.
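As a rough illustration of the slice types above, the following Python sketch maps the `slice_type` syntax element of a slice header to a slice type name and checks which macroblock types an I, P, or B slice may contain. The helper names are hypothetical; the value mapping (0 to 4 for P, B, I, SP, SI, with 5 to 9 repeating the same types) follows the H.264 slice header semantics.

```python
# Macroblock types permitted in each slice type, as listed above.
ALLOWED_MB_TYPES = {
    "I": {"I"},
    "P": {"P", "I"},
    "B": {"B", "I"},
}

# slice_type values 0..4; values 5..9 name the same types again.
SLICE_TYPE_NAMES = ["P", "B", "I", "SP", "SI"]


def slice_type_name(slice_type):
    """Map the slice_type syntax element (0..9) to a slice type name."""
    if not 0 <= slice_type <= 9:
        raise ValueError("slice_type must be in the range 0..9")
    return SLICE_TYPE_NAMES[slice_type % 5]


def macroblock_allowed(slice_name, mb_type):
    """Check whether a macroblock type may appear in an I, P, or B slice."""
    return mb_type in ALLOWED_MB_TYPES[slice_name]


print(slice_type_name(2), slice_type_name(7))   # I I
print(macroblock_allowed("P", "I"))             # True
print(macroblock_allowed("I", "P"))             # False
```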
The coding tools used in video coding, such as predictive coding, transform and quantization, and entropy coding, operate mainly at the slice layer and below; this layer is usually called the Video Coding Layer (VCL). In contrast, the data and algorithms that operate above the slice are usually called the Network Abstraction Layer (NAL). The main purpose of defining the NAL is to make H.264 video better suited to network transmission and data storage.
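To make the VCL/NAL split concrete, here is a minimal Python sketch that decodes the one-byte NAL unit header (1 bit `forbidden_zero_bit`, 2 bits `nal_ref_idc`, 5 bits `nal_unit_type`). The function name and the short name table are illustrative assumptions, and only a few common unit types are listed; types 1 through 5 carry VCL data.

```python
# A few common nal_unit_type values; the table is deliberately incomplete.
NAL_UNIT_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
}


def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    nal_unit_type = first_byte & 0x1F
    return {
        "forbidden_zero_bit": first_byte >> 7,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": nal_unit_type,
        "is_vcl": 1 <= nal_unit_type <= 5,  # types 1..5 carry slice data
        "name": NAL_UNIT_TYPE_NAMES.get(nal_unit_type, "other"),
    }


for byte in (0x67, 0x68, 0x65, 0x41):
    header = parse_nal_header(byte)
    print(hex(byte), header["nal_unit_type"], header["is_vcl"], header["name"])
```

The bytes 0x67, 0x68, and 0x65 are typical first bytes of an SPS, a PPS, and an IDR slice, which is why they often appear near the start of an H.264 byte stream.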
To suit different application scenarios, H.264 also defines several profiles:
Baseline profile: used mainly for low-latency real-time communication such as video conferencing and video telephony; supports I slices and P slices, and entropy coding with CAVLC.
Main profile: used mainly for digital TV broadcasting, digital video storage, and similar applications; supports field (interlaced) coding, bidirectional prediction with B slices, and weighted prediction; entropy coding supports both CAVLC and CABAC.
Extended profile: used mainly for live and on-demand network video; supports all Baseline profile features plus SI and SP slices, data partitioning for better error resilience, B slices, weighted prediction, and field coding, but does not support CABAC.
III. Coding tools used in the H.264 standard
H.264 uses the following types of coding techniques:
Intra-frame prediction
H.264 uses intra-frame prediction based on pixel blocks, which falls into the following types:
16x16 luma blocks: 4 prediction modes.
4x4 luma blocks: 9 prediction modes.
Chroma blocks: 4 prediction modes, of the same kinds as those for 16x16 luma blocks.
The 4 prediction modes for 16x16 luma blocks and for chroma blocks (vertical, horizontal, DC, and plane) are shown below:
The 9 prediction modes for 4x4 luma blocks (vertical, horizontal, DC, and six diagonal directions) are shown below:
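As a minimal sketch of how block-based intra prediction works, the Python code below implements three of the nine 4x4 luma modes (vertical, horizontal, and DC) from the reconstructed neighbors above and to the left of the block. Two assumptions apply: all neighbors are taken to be available, and the mode is chosen by the sum of absolute differences (SAD), which is a common encoder heuristic and not something the standard prescribes. The function names are hypothetical.

```python
MODES = ("vertical", "horizontal", "dc")


def predict_4x4(mode, top, left):
    """Build a 4x4 prediction from 4 top and 4 left neighbor samples."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]      # copy the row above downward
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]  # copy the left column across
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3      # rounded mean of 8 neighbors
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_mode(block, top, left):
    """Pick the mode with the lowest SAD (an encoder-side assumption)."""
    costs = {m: sad(block, predict_4x4(m, top, left)) for m in MODES}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]   # vertical stripes
print(best_mode(block, top, left))             # ('vertical', 0)
```

Only the difference between the block and its prediction (the residual) goes on to the transform stage, so a good mode choice leaves little to encode.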
Inter-frame prediction
Inter-frame prediction in H.264 uses block-based motion estimation and compensation. Its main features are:
Multiple candidate reference frames;
B-frames usable as reference frames;
Arbitrary ordering of reference frames;
Multiple block shapes for motion compensation: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 pixels;
1/4-pixel (luma) sub-pixel interpolation;
Frame-based or field-based motion estimation for interlaced video.
The way a macroblock is divided into sub-macroblock partitions for inter-frame prediction is shown in the figure:
Sub-pixel interpolation is illustrated below. The red dots mark integer-pixel positions in the image, the green dots mark 1/2-pixel positions interpolated between two integer pixels, and the purple dots mark 1/4-pixel interpolated positions.
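The interpolation positions described above can be sketched in Python as follows. The half-pixel value uses the 6-tap filter (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, and a quarter-pixel value is the rounded average of its two nearest integer-pixel or half-pixel neighbors. This sketch covers only half-pixel positions that lie on a row or column of integer pixels and assumes 8-bit samples; the function names are hypothetical.

```python
def clip_8bit(value):
    """Clamp a value to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_pixel(e, f, g, h, i, j):
    """Half-pixel sample between g and h, from six integer pixels in a line."""
    return clip_8bit((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pixel(a, b):
    """Quarter-pixel sample: rounded average of two neighboring samples."""
    return (a + b + 1) >> 1


# A flat area stays flat after interpolation.
print(half_pixel(100, 100, 100, 100, 100, 100))   # 100

# A sharp edge between 0 and 255: the half-pixel sample lands mid-way.
edge_half = half_pixel(0, 0, 0, 255, 255, 255)
print(edge_half)                                   # 128
print(quarter_pixel(0, edge_half))                 # 64
```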
Interlaced video coding
For interlaced video, H.264 defines two algorithms designed to handle it:
PicAFF (Picture Adaptive Frame Field): frame/field adaptation at the picture level;
MBAFF (Macroblock Adaptive Frame Field): frame/field adaptation at the macroblock level.
Transform and quantization coding
For transform coding, H.264 innovatively adopts a DCT-like integer transform, which effectively reduces computational complexity. The basic version of H.264 uses a 4x4 transform matrix, and the FRExt extension also supports an 8x8 transform matrix.
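A minimal Python sketch of the 4x4 integer transform follows. It uses the commonly cited forward core matrix and computes Y = C X C^T with integer arithmetic only; the scaling factors that would make this an exact DCT approximation are normally folded into the quantization step and are left out here. The function names are hypothetical.

```python
# Commonly cited 4x4 forward core transform matrix of H.264.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(residual):
    """Compute Y = CF * X * CF^T using only integer additions and products."""
    return matmul(matmul(CF, residual), transpose(CF))


flat = [[10] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat)
print(coeffs[0])   # [160, 0, 0, 0]
```

A flat residual block puts all of its energy into the single DC coefficient, which is what makes the transformed block cheap to entropy code.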
For quantization, H.264 still uses scalar quantization.
Lossless entropy coding algorithms
H.264 specifies different entropy coding algorithms for different syntax elements, mainly:
UVLC (Universal Variable Length Coding): mainly Exponential-Golomb coding;
CAVLC (Context-Adaptive Variable Length Coding);
CABAC (Context-Adaptive Binary Arithmetic Coding).
Other techniques
In addition to the core algorithms above, H.264 also defines a variety of other techniques, including the in-loop deblocking filter, SI/SP frames, and rate control.
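Returning to the entropy coding algorithms listed earlier: UVLC is built on Exponential-Golomb codes, which are simple enough to sketch in Python. The code below encodes and decodes the unsigned form, ue(v), and encodes the signed form, se(v), using bit strings for readability; a real codec works on packed bits. The function names are hypothetical.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb: N zero bits, then value + 1 in N + 1 bits."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    code = bin(value + 1)[2:]               # binary of value + 1, no prefix
    return "0" * (len(code) - 1) + code


def ue_decode(bits, pos=0):
    """Decode one ue(v) code starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":         # count the leading zero bits
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


def se_encode(value):
    """Signed Exp-Golomb: map 1, -1, 2, -2, ... to 1, 2, 3, 4, ... first."""
    mapped = 2 * value - 1 if value > 0 else -2 * value
    return ue_encode(mapped)


print([ue_encode(v) for v in range(5)])   # ['1', '010', '011', '00100', '00101']
print(ue_decode("00101"))                 # (4, 5)
print(se_encode(-1), se_encode(1))        # 011 010
```

Small values get short codes, which suits syntax elements whose values are usually close to zero.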