"Reprint" H264 coding principle and I-frame, B-frame, p-frame

Last Update:2015-12-09 Source: Internet

Author: User

Tags coding standards


Preface
H264 is a new-generation coding standard, known for high compression, high quality, and support for streaming over a variety of networks. As I understand it, the theoretical basis of the encoding is this: statistics gathered over a period of time show that, across several adjacent images, generally only about 10% of the pixels differ, the brightness difference does not exceed 2%, and the chroma difference is only 1% or less. So for a picture that changes little, we can first encode a complete image frame A. The following frame B then does not encode the entire image, only its difference from frame A, so that frame B is only 1/10 the size of the full frame or smaller. If frame C, which follows frame B, also changes little, we can encode C with reference to B, and so on. Such a run of images is called a sequence (a sequence is a piece of data with the same characteristics). When an image differs so much from the previous one that it cannot be generated with reference to the previous frame, we end the current sequence and start the next one: a complete frame A1 is generated for this image, and the images that follow are generated with reference to A1, writing only their difference from A1.
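The decision described above (encode a full frame, then differences, and start a new sequence when the picture changes too much) can be sketched as follows. This is only an illustration: frames are flat lists of pixel values, and the function names and the 50% threshold are assumptions made for the example, not anything H264 specifies.

```python
def changed_fraction(prev, curr, tol=0):
    """Fraction of pixels whose value differs by more than tol."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    if not prev:
        return 0.0
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tol)
    return changed / len(prev)


def choose_frame_kinds(frames, threshold=0.5):
    """Label each frame 'full' (starts a new sequence) or 'diff'.

    threshold is a hypothetical cut-off: above it, the frame is considered
    too different from its predecessor to be encoded as a difference.
    """
    kinds = []
    for i, frame in enumerate(frames):
        if i == 0 or changed_fraction(frames[i - 1], frame) > threshold:
            kinds.append("full")
        else:
            kinds.append("diff")
    return kinds


frames = [
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 12],  # 1 of 10 pixels changed
    [10, 10, 10, 10, 10, 10, 10, 10, 11, 12],  # 1 of 10 pixels changed
    [90, 80, 70, 60, 50, 40, 30, 20, 11, 12],  # 8 of 10 changed: scene cut
]
print(choose_frame_kinds(frames))  # ['full', 'diff', 'diff', 'full']
```

A real encoder makes this decision with far more elaborate cost estimates; the sketch only shows where the sequence boundary comes from.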
The H264 protocol defines three kinds of frames: a fully encoded frame is called an I-frame; a frame that is generated with reference to a previous I-frame (or P-frame) and encodes only the difference is called a P-frame; and a frame that is encoded with reference to frames both before and after it is called a B-frame.
The core algorithms adopted by H264 are intra-frame compression and inter-frame compression: intra-frame compression is the algorithm that generates I-frames, and inter-frame compression is the algorithm that generates B-frames and P-frames.

Description of the sequence
In H264, images are organized into sequences. A sequence is the encoded data stream of a run of images; it begins with an I-frame and runs until the next I-frame.
The first image of a sequence is called an IDR image (instantaneous decoding refresh), and an IDR image is always an I-frame image. The IDR image is introduced so that decoding can resynchronize: when the decoder reaches an IDR image, it immediately empties the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. Thus, if a significant error occurred in the previous sequence, there is a chance to resynchronize here. Images after an IDR image are never decoded using data from images before the IDR image.
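The resynchronization behaviour can be illustrated with a toy model of the reference frame queue. The class name, the queue size of 4, and the frame labels are assumptions for this sketch; a real decoder's buffer management is considerably more involved.

```python
from collections import deque


class ToyDecoder:
    """Toy model: an IDR image clears the reference queue, a P image needs one."""

    def __init__(self, max_refs=4):
        self.refs = deque(maxlen=max_refs)  # hypothetical reference frame queue
        self.log = []

    def decode(self, frame_id, kind):
        if kind == "IDR":
            # Nothing decoded before the IDR image may be referenced afterwards.
            self.refs.clear()
        elif not self.refs:
            # No usable reference, e.g. the stream was joined mid-sequence.
            self.log.append((frame_id, "skipped"))
            return False
        self.refs.append(frame_id)
        self.log.append((frame_id, "decoded"))
        return True


# A receiver that joins in the middle of a sequence cannot decode until
# the next IDR image arrives.
stream = [("p3", "P"), ("p4", "P"), ("idr5", "IDR"), ("p6", "P")]
decoder = ToyDecoder()
results = [decoder.decode(frame_id, kind) for frame_id, kind in stream]
print(results)             # [False, False, True, True]
print(list(decoder.refs))  # ['idr5', 'p6']
```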
A sequence is the data stream generated by encoding a run of images whose content does not vary much. When there is little motion, a sequence can be very long, because little motion means the picture content changes very little, so one I-frame can be followed by P-frames and B-frames throughout. When there is a lot of motion, a sequence may be short, for example one I-frame followed by 3 or 4 P-frames.
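As a small illustration, a stream of frame types can be cut into sequences at every I-frame. The sketch assumes, for simplicity, that every I-frame is an IDR image and that the stream is given as a string of the letters I, P, and B; both are assumptions of the example.

```python
def split_sequences(frame_types):
    """Split a string of frame types into sequences, each starting at an I-frame."""
    sequences = []
    for frame_type in frame_types:
        if frame_type == "I" or not sequences:
            sequences.append([])
        sequences[-1].append(frame_type)
    return ["".join(sequence) for sequence in sequences]


# A long low-motion sequence followed by two short high-motion ones.
print(split_sequences("IPPPPPPPPPIPPPIPPPP"))
# ['IPPPPPPPPP', 'IPPP', 'IPPPP']
```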

Description of three types of frames

1, I-frame
I-frame: an intra-coded frame. An I-frame is a keyframe; you can think of it as retaining the complete picture of that frame, so decoding needs only this frame's own data (because it contains the full picture).
I-frame features:
1) It is a full-frame compression encoded frame. It encodes and transmits the full frame image information using intra-frame compression similar to JPEG;
2) The full image can be reconstructed from the data of the I-frame alone;
3) An I-frame describes the image background and the details of the moving subject;
4) An I-frame is generated without reference to other pictures;
5) An I-frame is the reference frame for P-frames and B-frames (its quality directly affects the quality of the subsequent frames in the same group);
6) An I-frame is the base frame (first frame) of the frame group (GOP), with only one I-frame in a group;
7) I-frames do not need to consider motion vectors;
8) The amount of data an I-frame occupies is relatively large.

2, P-frame

P-frame: a forward-predicted encoded frame. A P-frame represents the difference between this frame and a previous keyframe (or P-frame). To decode it, the differences defined by this frame are overlaid on the previously cached picture to produce the final picture. (In other words, it is a difference frame: a P-frame carries no complete picture data, only the data that differs from the previous frame's picture.)
P-frame prediction and reconstruction: a P-frame uses an I-frame as its reference frame. The encoder finds, in the I-frame, the predicted value and motion vector for each point of the P-frame, and transmits the prediction difference together with the motion vector. At the receiving end, the decoder uses the motion vector to find the predicted value of that point in the I-frame and adds the difference to it, obtaining the sample value of that point; doing this for every point yields the complete P-frame.
P-frame features:
1) A P-frame is an encoded frame that follows the I-frame, separated from it by one or more frames;
2) A P-frame uses motion compensation to transmit its difference from the previous I-frame or P-frame (the prediction error) together with the motion vectors;
3) Decoding must sum the predicted value from the I-frame and the prediction error to reconstruct the complete P-frame image;
4) P-frames are forward-predicted inter-frame coding. A P-frame refers only to the closest preceding I-frame or P-frame;
5) A P-frame can be the reference frame of the P-frame after it, and also of the B-frames before and after it;
6) Because a P-frame is a reference frame, it can propagate decoding errors;
7) Because only the difference is transmitted, the compression ratio of a P-frame is relatively high.
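The prediction and reconstruction steps above can be sketched on a one-dimensional "frame" (a single row of samples). This is a minimal illustration, assuming fixed blocks of 4 samples, an exhaustive search over integer motion vectors of at most 2 samples, and no quantization of the difference; all names and parameters are made up for the example.

```python
def predict_block(ref, start, size, mv):
    """Fetch `size` samples starting at start + mv in the reference row,
    clamping indices at the edges."""
    last = len(ref) - 1
    return [ref[min(max(start + mv + i, 0), last)] for i in range(size)]


def encode_p_frame(ref, cur, size=4, search=2):
    """For each block, pick the motion vector with the smallest sum of
    absolute differences and keep (motion vector, prediction difference)."""
    blocks = []
    for start in range(0, len(cur), size):
        target = cur[start:start + size]
        best = None
        for mv in range(-search, search + 1):
            pred = predict_block(ref, start, len(target), mv)
            cost = sum(abs(t - p) for t, p in zip(target, pred))
            if best is None or cost < best[0]:
                best = (cost, mv, [t - p for t, p in zip(target, pred)])
        blocks.append((best[1], best[2]))
    return blocks


def decode_p_frame(ref, blocks, size=4):
    """Reconstruct the row: predicted value from the reference + difference."""
    out = []
    for index, (mv, residual) in enumerate(blocks):
        pred = predict_block(ref, index * size, len(residual), mv)
        out.extend(p + r for p, r in zip(pred, residual))
    return out


ref = [0, 0, 5, 9, 5, 0, 0, 0]
cur = [0, 0, 0, 5, 9, 5, 0, 0]  # same content shifted right by one sample
blocks = encode_p_frame(ref, cur)
print(blocks)  # [(-1, [0, 0, 0, 0]), (-1, [0, 0, 0, 0])]
print(decode_p_frame(ref, blocks) == cur)  # True
```

Because the motion vector accounts for the shift, the differences are all zero, which is why a P-frame can be so much smaller than a full frame.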

3, B-frame

B-frame: a bidirectionally predicted, interpolated encoded frame. A B-frame is a two-way difference frame: it records the difference between this frame and the frames before and after it (the details are more complex, there are 4 cases, but I will keep it simple). In other words, to decode a B-frame the decoder needs not only the previously cached picture but also the decoded picture that follows, and it obtains the final picture by overlaying this frame's data on the preceding and following pictures. B-frames have a high compression ratio, but decoding them costs more CPU.
B-frame prediction and reconstruction
A B-frame uses the preceding I-frame or P-frame and the following P-frame as its reference frames. The encoder finds the predicted value of each point of the B-frame and two motion vectors, and transmits the prediction difference and the motion vectors. The receiving end uses the motion vectors to find the predicted values in the two reference frames and adds the difference, obtaining the sample value of that point and thus the complete B-frame.
B-frame features
1) A B-frame is predicted from the preceding I-frame or P-frame and the following P-frame;
2) A B-frame transmits the prediction error and motion vectors between it and the preceding I-frame or P-frame and the following P-frame;
3) A B-frame is a bidirectionally predicted encoded frame;
4) B-frames have the highest compression ratio, because they reflect only the change of the moving subject between the reference frames, so the prediction is more accurate;
5) A B-frame is not a reference frame and does not propagate decoding errors (this holds for the classic scheme described here; H264 itself also permits B-frames to serve as references).
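A minimal sketch of bidirectional prediction, assuming zero motion vectors and a plain rounded average of the two reference frames (real encoders search for motion vectors and offer several prediction modes). Frames are short lists of sample values and the function names are illustrative.

```python
def bi_predict(prev_ref, next_ref):
    """Predicted value = rounded average of the two reference samples."""
    return [(a + b + 1) // 2 for a, b in zip(prev_ref, next_ref)]


def encode_b_frame(prev_ref, next_ref, cur):
    """Keep only the difference between the frame and its bidirectional prediction."""
    return [c - p for c, p in zip(cur, bi_predict(prev_ref, next_ref))]


def decode_b_frame(prev_ref, next_ref, residual):
    """Reconstruct: prediction from both references plus the difference."""
    return [p + r for p, r in zip(bi_predict(prev_ref, next_ref), residual)]


prev_ref = [10, 20, 30, 40]  # decoded frame before the B-frame
next_ref = [20, 30, 40, 50]  # decoded frame after the B-frame
cur = [15, 25, 36, 45]       # the B-frame itself

residual = encode_b_frame(prev_ref, next_ref, cur)
print(residual)                                  # [0, 0, 1, 0]
print([c - p for c, p in zip(cur, prev_ref)])    # forward-only: [5, 5, 6, 5]
print(decode_b_frame(prev_ref, next_ref, residual) == cur)  # True
```

The bidirectional difference is much closer to zero than the forward-only difference, which is the reason B-frames compress best.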

Note: I, B, and P frames are defined by people according to the needs of the compression algorithm, but they are all real physical frames. In general, the compression ratio of an I-frame is about 7 (similar to JPG), of a P-frame about 20, and of a B-frame up to 50. Using B-frames therefore saves a lot of space, and the space saved can be spent on more I-frames, which gives better picture quality at the same bitrate.
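A quick back-of-the-envelope check of this claim, using the rough ratios quoted above (7, 20, 50) and two 12-frame groups chosen only as examples:

```python
from fractions import Fraction

# Rough per-frame compression ratios quoted in the note above.
RATIO = {"I": 7, "P": 20, "B": 50}


def gop_ratio(pattern):
    """Overall compression ratio of a group, given its frame types.

    Every raw frame counts as size 1, so a frame of type t compresses
    to 1 / RATIO[t].
    """
    compressed = sum(Fraction(1, RATIO[t]) for t in pattern)
    return float(len(pattern) / compressed)


print(round(gop_ratio("IPPPPPPPPPPP"), 1))  # 17.3
print(round(gop_ratio("IBBPBBPBBPBB"), 1))  # 26.5
```

With these assumed numbers, replacing most P-frames with B-frames raises the overall ratio from about 17 to about 26, which is the space that can be reinvested in more I-frames.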

Description of the compression algorithm
H264 compression method:
1. Grouping: several frames of images are put into one group (a GOP, that is, a sequence); to limit the effect of motion changes, a group should not contain too many frames.
2. Define frames: each frame image in the group is assigned one of three types: I-frame, B-frame, or P-frame;
3. Predict frames: with the I-frame as the base frame, the P-frames are predicted from the I-frame, and then the B-frames are predicted from the I-frame and P-frames;
4. Data transfer: finally, the I-frame data and the prediction difference information are stored and transmitted.
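Step 3 implies that frames are not coded in the order they are displayed: a B-frame can only be predicted once the P-frame after it exists. A small sketch of that reordering, assuming a group given as a string of frame types in display order (the function name and labels are illustrative):

```python
def coding_order(display_types):
    """Reorder frames so every B-frame comes after both of its references.

    B-frames are held back until the next I-frame or P-frame has been emitted.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            pending_b.append(label)
        else:
            order.append(label)
            order.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no later reference in this toy model.
    order.extend(pending_b)
    return order


print(coding_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```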
Intra-frame compression is also known as spatial compression. When compressing a frame image, only the data in this frame is considered, not the redundancy between adjacent frames, which makes it similar to still-image compression. Intra-frame compression generally uses a lossy algorithm. Because it encodes a complete image, the frame can be decoded and displayed independently. Intra-frame compression generally does not reach very high compression ratios; it is similar to encoding JPEG.
The principle of inter-frame compression is that the data in neighboring frames is highly correlated, or in other words that the information changes very little between two consecutive frames. That is, continuous video has redundant information between neighboring frames, and compressing that redundancy can further raise the compression ratio and reduce the amount of data. Inter-frame compression, also known as temporal compression, compresses by comparing the data of different frames along the timeline. The differencing step of inter-frame compression is itself lossless (in H264 the difference is then quantized, and that step is lossy). The frame differencing algorithm is a typical temporal compression method: it compares the frame with its adjacent frame and records only the difference between them, which greatly reduces the amount of data.
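A minimal sketch of frame differencing, assuming frames are flat lists of pixel values and that a difference is stored as (position, new value) pairs; the representation is chosen for illustration only.

```python
def diff_frame(prev, cur):
    """Record only the pixels that changed, as (position, new value) pairs."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, cur)) if p != c]


def apply_diff(prev, changes):
    """Rebuild the frame from the previous frame plus the recorded changes."""
    out = list(prev)
    for position, value in changes:
        out[position] = value
    return out


prev = [7] * 16
cur = list(prev)
cur[3] = 9
cur[12] = 1

changes = diff_frame(prev, cur)
print(changes)                           # [(3, 9), (12, 1)]
print(apply_diff(prev, changes) == cur)  # True
```

Two pairs stand in for sixteen pixels, and the frame is recovered exactly, so this step on its own loses nothing.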
A word on lossy and lossless compression. With lossless compression, the data after decompression is exactly the same as the data before compression. Run-length encoding (RLE) is a common lossless algorithm. With lossy compression, the decompressed data differs from the data before compression: some image or audio information to which the human eye and ear are not sensitive is discarded during compression, and the discarded information cannot be recovered. Almost all high-compression algorithms use lossy compression to reach low data rates. The amount of data lost is related to the compression ratio: the more strongly the data is compressed, the more is lost and the worse the result generally looks after decompression. In addition, some lossy compression algorithms compress repeatedly, which causes additional data loss.
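The contrast can be shown in a few lines. The sketch below pairs a simple run-length encoder (lossless) with uniform quantization (lossy); the step size of 4 and the sample data are arbitrary choices for the example, and the quantizer assumes non-negative integers.

```python
def rle_encode(data):
    """Run-length encode a list into (value, run length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand (value, run length) pairs back into the original list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def quantize(data, step):
    """Map each non-negative value to the nearest multiple of step (as a level)."""
    return [(value + step // 2) // step for value in data]


def dequantize(levels, step):
    return [level * step for level in levels]


data = [12, 12, 13, 13, 13, 40, 41, 41]

# Lossless: the round trip returns exactly the input.
runs = rle_encode(data)
print(runs)                      # [(12, 2), (13, 3), (40, 1), (41, 2)]
print(rle_decode(runs) == data)  # True

# Lossy: small differences are discarded and cannot be recovered,
# but the result compresses into fewer runs.
levels = quantize(data, 4)
print(dequantize(levels, 4))     # [12, 12, 12, 12, 12, 40, 40, 40]
print(rle_encode(levels))        # [(3, 5), (10, 3)]
```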

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
