-----------------------
Objective
-----------------------
H.264 is a new-generation video coding standard, known for high compression, high quality, and support for streaming over a variety of networks. As I understand it, the theoretical basis of its encoding is as follows. Statistics gathered over a period of time show that between adjacent images, generally only about 10% of the pixels differ, brightness changes by no more than 2%, and chroma changes by 1% or less. So for footage that changes little, we can first encode one complete image frame, A. The next frame, B, is not encoded in full; we write only its difference from frame A, so frame B is only 1/10 the size of a full frame, or less. If frame C after B also changes little, we can encode C with reference to B in the same way, and the loop continues. (A, B, and C here are just names for consecutive frames, not the frame types defined below.)
We call such a run of images a sequence (a sequence is a piece of data whose frames share the same characteristics). When an image differs so much from the previous one that it cannot be generated by reference to it, we end the current sequence and start the next: we generate a complete frame A1 for that image, and the following images are generated with reference to A1, writing only their differences from A1.
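The idea above (one complete frame, then only differences, then a new complete frame when the picture changes too much) can be sketched in a few lines. This is only a toy illustration, not how an H.264 encoder actually works: frames are flat lists of pixel values, and the names `changed_fraction`, `encode_stream`, `decode_stream`, and the `threshold` value are all hypothetical.

```python
def changed_fraction(prev, cur):
    """Fraction of pixel positions whose value differs between two frames."""
    changed = sum(1 for a, b in zip(prev, cur) if a != b)
    return changed / len(cur)

def encode_stream(frames, threshold=0.5):
    """Return ("full", pixels) or ("diff", {index: new_value}) records.

    threshold is an assumed cut-off: if more than this fraction of pixels
    changed, the frame starts a new sequence with a complete frame.
    """
    encoded = []
    prev = None
    for cur in frames:
        if prev is None or changed_fraction(prev, cur) > threshold:
            encoded.append(("full", list(cur)))
        else:
            diff = {i: b for i, (a, b) in enumerate(zip(prev, cur)) if a != b}
            encoded.append(("diff", diff))
        prev = cur
    return encoded

def decode_stream(encoded):
    """Rebuild every frame by applying each diff to the previous picture."""
    frames = []
    cur = None
    for kind, payload in encoded:
        if kind == "full":
            cur = list(payload)
        else:
            cur = list(cur)
            for i, value in payload.items():
                cur[i] = value
        frames.append(cur)
    return frames

frames = [
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 10],   # frame A, encoded in full
    [10, 10, 10, 12, 10, 10, 10, 10, 10, 10],   # one pixel changed
    [10, 10, 10, 12, 10, 10, 11, 10, 10, 10],   # one more pixel changed
    [90, 91, 92, 93, 94, 95, 96, 97, 98, 99],   # scene change: new sequence
]
encoded = encode_stream(frames)
print([kind for kind, _ in encoded])   # ['full', 'diff', 'diff', 'full']
print(decode_stream(encoded) == frames)  # True
```

The two "diff" records each hold a single changed pixel, which is where the size saving comes from.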
The H.264 protocol defines three kinds of frames. A fully encoded frame is called an I-frame. A frame that refers to a previous I-frame (or P-frame) and encodes only the differences is called a P-frame. A frame encoded with reference to frames both before and after it is called a B-frame.
The core algorithms of H.264 are intra-frame compression and inter-frame compression: intra-frame compression is the algorithm that generates I-frames, and inter-frame compression is the algorithm that generates B-frames and P-frames.
----------------------
Description of the sequence
----------------------
In H.264, images are organized into sequences. A sequence is the encoded data stream of a run of images that begins with an I-frame and runs until the next I-frame.
The first image of a sequence is called an IDR image (Instantaneous Decoding Refresh), and every IDR image is an I-frame. IDR images exist so that decoding can resynchronize: when the decoder reaches an IDR image, it immediately empties the reference frame queue, outputs or discards all data decoded so far, looks up the parameter sets again, and starts a new sequence. If a serious error occurred in the previous sequence, this gives the decoder a chance to resynchronize. Images after an IDR image are never decoded using data from images before it.
A sequence is the data stream produced by encoding a run of images whose content varies little. When there is little motion, a sequence can be very long: the picture content hardly changes, so one I-frame can be followed by P-frames and B-frames for a long time. When there is a lot of motion, a sequence may be quite short, for example one I-frame and 3 or 4 P-frames.
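To make the IDR rule concrete, here is a small sketch of a decoder's reference queue. It is a simplified model under stated assumptions: frame types are plain strings, only "IDR" and "P" frames are kept as references, and the function name `available_references` is made up for this example.

```python
def available_references(frame_types):
    """For each frame, list the indices of earlier frames it may reference."""
    queue = []     # indices of decoded frames kept for reference
    result = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "IDR":
            queue.clear()          # an IDR image empties the reference queue
        result.append(list(queue))
        if frame_type in ("IDR", "P"):
            queue.append(index)    # simplified: IDR and P frames are kept
    return result

stream = ["IDR", "P", "P", "IDR", "P"]
print(available_references(stream))
# [[], [0], [0, 1], [], [3]]
```

Frame 4 can see only frame 3: nothing decoded before the second IDR image is available any more, which is why an error in the first sequence cannot leak into the second.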
-----------------------
Description of three types of frames
-----------------------
I-frame: intra-coded frame. An I-frame is a keyframe; you can think of it as retaining the full picture of that frame, so decoding needs only this frame's data (because it contains the complete picture).
I-frame features:
1. It is a frame compressed and encoded as a whole; the full image information is encoded and transmitted in a JPEG-like manner;
2. The complete image can be reconstructed at decoding time from the I-frame's data alone;
3. An I-frame describes the details of the image background and of the moving subject;
4. An I-frame is generated without reference to any other image;
5. An I-frame is a reference frame for P-frames and B-frames (its quality directly affects the quality of the later frames in the same group);
6. An I-frame is the base frame (first frame) of the group of pictures (GOP), and a group contains only one I-frame;
7. An I-frame does not need motion vectors;
8. An I-frame takes up a relatively large amount of data.
P-frame: forward-predicted frame. A P-frame represents the difference between this frame and a previous keyframe (or P-frame). To decode it, the differences defined by this frame are overlaid on the previously cached picture to produce the final picture. (In other words, it is a difference frame: a P-frame carries no complete picture data, only the data that differs from the previous frame's picture.)
P-frame prediction and reconstruction: a P-frame uses the I-frame as its reference frame. For each point of the P-frame, the encoder finds a predicted value and a motion vector in the I-frame, and transmits the prediction difference together with the motion vector. At the receiving end, the decoder uses the motion vector to find the predicted value of that point in the I-frame and adds the difference to it, obtaining the sample value of that point; doing this for every point yields the complete P-frame.
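The prediction and reconstruction just described can be sketched on a single row of pixels. This is a toy one-dimensional version with an exhaustive search; a real encoder works on two-dimensional blocks and is far more elaborate. The function names and the `search_range` parameter are hypothetical.

```python
def best_motion_vector(reference, block, position, search_range=2):
    """Try each offset and keep the one with the smallest sum of absolute differences."""
    best = None
    for mv in range(-search_range, search_range + 1):
        start = position + mv
        if start < 0 or start + len(block) > len(reference):
            continue   # candidate would fall outside the reference row
        candidate = reference[start:start + len(block)]
        sad = sum(abs(a - b) for a, b in zip(candidate, block))
        if best is None or sad < best[0]:
            best = (sad, mv)
    return best[1]

def encode_p_block(reference, block, position):
    """Encoder side: return the motion vector and the prediction difference."""
    mv = best_motion_vector(reference, block, position)
    predicted = reference[position + mv:position + mv + len(block)]
    residual = [b - p for b, p in zip(block, predicted)]
    return mv, residual

def decode_p_block(reference, position, mv, residual):
    """Decoder side: predicted value from the reference plus the difference."""
    predicted = reference[position + mv:position + mv + len(residual)]
    return [p + r for p, r in zip(predicted, residual)]

reference = [0, 0, 50, 60, 70, 0, 0, 0]   # one row of the I-frame
current = [0, 0, 0, 50, 61, 70, 0, 0]     # same object, shifted right by one
block = current[3:6]                      # the block at position 3

mv, residual = encode_p_block(reference, block, position=3)
print(mv, residual)                                   # -1 [0, 1, 0]
print(decode_p_block(reference, 3, mv, residual))     # [50, 61, 70]
```

Only the motion vector and a nearly all-zero difference are transmitted, yet the decoder recovers the block exactly.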
P-frame features:
1. A P-frame is an encoded frame that follows the I-frame, separated from it by one or more frames;
2. A P-frame uses motion compensation to transmit its difference from the previous I- or P-frame (the prediction error) together with the motion vectors;
3. At decoding time, the predicted value from the reference frame must be summed with the prediction error to reconstruct the complete P-frame image;
4. A P-frame is inter-frame coded with forward prediction; it refers only to the nearest preceding I-frame or P-frame;
5. A P-frame can be the reference frame of the P-frame after it, and of the B-frames before and after it;
6. Because a P-frame is a reference frame, it can propagate decoding errors;
7. Because only the difference is transmitted, P-frames achieve relatively high compression.
B-frame: bidirectionally predicted, interpolated frame. A B-frame is a two-way difference frame: it records the difference between this frame and the frames both before and after it (the details are more complex, with four cases, but I am keeping it simple here). In other words, decoding a B-frame requires not only the previously cached picture but also the decoded picture that follows it; the final picture is obtained by overlaying this frame's data on the preceding and following pictures. B-frames compress very well, but they cost more CPU time to decode.
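One consequence of needing the picture that follows: frames cannot be decoded in the order they are displayed, because the later reference frame has to be decoded first. A minimal sketch of that reordering, assuming frames are labelled with strings such as "I0" or "B1" and using a hypothetical function name:

```python
def display_to_decode_order(display_order):
    """Reorder frame labels so each B-frame comes after both of its references."""
    decode_order = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)          # must wait for the next I or P frame
        else:
            decode_order.append(frame)       # I or P frame goes first
            decode_order.extend(pending_b)   # then the B-frames that needed it
            pending_b = []
    decode_order.extend(pending_b)           # simplification: trailing B-frames
    return decode_order

print(display_to_decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder has to hold P3 in memory while it decodes B1 and B2, which is part of the extra work B-frames cost.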
Prediction and reconstruction of B-frames
A B-frame uses the preceding I- or P-frame and the following P-frame as reference frames. For each point of the B-frame, the encoder finds a predicted value and two motion vectors, and transmits the prediction difference together with the motion vectors. The receiving end uses the motion vectors to find the predicted value in the two reference frames and sums it with the difference, obtaining the sample value of that point; doing this for every point yields the complete B-frame.
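Here is a rough sketch of the two-reference case on one row of pixels. It assumes the predicted value is simply the rounded average of the two motion-compensated blocks, which is only one of the possible cases, and it takes the two motion vectors as given rather than searching for them. All function names are hypothetical.

```python
def predict_bidirectional(prev_ref, next_ref, position, mv_prev, mv_next, size):
    """Average the blocks that the two motion vectors point at (rounded)."""
    a = prev_ref[position + mv_prev:position + mv_prev + size]
    b = next_ref[position + mv_next:position + mv_next + size]
    return [(x + y + 1) // 2 for x, y in zip(a, b)]

def encode_b_block(prev_ref, next_ref, block, position, mv_prev, mv_next):
    """Encoder side: the prediction difference for this block."""
    predicted = predict_bidirectional(
        prev_ref, next_ref, position, mv_prev, mv_next, len(block))
    return [c - p for c, p in zip(block, predicted)]

def decode_b_block(prev_ref, next_ref, position, mv_prev, mv_next, residual):
    """Decoder side: predicted value from both references plus the difference."""
    predicted = predict_bidirectional(
        prev_ref, next_ref, position, mv_prev, mv_next, len(residual))
    return [p + r for p, r in zip(predicted, residual)]

prev_ref = [0, 40, 60, 80, 0, 0, 0, 0]   # object at positions 1..3
next_ref = [0, 0, 0, 44, 64, 84, 0, 0]   # object at positions 3..5, a bit brighter
current = [0, 0, 42, 62, 83, 0, 0, 0]    # frame in between: object at 2..4
block = current[2:5]

residual = encode_b_block(prev_ref, next_ref, block, 2, mv_prev=-1, mv_next=1)
print(residual)                                               # [0, 0, 1]
print(decode_b_block(prev_ref, next_ref, 2, -1, 1, residual))  # [42, 62, 83]
```

Because the object is halfway between its two reference positions in both place and brightness, averaging predicts it almost perfectly and the difference is close to zero.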
B-frame features:
1. A B-frame is predicted from the preceding I- or P-frame and the following P-frame;
2. A B-frame transmits the prediction error and motion vectors between itself and the preceding I- or P-frame and the following P-frame;
3. A B-frame is a bidirectionally predicted frame;
4. B-frames have the highest compression ratio, because they reflect only how the moving subject changes between the two reference frames, which makes the prediction more accurate;
5. B-frames are not used as reference frames here, so they do not propagate decoding errors.
Note: I-, B-, and P-frames are defined by people to suit the needs of the compression algorithm, but they are all real physical frames. In general, the compression ratio is about 7 for I-frames (similar to JPEG), about 20 for P-frames, and up to 50 for B-frames. B-frames clearly save a lot of space, and the space saved can be spent on more I-frames, which gives better picture quality at the same bitrate.
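A quick back-of-the-envelope calculation shows what those ratios mean for a whole group. It takes the rough figures above (7, 20, 50) at face value and assumes every raw frame has the same size; the 12-frame patterns and the function name are just examples.

```python
RATIOS = {"I": 7, "P": 20, "B": 50}   # rough figures quoted in the text

def gop_compression_ratio(pattern):
    """Overall ratio for a group given as a string of frame types, e.g. "IBBP"."""
    compressed = sum(1 / RATIOS[t] for t in pattern)   # in units of one raw frame
    return len(pattern) / compressed

with_b = "IBBPBBPBBPBB"       # 1 I-frame, 3 P-frames, 8 B-frames
without_b = "I" + "P" * 11    # 1 I-frame, 11 P-frames

print(round(gop_compression_ratio(with_b), 1))      # 26.5
print(round(gop_compression_ratio(without_b), 1))   # 17.3
```

With these numbers, the group that uses B-frames is about a third smaller than the one that uses only P-frames, and that is the space that can be spent on extra I-frames.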
--------------------------------
Description of the compression algorithm
--------------------------------
H.264 compression proceeds as follows:
1. Grouping: divide the images into groups of a few frames each (a GOP, that is, one sequence). To keep the motion within a group small, the number of frames per group should not be too large;
2. Defining frames: classify each frame in the group as one of three types: I, B, or P;
3. Predicting frames: with the I-frame as the base frame, predict the P-frames from the I-frame, then predict the B-frames from the I-frame and P-frames;
4. Data transfer: finally, store and transmit the I-frame data together with the prediction differences.
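Steps 1 and 2 above can be sketched as a simple labelling function. The fixed pattern used here (a GOP of 7 frames with 2 B-frames between reference frames) is an assumption for illustration; real encoders choose frame types adaptively, and the function name and parameters are hypothetical.

```python
def assign_frame_types(num_frames, gop_size=7, b_frames=2):
    """Label each frame I, P, or B using a fixed repeating pattern."""
    types = []
    for i in range(num_frames):
        pos = i % gop_size                  # position inside the current group
        if pos == 0:
            types.append("I")               # base frame of the group
        elif pos % (b_frames + 1) == 0:
            types.append("P")               # predicted from the previous I or P
        else:
            types.append("B")               # predicted from the frames on both sides
    return "".join(types)

print(assign_frame_types(14))   # IBBPBBPIBBPBBP
```

A smaller `gop_size` means more I-frames and faster recovery from errors, at the cost of more data.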
Intra-frame (intraframe) compression is also called spatial compression. When a frame is compressed, only the data within that frame is considered, not the redundancy between adjacent frames, so it is in effect similar to still-image compression. Intra-frame compression generally uses a lossy algorithm. Because it encodes a complete image, the result can be decoded and displayed independently. Intra-frame compression generally does not reach very high compression ratios; it is similar to encoding a JPEG.
Inter-frame (interframe) compression rests on the fact that the data in neighboring frames is highly correlated; put another way, little changes from one frame to the next. Continuous video therefore carries redundant information between neighboring frames, and removing that redundancy increases the amount of compression and further reduces the data size. Inter-frame compression is also called temporal compression, because it compresses by comparing data between different frames along the timeline. The differencing step itself is generally lossless. Frame differencing is a typical temporal compression method: it compares a frame with its adjacent frame and records only the difference between them, which greatly reduces the amount of data.
A word on lossy and lossless compression. With lossless compression, the data after decompression is exactly the same as it was before compression; run-length encoding (RLE) is widely used for lossless compression. With lossy compression, the decompressed data does not match the data before compression: during compression, image or audio information that the human eye or ear is insensitive to is thrown away, and the lost information cannot be recovered. Almost all high-compression algorithms are lossy, which is how they reach low data rates. The amount of data lost is related to the compression ratio: the smaller the compressed data is relative to the original, the more data is lost and the worse the result generally looks after decompression. In addition, some lossy algorithms compress repeatedly in several passes, which causes additional data loss.
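The difference between the two is easy to see in a few lines. The sketch below uses run-length encoding as the lossless example and plain quantization (dividing values by a step size) as a stand-in for a lossy step; the function names and the step size of 8 are assumptions made for this example.

```python
def rle_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

def quantize(values, step):
    """Lossy: keep only which step-sized bucket each value falls in."""
    return [v // step for v in values]

def dequantize(levels, step):
    """Best guess at the original: the middle of each bucket."""
    return [q * step + step // 2 for q in levels]

row = [12, 12, 12, 13, 13, 200, 200, 200, 200]

runs = rle_encode(row)
print(runs)                      # [(12, 3), (13, 2), (200, 4)]
print(rle_decode(runs) == row)   # True: lossless round trip

levels = quantize(row, step=8)
print(dequantize(levels, step=8) == row)   # False: detail was lost
print(rle_encode(levels))                  # [(1, 5), (25, 4)]
```

After quantization the row needs only two runs instead of three, which shows the trade: throwing away small differences makes the data easier to compress, but the original values cannot be recovered.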