H264 is a newer-generation video coding standard, known for high compression at good quality and for supporting streaming over many kinds of networks. As I understand it, the theoretical basis of its coding is as follows:
Statistics gathered over a period of video show that between adjacent images generally fewer than 10% of the pixels differ, the luminance difference is no more than 2%, and the chroma difference is less than 1%. So when the picture changes little, we can first encode a complete image frame A; the following frame B is then not encoded in full, and only its difference from frame A is written, so that frame B is only 1/10 the size of a full frame or less. (A, B, and C here simply name consecutive frames; they are not the frame types defined later.) If frame C, which follows frame B, also changes little, we can go on to encode frame C with reference to B, and so on. Such a run of images is called a sequence (a sequence is a piece of data with the same characteristics). When an image differs so much from the previous one that it cannot be generated by referring to it, we end the current sequence and start the next one: a complete frame A1 is generated for that image, and the images that follow are generated with reference to A1, writing only their differences from A1.
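The decision described above (write a difference while the picture changes little, start a new sequence with a full frame when it changes a lot) can be sketched as follows. This is only an illustration: frames are flat lists of pixel values, and the names `changed_fraction`, `plan_frames`, and the 0.5 threshold are assumptions made for this example, not anything H264 specifies.

```python
def changed_fraction(prev, curr, tolerance=0):
    """Fraction of pixels whose value differs by more than `tolerance`."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same number of pixels")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tolerance)
    return changed / len(curr)


def plan_frames(frames, threshold=0.5):
    """Label each frame 'full' (starts a new sequence) or 'diff'."""
    labels = []
    for i, frame in enumerate(frames):
        # The first frame, or a frame too different from its predecessor,
        # is written in full; everything else is written as a difference.
        if i == 0 or changed_fraction(frames[i - 1], frame) > threshold:
            labels.append("full")
        else:
            labels.append("diff")
    return labels


frame_a = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
frame_b = [10, 10, 10, 12, 10, 10, 10, 10, 10, 10]  # one pixel changed
frame_c = [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]   # large change

print(changed_fraction(frame_a, frame_b))        # 0.1
print(plan_frames([frame_a, frame_b, frame_c]))  # ['full', 'diff', 'full']
```

Real encoders decide where to place full frames with more elaborate cost estimates; the fixed threshold here only mirrors the idea in the text.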
In the H264 protocol, three kinds of frames are defined. A fully encoded frame is called an I-frame; the I-frame is a keyframe. A frame that is generated with reference to a previous I-frame (or P-frame) and encodes only the differences is called a P-frame; the P-frame is a forward differential frame. A frame that is encoded with reference to both the preceding and the following frames is called a B-frame; B-frames are bidirectional differential frames.
The core algorithms adopted by H264 are intra-frame compression and inter-frame compression: intra-frame compression is the algorithm that generates I-frames, and inter-frame compression is the algorithm that generates B-frames and P-frames.
-----------------------
Description of three types of frames
-----------------------
I-frame: an intra-coded frame. An I-frame is a keyframe; you can think of it as keeping the complete picture of that frame, so decoding needs only this frame's own data (because it contains the full picture).
I-frame features:
1. It is a full-frame compressed frame: the whole image is encoded and transmitted using JPEG-like intra-frame compression;
2. When decoding, the complete image can be reconstructed from the I-frame data alone;
3. The I-frame describes the image background and the details of the moving subject;
4. The I-frame is generated without reference to other images;
5. The I-frame is a reference frame for P-frames and B-frames (its quality directly affects the quality of the subsequent frames in the same group);
6. The I-frame is the base frame (first frame) of the group of pictures (GOP), and there is only one I-frame in a group;
7. The I-frame does not need to consider motion vectors;
8. The I-frame takes up a relatively large amount of data.
P-frame: a forward-predicted frame. The P-frame represents the difference between this frame and a previous keyframe (or P-frame). To decode it, the differences defined by this frame are overlaid on the previously cached picture to produce the final picture. (In other words, it is a difference frame: a P-frame does not hold complete picture data, only the data that differs from the previous frame's picture.)
P-frame prediction and reconstruction: the P-frame takes the I-frame as its reference frame. For "a given point" of the P-frame, the encoder finds its predicted value and motion vector in the I-frame, then transmits the prediction difference together with the motion vector. At the receiving end, the decoder uses the motion vector to find the predicted value of that point in the I-frame and adds the difference to it, which gives the sample value of that point in the P-frame; doing this for every point yields the complete P-frame.
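The prediction and reconstruction just described can be sketched for a single block. This is a simplified illustration under stated assumptions: frames are small 2-D lists of integers, the motion vector is given rather than searched for, and the helper names (`predict_block`, `encode_block`, `decode_block`) are invented for this example. Real H264 additionally transforms and quantizes the difference, which is omitted here.

```python
def predict_block(reference, top, left, size, motion_vector):
    """Copy a size x size block from `reference`, displaced by the motion vector."""
    dy, dx = motion_vector
    return [
        [reference[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]


def encode_block(reference, current, top, left, size, motion_vector):
    """Encoder side: difference = actual block - predicted block."""
    predicted = predict_block(reference, top, left, size, motion_vector)
    return [
        [current[top + r][left + c] - predicted[r][c] for c in range(size)]
        for r in range(size)
    ]


def decode_block(reference, residual, top, left, size, motion_vector):
    """Decoder side: sample value = predicted value + transmitted difference."""
    predicted = predict_block(reference, top, left, size, motion_vector)
    return [
        [predicted[r][c] + residual[r][c] for c in range(size)]
        for r in range(size)
    ]


# A 2x2 object sits at the top-left of the reference (I) frame ...
reference = [
    [50, 52, 0, 0],
    [54, 56, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# ... and has moved one pixel down and right in the current (P) frame.
current = [
    [0, 0, 0, 0],
    [0, 51, 52, 0],
    [0, 54, 57, 0],
    [0, 0, 0, 0],
]
mv = (-1, -1)  # the block at (1, 1) is predicted from (0, 0) in the reference

residual = encode_block(reference, current, 1, 1, 2, mv)
decoded = decode_block(reference, residual, 1, 1, 2, mv)
print(residual)  # [[1, 0], [0, 1]]
print(decoded)   # [[51, 52], [54, 57]]
```

Only the motion vector and the small difference values need to be sent, which is why a P-frame is much smaller than a full frame.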
P-frame features:
1. The P-frame is an encoded frame separated from the I-frame by one or more frames;
2. The P-frame uses motion compensation to transmit its difference from the preceding I- or P-frame (the prediction error) together with the motion vectors;
3. When decoding, the predicted value from the reference frame must be summed with the prediction error to reconstruct the complete P-frame image;
4. The P-frame belongs to forward-predicted inter-frame coding; it refers only to the nearest preceding I-frame or P-frame;
5. A P-frame can be the reference frame of the P-frame after it, or of the B-frames before and after it;
6. Because the P-frame is a reference frame, it may cause decoding errors to propagate;
7. Because only differences are transmitted, the P-frame's compression ratio is relatively high.
B-frame: a bidirectionally predicted, interpolated frame. The B-frame is a two-way difference frame: it records the differences between this frame and both the preceding and the following frames (the details are more complex, with 4 cases, but I am keeping it simple). In other words, to decode a B-frame the decoder needs not only the previously cached picture but also the decoded picture that follows; the final picture is obtained by combining the preceding and following pictures with this frame's data. B-frames compress very well, but the CPU has to work harder when decoding them.
Prediction and reconstruction of B-frames
The B-frame takes the preceding I- or P-frame and the following P-frame as its reference frames. For "a given point" of the B-frame, the encoder finds its predicted value and two motion vectors, then transmits the prediction difference and the motion vectors. The receiving end uses the motion vectors to find the predicted value in the two reference frames and sums it with the difference, which gives the sample value of that point in the B-frame; in this way the complete B-frame is obtained.
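A minimal sketch of the bidirectional case, assuming the simplest of the possible modes: the predicted value is the rounded average of one block from the preceding reference and one block from the following reference. To keep it short, frames are 1-D rows of samples and the two motion vectors are given; the function names are made up for this illustration.

```python
def fetch(reference, start, length, motion_vector):
    """Read `length` samples from a 1-D reference row, displaced by the motion vector."""
    begin = start + motion_vector
    return reference[begin:begin + length]


def bi_predict(past, future, start, length, mv_past, mv_future):
    """Average the forward and backward predictions, rounding half up."""
    a = fetch(past, start, length, mv_past)
    b = fetch(future, start, length, mv_future)
    return [(x + y + 1) // 2 for x, y in zip(a, b)]


def decode_b_block(past, future, residual, start, length, mv_past, mv_future):
    """Sample value = bidirectional prediction + transmitted difference."""
    predicted = bi_predict(past, future, start, length, mv_past, mv_future)
    return [p + r for p, r in zip(predicted, residual)]


# An object moves to the right: it is at index 1..2 in the past reference,
# at index 3..4 in the future reference, and at index 2..3 in the B-frame.
past = [0, 100, 104, 0, 0, 0]
future = [0, 0, 0, 102, 108, 0]

print(bi_predict(past, future, 2, 2, -1, 1))                 # [101, 106]
print(decode_b_block(past, future, [0, 1], 2, 2, -1, 1))     # [101, 107]
```

Because the object is caught between two references, the prediction is already very close and the transmitted difference (`[0, 1]` here) is tiny.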
B-frame features:
1. The B-frame is predicted from the preceding I- or P-frame and the following P-frame;
2. The B-frame transmits the prediction error and the motion vectors between itself and the preceding I- or P-frame and the following P-frame;
3. The B-frame is a bidirectionally predicted encoded frame;
4. The B-frame has the highest compression ratio, because it only reflects the changes of the moving subject between the two reference frames, so the prediction is more accurate;
5. The B-frame is not a reference frame and does not cause decoding errors to propagate.
Note: I, B, and P frames are defined by people according to the needs of the compression algorithm, but they are all real physical frames. Generally speaking, the compression ratio of an I-frame is about 7 (similar to JPG), that of a P-frame about 20, and that of a B-frame can reach 50. So using B-frames saves a lot of space, and the space saved can be spent on more I-frames, which gives better picture quality at the same bitrate.
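To see what those rough ratios mean for a whole group, here is a small worked calculation. The ratios 7, 20, and 50 come from the note above; the two 12-frame patterns are hypothetical examples chosen for this illustration, and sizes are expressed in units of one uncompressed frame.

```python
from fractions import Fraction

RATIO = {"I": 7, "P": 20, "B": 50}  # rough compression ratios quoted in the note


def gop_size(pattern):
    """Total compressed size of a GOP, in units of one uncompressed frame."""
    return sum(Fraction(1, RATIO[t]) for t in pattern)


with_b = "I" + "BBP" * 3 + "BB"  # hypothetical GOP: 1 I, 3 P, 8 B frames
without_b = "I" + "P" * 11       # same length, P-frames only after the I-frame

print(gop_size(with_b))                               # 317/700
print(gop_size(without_b))                            # 97/140
print(float(gop_size(with_b) / gop_size(without_b)))  # about 0.65
```

Under these assumed figures the group with B-frames needs roughly two thirds of the space of the P-only group, which is the saving the note refers to.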
----------------------
Description of the sequence
----------------------
In H264, images are organized into sequences. A sequence is the encoded data stream of a run of images; it begins with an I-frame and runs until the next I-frame.
The first image of a sequence is called an IDR image (Instantaneous Decoding Refresh), and every IDR image is an I-frame image. H264 introduces the IDR image so that decoding can resynchronize: when the decoder reaches an IDR image, it immediately empties the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. This way, if a serious error occurred in the previous sequence, there is a chance to resynchronize here. Images after the IDR image are never decoded using data from images before the IDR image.
A sequence is the data stream produced by encoding a run of images whose content does not differ much. When there is little motion, a sequence can be very long, because little motion means the picture content changes very little, so the encoder can emit one I-frame and then P-frames and B-frames from then on. When there is a lot of motion, a sequence may be fairly short, for example one I-frame followed by 3 or 4 P-frames.
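The effect of an IDR image on the reference frame queue can be modelled with a toy class. This is a sketch of the behaviour described above, not of a real decoder: the class name `ReferenceBuffer`, its method, and the capacity of 4 are assumptions for illustration.

```python
from collections import deque


class ReferenceBuffer:
    """Toy model of a decoder's reference-frame queue."""

    def __init__(self, capacity=4):
        # Oldest references fall out automatically once the queue is full.
        self.frames = deque(maxlen=capacity)

    def on_picture(self, name, is_idr=False, is_reference=True):
        if is_idr:
            # IDR: nothing decoded earlier may be referenced from now on.
            self.frames.clear()
        if is_reference:
            self.frames.append(name)
        return list(self.frames)


buf = ReferenceBuffer()
print(buf.on_picture("IDR0", is_idr=True))       # ['IDR0']
print(buf.on_picture("P1"))                      # ['IDR0', 'P1']
print(buf.on_picture("B2", is_reference=False))  # ['IDR0', 'P1']
print(buf.on_picture("IDR3", is_idr=True))       # ['IDR3']
```

After `IDR3` the queue holds only that picture, so an error in `IDR0` or `P1` cannot leak into anything decoded later.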
From the explanation above, we can see that decoding I- and P-frames is relatively simple and uses relatively few resources. An I-frame only has to be decoded by itself; a P-frame only requires the decoder to have cached the previous picture and to use that cached picture when the P-frame arrives. If a video stream contains only I- and P-frames, the decoder never has to care about later data: it reads and decodes as it goes, moving linearly forward, and everyone is happy.
But many movies on the network use B-frames, because a B-frame records the differences against both the preceding and the following frames and can save more space than a P-frame. The file gets smaller, but the decoder has more trouble: when decoding, it must use not only the previously cached picture but also the next I or P picture (that is, it must read ahead and decode ahead). B-frames cannot simply be thrown away either, because they carry real picture information; if they are discarded and the previous picture is simply repeated, playback stutters (the frames are in effect dropped). And because online movies often use quite a lot of B-frames to save space, the more B-frames there are, the more trouble a player without B-frame support has, and the more the picture stutters.
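The read-ahead requirement can be made concrete by reordering frames from display order into an order in which they can actually be decoded. The sketch below assumes each B-frame depends on the nearest I or P frame on either side; the function name and the frame labels are illustrative only.

```python
def decode_order(display_order):
    """Reorder frames so each B-frame comes after the next I/P frame it depends on."""
    output, pending_b = [], []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)   # must wait for the following I/P frame
        else:
            output.append(frame)      # I or P: decodable right away
            output.extend(pending_b)  # the waiting B-frames can now follow
            pending_b.clear()
    output.extend(pending_b)          # trailing B-frames with no later I/P frame
    return output


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(decode_order(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

With only I and P frames the two orders are identical, which is the "linear forward" case; every B-frame forces the decoder to fetch and decode a later frame first.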
The following example illustrates this:
In the figure above, the GOP (Group of Pictures) has a length of 13; s0~s7 represent 8 viewpoints, and t0~t12 are the 13 time instants of the GOP. The number of frames in each GOP is the product of the number of viewpoints and the GOP length. In the figure, a GOP contains 94 B-frames, which account for 90.38% of the total number of frames in the GOP. The longer the GOP, the higher the proportion of B-frames and the better the rate-distortion performance of the encoding. The following figure compares the rate-distortion performance of the sequence Race1 under different GOP lengths.
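The numbers quoted for this example can be checked with a few lines of arithmetic. The variable names are just labels for the figures given in the text.

```python
views = 8        # viewpoints s0..s7
gop_length = 13  # time instants t0..t12
b_frames = 94    # B-frame count given for the example

total_frames = views * gop_length
b_share = b_frames / total_frames

print(total_frames)       # 104
print(f"{b_share:.2%}")   # 90.38%
```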
Rate-distortion theory studies data compression using the basic viewpoints and methods of information theory; it is also called the theory of source coding with limited distortion. Its basic problem can be summed up as follows: for a given source distribution and distortion measure, find the minimum expected distortion achievable at a given bit rate, or equivalently the minimum description rate that satisfies a given distortion limit.
The name of rate-distortion theory derives from the rate-distortion function. The theory has two central parts: one is the rate-distortion function (or the distortion-rate function), and the other is the limited-distortion coding theorem. For different sources, different distortion measures, and different source probability distributions, the theory computes the rate-distortion function and proves the corresponding limited-distortion coding theorem.
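In symbols, the two formulations of the basic problem are usually written as below. The notation is the common textbook one and is not taken from the original text: X is the source, \hat{X} its reconstruction, d a distortion measure, I(X;\hat{X}) the mutual information, D the distortion limit, and R the bit rate.

```latex
% Rate-distortion function: the minimum rate that keeps the expected distortion within D
R(D) = \min_{p(\hat{x} \mid x)\,:\ \mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})

% Distortion-rate function: the minimum expected distortion achievable at rate R
D(R) = \min_{p(\hat{x} \mid x)\,:\ I(X;\hat{X}) \le R} \mathbb{E}[d(X,\hat{X})]
```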
--------------------------------
Description of the compression algorithm
--------------------------------
Compression method for H264:
1. Grouping: divide the images into groups of a few frames each (a GOP, that is, a sequence); to limit the motion change within a group, the number of frames should not be too large.
2. Defining frames: classify each frame image in the group as one of three types, that is, I, B, or P frame;
3. Predicting frames: take the I-frame as the base frame, predict the P-frames from the I-frame, and then predict the B-frames from the I-frame and P-frames;
4. Data transfer: finally, store and transmit the I-frame data and the predicted difference information.
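Steps 2 and 3 above can be sketched for one group. The fixed pattern used here (two B-frames between consecutive I/P frames, last frame forced to be a P-frame) is an assumption chosen for illustration; real encoders pick frame types adaptively. Both function names are hypothetical.

```python
def assign_frame_types(gop_length, b_between=2):
    """Step 2: label each frame of one GOP as I, P, or B (display order)."""
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")
        elif i % (b_between + 1) == 0 or i == gop_length - 1:
            # The last frame is made a P-frame so that every B-frame
            # has a later reference inside the group.
            types.append("P")
        else:
            types.append("B")
    return types


def reference_frames(types):
    """Step 3: for each frame, list the indices of the frames it is predicted from."""
    anchors = [i for i, t in enumerate(types) if t in "IP"]
    refs = []
    for i, t in enumerate(types):
        if t == "I":
            refs.append([])                                     # no reference
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])     # nearest earlier I/P
        else:
            refs.append([max(a for a in anchors if a < i),      # nearest earlier I/P
                         min(a for a in anchors if a > i)])     # nearest later I/P
    return refs


types = assign_frame_types(7)
print("".join(types))           # IBBPBBP
print(reference_frames(types))  # [[], [0, 3], [0, 3], [0], [3, 6], [3, 6], [3]]
```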
Intra-frame (intraframe) compression is also called spatial compression. When compressing a frame, it considers only the data within that frame and ignores the redundancy between adjacent frames, which makes it much like still-image compression. Intra-frame compression generally uses a lossy algorithm. Because it encodes a complete image, the frame can be decoded and displayed independently. Intra-frame compression generally does not reach a very high compression ratio; it is similar to encoding a JPEG.
Inter-frame (interframe) compression rests on the fact that the data of neighboring frames is strongly correlated, or put differently, that the information changes very little between two consecutive frames. Continuous video therefore carries redundant information between neighboring frames, and compressing that redundancy increases the amount of compression further, shrinking the compressed data relative to the original. Inter-frame compression, also known as temporal compression, compresses by comparing the data of different frames along the time axis. The differencing step itself is generally lossless, although in H264 the remaining difference is subsequently quantized, which is lossy. Frame differencing is a typical temporal compression method: it compares a frame with its adjacent frame and records only the differences between them, which greatly reduces the amount of data.
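A minimal sketch of frame differencing in its plain, lossless form, assuming frames are flat lists of pixel values of equal length. The function names are made up for this example; H264 itself uses motion-compensated prediction rather than this direct pixel comparison.

```python
def frame_diff(prev, curr):
    """Record only the positions where `curr` differs from `prev`."""
    return [(i, b) for i, (a, b) in enumerate(zip(prev, curr)) if a != b]


def apply_diff(prev, diff):
    """Rebuild the current frame from the previous frame plus the differences."""
    frame = list(prev)
    for i, value in diff:
        frame[i] = value
    return frame


prev = [5, 5, 5, 5, 9, 9, 9, 9]
curr = [5, 5, 6, 5, 9, 9, 9, 8]

diff = frame_diff(prev, curr)
print(diff)                            # [(2, 6), (7, 8)]
print(apply_diff(prev, diff) == curr)  # True
```

Only two (position, value) pairs are stored instead of eight pixels, and the current frame is recovered exactly.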
While we are at it, a word on lossy and lossless compression. Lossless compression means that the data after decompression is exactly the same as the data before compression. Many lossless schemes use run-length encoding (RLE). Lossy compression means that the decompressed data is not identical to the data before compression: during compression, some image or audio information to which the human eye or ear is not sensitive is thrown away, and the lost information cannot be recovered. Almost all high-compression algorithms use lossy compression to reach low data rates. The amount of data lost is related to the compression ratio: the more heavily the data is compressed, the more is lost, and the worse the result generally looks or sounds after decompression. In addition, some lossy algorithms compress repeatedly in several passes, which causes additional data loss.
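The contrast can be shown with two tiny examples: run-length encoding, which round-trips exactly, and a crude quantizer, which does not. Both are illustrative sketches; the `quantize` function in particular is only a stand-in for the kind of information-discarding step a lossy codec performs, not the one H264 uses.

```python
def rle_encode(data):
    """Lossless: collapse runs of equal values into (value, count) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (value, count) pairs back into the original data."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def quantize(data, step):
    """Lossy: round each value down to a multiple of `step`."""
    return [(value // step) * step for value in data]


pixels = [0, 0, 0, 255, 255, 0]
encoded = rle_encode(pixels)
print(encoded)                        # [(0, 3), (255, 2), (0, 1)]
print(rle_decode(encoded) == pixels)  # True: nothing was lost

print(quantize([17, 18, 19, 20], 4))  # [16, 16, 16, 20]: 17, 18, 19 are now indistinguishable
```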
http://blog.csdn.net/abcjennifer/article/details/6577934
Http://blog.sina.com.cn/s/blog_8fb8cd4801018yyo.html