H264 (1): coding principles, I frames, B frames, P frames, and PTS & DTS

Source: Internet
Author: User
Tags coding standards
----------------------

Objective

-----------------------

H264 is a new-generation coding standard, known for high compression at high quality and for supporting streaming media transmission over a variety of networks. As I understand it, the theoretical basis of its encoding is statistical: over a short period of time, adjacent images typically differ in no more than about 10% of their pixels, luminance differs by no more than 2%, and chroma differs by no more than 1%. So for a picture that changes little, we can first encode a complete image frame A. The following frame B then does not encode the whole image; it records only its difference from frame A, so frame B is 1/10 the size of a full frame or less. If frame C after frame B also changes little, we can go on and encode frame C by reference to B, and so on in a loop.

Such a run of images is called a sequence (a sequence is a piece of data with the same characteristics). When an image differs so much from the previous one that it cannot be generated by reference to it, we end the current sequence and start the next one: a full frame A1 is generated for this image, and the following images are generated by reference to A1, recording only their differences from A1.
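The idea above (one full frame, then differences, then a new full frame when the picture changes too much) can be sketched in a few lines of Python. This is only a toy illustration, not how H264 itself stores data: frames are flat lists of pixel values, and `CHANGE_LIMIT`, `encode`, and `decode` are hypothetical names chosen for this example.

```python
# Toy model: a frame is a flat list of pixel values.
# CHANGE_LIMIT is a hypothetical threshold, not a value defined by H264.
CHANGE_LIMIT = 0.5  # start a new sequence if more than half the pixels changed


def changed_pixels(previous, frame):
    """Return {index: new_value} for every pixel that differs from the previous frame."""
    return {i: v for i, (p, v) in enumerate(zip(previous, frame)) if p != v}


def encode(frames):
    """Return a list of ("full", pixels) or ("diff", {index: value}) records."""
    records = []
    previous = None
    for frame in frames:
        diff = None if previous is None else changed_pixels(previous, frame)
        if diff is None or len(diff) / len(frame) > CHANGE_LIMIT:
            records.append(("full", list(frame)))  # complete frame starts a sequence
        else:
            records.append(("diff", diff))         # only the changed pixels are stored
        previous = frame
    return records


def decode(records):
    """Rebuild every frame by applying each diff to the previously decoded frame."""
    frames = []
    for kind, payload in records:
        if kind == "full":
            current = list(payload)
        else:
            current = list(frames[-1])
            for index, value in payload.items():
                current[index] = value
        frames.append(current)
    return frames
```

Decoding the records gives back the original frames, and a large change between two frames shows up as a second "full" record, which plays the role of frame A1.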

The H264 protocol defines three kinds of frames: a completely coded frame is called an I frame; a frame that is generated by reference to a previous I frame and contains only the difference is called a P frame; and a frame that is encoded by reference to both the preceding and the following frames is called a B frame.

The core algorithms used in H264 are intra-frame compression and inter-frame compression. Intra-frame compression is the algorithm that generates I frames, and inter-frame compression is the algorithm that generates B frames and P frames.

----------------------

Description of the sequence

----------------------

In H264, images are organized into sequences. A sequence is the data stream produced by encoding a run of images; it starts with an I frame and runs until the next I frame.

The first image of a sequence is called the IDR image (Instantaneous Decoding Refresh image), and an IDR image is always an I-frame image. H.264 introduces the IDR image so that decoding can resynchronize: when the decoder reaches an IDR image, it immediately empties the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. This way, if a major error occurred in the previous sequence, the decoder gets a chance to resynchronize here. Images after an IDR image are never decoded using data from images before the IDR image.
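A minimal sketch of the IDR behavior described above, assuming a very simplified reference queue. `ReferenceBuffer` and `on_picture` are hypothetical names for this illustration; a real H.264 decoder manages its decoded picture buffer with far more detailed rules.

```python
class ReferenceBuffer:
    """Toy model of a decoder's reference-frame queue (not the real H.264 buffer rules)."""

    def __init__(self):
        self.frames = []

    def on_picture(self, number, is_idr):
        """Register a decoded picture and return the references it was allowed to use."""
        if is_idr:
            self.frames.clear()      # nothing before the IDR may be referenced again
        usable = list(self.frames)   # references available to this picture
        self.frames.append(number)
        return usable
```

After an IDR picture arrives, the list of usable references starts again from that picture, so an error in an earlier sequence cannot carry over.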

A sequence is the data stream generated by encoding a run of images whose content does not differ much. When there is little motion, a sequence can be very long, because little motion means the picture content changes very little, so one I frame can be followed by P frames and B frames throughout. When there is a lot of motion, a sequence may be fairly short, for example one I frame followed by 3 or 4 P frames.

-----------------------

Description of three kinds of frames

-----------------------

I frame: intra-coded frame (also known as an intra picture). The I frame is the keyframe and is usually the first frame of each GOP (Group of Pictures, a structure used in MPEG video compression). It is moderately compressed, serves as a random-access reference point, and can be treated as a still image. An I frame can be seen as the product of compressing one image: the whole picture of this frame is retained, and decoding needs only this frame's data (because it contains the complete picture).

I frame features:
1. It is a full-frame compression-coded frame. It encodes and transmits the whole frame image in a manner similar to JPEG compression;
2. The complete image can be reconstructed from the I-frame data alone during decoding;
3. I frames describe the details of the image background and of the moving subject;
4. I frames are generated without reference to other images;
5. The I frame is the reference frame for P frames and B frames (its quality directly affects the quality of the subsequent frames in the same group);
6. The I frame is the base frame (first frame) of the frame group (GOP), and a group contains only one I frame;
7. I frames do not need motion vectors;
8. I frames take up a relatively large amount of data.


P frame: forward-predictive coded frame (also known as a predictive frame). It reduces the amount of transmitted data by removing the temporal redundancy with respect to previously encoded frames in the image sequence. A P frame represents the difference between this frame and a previous keyframe (or P frame); when decoding, the difference defined in this frame is superimposed on the previously cached picture to produce the final picture. (In other words, it is a difference frame: a P frame carries no complete picture data, only the data that differs from the previous frame.)


Prediction and reconstruction of a P frame: a P frame uses the I frame as its reference frame. The encoder finds in the I frame the predicted value and the motion vector for a given point of the P frame, and transmits the prediction difference together with the motion vector. The receiver uses the motion vector to locate the predicted value of that point in the I frame and adds the difference to it to obtain the sample value at that point; doing this for every point yields the complete P frame.
P frame features:
1. P frames are coded frames that follow the I frame at intervals of 1 to 2 frames;
2. A P frame uses motion compensation to transmit the difference (prediction error) between it and the previous I or P frame, together with the motion vector;
3. When decoding, the complete P frame image is reconstructed only after the predicted value from the I frame and the prediction error are summed;
4. P frames use forward-predictive inter-frame coding. A P frame refers only to the closest preceding I frame or P frame;
5. A P frame can be the reference frame for the P frame after it, and also for the B frames before and after it;
6. Because P frames are reference frames, they can propagate decoding errors;
7. Because only differences are transmitted, the compression ratio of P frames is relatively high.
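The P-frame reconstruction described above (find the predicted value through the motion vector, then add the transmitted difference) can be sketched as follows. This is a simplified illustration under stated assumptions: frames are 2D lists of samples, blocks are square, motion vectors are whole-pixel `(dy, dx)` offsets, and the function names are hypothetical.

```python
def predict_block(reference, top, left, motion_vector, size):
    """Copy the size x size block that the motion vector points at in the reference frame."""
    dy, dx = motion_vector
    return [[reference[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def reconstruct_block(reference, top, left, motion_vector, residual):
    """Predicted value plus transmitted difference gives the decoded samples."""
    size = len(residual)
    predicted = predict_block(reference, top, left, motion_vector, size)
    return [[predicted[r][c] + residual[r][c] for c in range(size)]
            for r in range(size)]
```

Only the motion vector and the residual need to be transmitted for the block; the reference frame is already available at the receiver.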

B frame: bidirectionally predicted, interpolated coded frame (also known as a bidirectional predictive frame). It reduces the amount of transmitted data by removing the temporal redundancy with respect to both the already encoded frames before it and the encoded frames after it in the source image sequence. A B frame is a bidirectional difference frame: it records the differences between this frame and the frames before and after it (the details are more complicated, there are 4 cases, but this simplification will do). In other words, to decode a B frame the decoder needs not only the previously cached picture but also the decoded picture that follows; the final picture is obtained by superimposing this frame's data on the preceding and following pictures. B frames have a high compression ratio, but decoding them puts more load on the CPU.

Prediction and reconstruction of a B frame:
A B frame uses the previous I or P frame and the following P frame as reference frames. The encoder finds the predicted value and the two motion vectors for a given point of the B frame, and transmits the prediction difference together with the motion vectors. The receiver uses the motion vectors to find (compute) the predicted value in the two reference frames and adds the difference to it to obtain the sample value at that point; doing this for every point yields the complete B frame.
B frame features:
1. B frames are predicted from the previous I or P frame and the following P frame;
2. A B frame transmits the prediction error and the motion vectors between it and the previous I or P frame and the following P frame;
3. B frames are bidirectionally predictive coded frames;
4. B frames have the highest compression ratio, because they only reflect how the moving subject changes between the two reference frames, so the prediction is more accurate;
5. B frames are not reference frames, so they do not propagate decoding errors.
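A rough sketch of the bidirectional case, assuming the simplest of the possible situations: the prediction is the rounded average of one block from the earlier reference frame and one block from the later reference frame. The averaging rule, the whole-pixel motion vectors, and the function names are assumptions made for this illustration, not something the text above specifies.

```python
def fetch_block(frame, top, left, motion_vector, size):
    """Copy the size x size block that the motion vector points at in a reference frame."""
    dy, dx = motion_vector
    return [[frame[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def reconstruct_b_block(past, future, top, left, mv_past, mv_future, residual):
    """Average the two predictions (an assumed rule), then add the transmitted difference."""
    size = len(residual)
    forward = fetch_block(past, top, left, mv_past, size)
    backward = fetch_block(future, top, left, mv_future, size)
    return [[(forward[r][c] + backward[r][c] + 1) // 2 + residual[r][c]
             for c in range(size)]
            for r in range(size)]
```

Because the later reference frame must already be decoded before this function can run, B frames force the decoder to process frames in an order that differs from the display order.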

Note: I, B, and P frames are defined according to the needs of the compression algorithm; all of them are real physical frames. In general, the compression ratio of an I frame is about 7 (similar to JPG), that of a P frame is about 20, and that of a B frame can reach 50. Using B frames therefore saves a lot of space, and the space saved can be spent on more I frames, which gives better picture quality at the same bit rate.
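To see what the ratios in the note imply for a whole group of frames, here is a small worked calculation. It takes the quoted figures (7, 20, 50) at face value and assumes every raw frame has the same size; `RATIO` and `average_ratio` are hypothetical names.

```python
# Compression ratios quoted in the note above; real values vary with content.
RATIO = {"I": 7, "P": 20, "B": 50}


def average_ratio(pattern):
    """Overall compression ratio of a group whose frames have the given types."""
    compressed = sum(1 / RATIO[frame_type] for frame_type in pattern)  # raw frame size = 1
    return len(pattern) / compressed
```

Under these figures, `average_ratio("IPPP")` comes to about 13.7 while `average_ratio("IBBP")` comes to about 17.2, which illustrates the space that B frames save.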

--------------------------------

Description of the compression algorithm

--------------------------------

H264 Compression Method:

1. Grouping: several frames are put into one group (a GOP, that is, a sequence). To limit the amount of motion change within a group, the number of frames should not be too large;
2. Defining frames: each frame in a group is assigned one of three types: I frame, B frame, or P frame;
3. Predicting frames: with the I frame as the base frame, P frames are predicted from the I frame, and B frames are then predicted from the I frame and the P frames;
4. Data transmission: finally, the I-frame data and the prediction difference information are stored and transmitted.
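The grouping and frame-definition steps above can be sketched as a function that assigns a type to each frame of one group. This is only an illustration of a fixed pattern; real encoders may choose frame types adaptively. `gop_pattern`, `gop_size`, and `b_frames` are hypothetical names.

```python
def gop_pattern(gop_size, b_frames):
    """Frame types of one GOP in display order, for example I B B P B B P."""
    types = ["I"]
    while len(types) < gop_size:
        remaining = gop_size - len(types)
        run = min(b_frames, remaining - 1)  # keep room for the closing P frame
        types.extend("B" * run)
        types.append("P")
    return "".join(types)
```

For instance, `gop_pattern(7, 2)` gives `"IBBPBBP"`, and with `b_frames` set to 0 the group consists of one I frame followed only by P frames.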

Intra-frame compression is also called spatial compression. When compressing a frame, only the data within that frame is considered, not the redundancy between adjacent frames, so it is in fact similar to still-image compression. Intra-frame compression generally uses a lossy algorithm. Because an intra-coded frame is a complete image, it can be decoded and displayed independently. Intra-frame compression generally does not reach a very high compression ratio; it is similar to JPEG encoding.

The principle of inter-frame compression is that the data of neighboring frames is highly correlated; in other words, the information changes very little from one frame to the next. Continuous video therefore carries redundant information between adjacent frames, and compressing that redundancy increases the amount of compression further, that is, it shrinks the compressed data relative to the original. Inter-frame compression is also called temporal compression: it compresses by comparing the data of different frames along the time axis. Comparing frames is in itself generally lossless. The frame differencing algorithm is a typical temporal compression method: it compares a frame with its adjacent frame and records only the difference between them, which greatly reduces the amount of data.

A word on lossy and lossless compression. Lossless compression means the data after decompression is exactly the same as the data before compression. Lossless compression often uses run-length encoding (RLE). Lossy compression means the decompressed data differs from the data before compression: during compression, image or audio information that human eyes and ears are not sensitive to is discarded, and the discarded information cannot be restored. Almost all high-compression algorithms are lossy, because that is how they reach a low data rate. The amount of data lost depends on the compression ratio: the smaller the compressed data relative to the original, the more data is lost and the worse the decompressed result generally is. In addition, some lossy algorithms compress repeatedly in multiple passes, which causes additional data loss.
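The contrast between the two kinds of compression can be shown with a short sketch: a run-length encoder that round-trips exactly, and a simple quantizer that throws information away. The quantizer is only a stand-in for lossy coding in general; the function names and the step size are assumptions for this example.

```python
def rle_encode(data):
    """Lossless: collapse runs of equal values into [value, count] pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand [value, count] pairs back into the original data."""
    return [value for value, count in runs for _ in range(count)]


def quantize(data, step):
    """Lossy: round every sample down to a multiple of step; the remainder is gone."""
    return [(value // step) * step for value in data]
```

Decoding the RLE output returns exactly the input, whereas after `quantize` several different inputs map to the same output, so the original samples cannot be recovered.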


PTS: Presentation Time Stamp. The PTS indicates when a decoded video frame should be displayed.

DTS: Decode Time Stamp. The DTS indicates when the bit stream that has been read into memory should be fed to the decoder for decoding.

When there are no B frames, the order of the DTS and the order of the PTS are the same.
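A small sketch of how B frames make the two orders diverge. It assumes the simple model used in this article (a B frame needs the next I or P frame in display order to be decoded first) and counts timestamps in whole frame intervals; `decode_order` and `timestamps` are hypothetical names, and real containers use their own time bases.

```python
def decode_order(display_types):
    """Return display indices in the order the decoder must receive them."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)   # must wait for the next I/P frame
        else:
            order.append(index)       # the reference frame is sent first
            order.extend(pending_b)   # then the B frames that waited for it
            pending_b = []
    return order + pending_b          # trailing B frames, if the list ends with any


def timestamps(display_types):
    """List of (type, dts, pts) in decode order, in units of one frame interval."""
    order = decode_order(display_types)
    # Shift every PTS by the largest reordering delay so that no frame has PTS < DTS.
    delay = max((dts - index for dts, index in enumerate(order)), default=0)
    return [(display_types[index], dts, index + delay)
            for dts, index in enumerate(order)]
```

For the display order `"IBBP"`, the P frame is decoded second but displayed last, so its PTS is well ahead of its DTS; for `"IPP"` the two timestamps are identical for every frame.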


Summary:

Differences between I, P, and B frames:

I frame: can be decompressed by the video decompression algorithm into a single complete picture on its own.

P frame: needs to refer to the I frame or P frame before it to generate a complete picture.

B frame: needs to refer to the previous I or P frame and the P frame after it to generate a complete picture.

A GOP is formed between two I frames. In x264, the BF value, that is, the number of B frames between an I frame and a P frame or between two P frames, can be set through parameters.

From the above it follows that if B frames exist, the last frame of a GOP must be a P frame.

The difference between DTS and PTS:

The DTS is used in the decoding phase, for decoding the video. The PTS is used in the display phase, for synchronizing and outputting the video. When there are no B frames, the DTS and the PTS follow the same order.

Example:

The following example is a GOP of 15 frames, showing the reference frames used in decoding and the decoding order:

In this example, decoding an I frame does not depend on any other frame. Decoding a P frame depends on the I frame or P frame before it. Decoding a B frame depends on the nearest I frame or P frame before it and the nearest P frame after it.
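The dependency rules just stated can be written down as a short function. The 15-frame pattern used in the note below is a hypothetical one chosen for illustration, not necessarily the pattern of the original example, and `references` is a hypothetical name.

```python
def references(display_types):
    """Map each frame index (display order) to the indices of the frames it depends on."""
    anchors = [i for i, t in enumerate(display_types) if t in ("I", "P")]
    refs = {}
    for i, frame_type in enumerate(display_types):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if frame_type == "I":
            refs[i] = []                          # depends on no other frame
        elif frame_type == "P":
            refs[i] = earlier[-1:]                # nearest I or P frame before it
        else:
            refs[i] = earlier[-1:] + later[:1]    # nearest anchor before and after
    return refs
```

With the assumed pattern `"IBBPBBPBBPBBPBP"`, frame 1 (a B frame) depends on frames 0 and 3, frame 3 (a P frame) depends on frame 0 only, and the group ends with a P frame so that the last B frame has a later reference inside the same GOP.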
