H.264 encoding, packaging, and decoding: background knowledge for video decoding

Source: Internet
Author: User


1) ES (Elementary Stream): also called the basic bitstream; a continuous stream of video, audio, or data.


2) PES (Packetized Elementary Stream): the packetized form of the elementary stream. The ES is split, as needed, into packets of varying length, and a packet header is prepended to each packet to form the PES.


3) TS (Transport Stream): a stream of fixed-length 188-byte packets that carries one or more programs, each of which may have its own independent time base. A program can contain several ES streams of video, audio, and text, and each ES stream is labeled with a different PID. So that these ES streams can be parsed, the TS also carries tables describing the programs and their ES streams: the PAT, which is sent on the fixed PID 0x0000, and the PMT of each program, whose PID is listed in the PAT. (In an MPEG-2 system, the single stream produced by multiplexing the video and audio ES streams with auxiliary data for actual transmission is called the MPEG-2 transport stream.)
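To make the packet and PID description concrete, here is a minimal Python sketch that reads the PID from one TS packet. The 0x47 sync byte and the bit layout of the 4-byte header come from the MPEG-2 systems specification rather than from this article, and the function name `parse_ts_header` is only an illustrative choice.

```python
TS_PACKET_SIZE = 188   # fixed TS packet length in bytes
SYNC_BYTE = 0x47       # first byte of every TS packet (MPEG-2 systems spec)
PAT_PID = 0x0000       # the PAT is always carried on PID 0


def parse_ts_header(packet: bytes) -> dict:
    """Extract a few fields from the 4-byte header of one TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    # The PID is 13 bits: the low 5 bits of byte 1 followed by all of byte 2.
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": pid,
        "continuity_counter": packet[3] & 0x0F,
        "is_pat": pid == PAT_PID,
    }


# Hypothetical packet: PID 0x0100, payload-unit-start set, counter 7.
example = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(example)
```

A demultiplexer would typically read the PAT on PID 0 first, look up each program's PMT PID there, and only then know which PIDs carry the video and audio ES streams.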

Encapsulation (container format): bundles the video track and the audio track into one file and builds ordering and index information according to defined rules, so that a player can seek and play back easily. Examples include AVI, PS (Program Stream), TS (Transport Stream), and MKV (Matroska).


4) I frame: an intra-coded frame, also called an intra picture. An I frame is usually the first frame of each GOP (group of pictures, a concept from MPEG video compression). It is moderately compressed, serves as a reference point for random access, and can be decoded into an image on its own. An I frame can be thought of as a single compressed still image.


5) P frame: a forward-predicted frame, also called a predictive frame. It reduces the amount of data to transmit by removing the temporal redundancy with respect to previously coded frames in the image sequence.


6) B frame: a bidirectionally predicted frame, also called a bi-directional interpolated prediction frame. It reduces the amount of data to transmit by exploiting the temporal redundancy with respect to coded frames both before and after it in the source image sequence.


7) PTS: Presentation Time Stamp. The PTS indicates when a decoded video frame should be displayed.


8) DTS: Decode Time Stamp. The DTS indicates when the bitstream data that has been read into memory should be fed to the decoder for decoding.


When there are no B frames, the DTS order and the PTS order are the same.

9) Frame dependencies:

I frame: can be decompressed by the video decoding algorithm into a complete, standalone picture by itself.

P frame: needs the I frame or P frame in front of it as a reference to produce a complete picture.

B frame: needs the previous I or P frame and the P frame behind it as references to produce a complete picture.

A GOP is the span from one I frame to the next. In x264, the number of consecutive B frames (that is, the number of B frames between an I and a P, or between two P frames) can be set with a parameter.

It follows that if a GOP contains B frames, its last frame must be a P frame (assuming the B frames do not reference the next GOP).

(10) Differences between DTS and PTS:

DTS is used in the decoding stage, to schedule video decoding. PTS is used in the display stage, for synchronization and output. When there are no B frames, the DTS order and the PTS order are the same.


Here is an example of a GOP of 15 frames, showing the reference frames and the order in which the frames are decoded:

As the example shows: decoding an I frame does not depend on any other frame. Decoding a P frame relies on the I frame or P frame in front of it. Decoding a B frame relies on the nearest I or P frame before it and the nearest P frame after it.
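The dependency rules above imply that a B frame cannot be decoded until the reference frame displayed after it has been decoded. The following Python sketch turns a display-order sequence into a decode order on that basis. It assumes the simplified model used in this article (B frames are never used as references), and the function name `decode_order` is hypothetical.

```python
def decode_order(display_types: str) -> list:
    """Return display-order indexes (0-based) in the order they are decoded.

    Simplified model: every B frame waits for the next I or P frame,
    which is therefore decoded first.
    """
    order = []
    pending_b = []                   # B frames waiting for their backward reference
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:                        # I or P: decode it, then the B frames it unblocks
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)          # trailing B frames with no later reference
    return order


display = "IBBPBBP"
decoded = decode_order(display)      # [0, 3, 1, 2, 6, 4, 5]
decoded_types = "".join(display[i] for i in decoded)   # "IPBBPBB"
```

With no B frames in the input the function returns the indexes unchanged, which matches the statement that DTS order and PTS order coincide in that case.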


(11) The video decoding process is roughly as follows:



(12) Calculation of PTS

Method one: from the I/P/B types of the neighboring frames you can work out the actual display order of each frame. Combined with the frame rate from the SPS parsed earlier and a frame counter (frame_count), this gives the PTS. This method requires buffering several frames (typically one whole group).

Frame type:           I  P  B  B  I  P  B  B  I  P   B   ...
Frame number:         1  2  3  4  5  6  7  8  9  10  11  ...
Frame display order:  1  4  2  3  5  8  6  7  9  12  10  ...

The frames from one I frame up to the next I frame form a group.
As the rows above show, a P frame is displayed after the last B frame that follows it.
Therefore, to obtain the PTS of the 7th frame, you need to know at least the type of the frame after it in order to work out its display order.

PTS of the 8th frame = 1000 (milliseconds) * 7 (its display order) / frame rate
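Method one can be sketched in Python as follows. The sketch assumes the simplified pattern used in this example (each I or P frame is displayed after the run of B frames that immediately follows it in the stream), keeps the 1-based display order used in the rows above, and divides by the frame rate to get milliseconds. The function names are illustrative only.

```python
def display_order(frame_types: str) -> list:
    """Map each frame (in stream order) to its 1-based display position."""
    n = len(frame_types)
    order = [0] * n
    i = 0
    while i < n:
        if frame_types[i] == "B":
            # A B frame with no preceding I/P in the buffer: shown in place.
            order[i] = i + 1
            i += 1
            continue
        # Count the B frames that directly follow this I or P frame.
        run = 0
        while i + 1 + run < n and frame_types[i + 1 + run] == "B":
            run += 1
        order[i] = i + 1 + run            # the I/P frame is shown after those B frames
        for j in range(1, run + 1):
            order[i + j] = i + j          # each B frame moves one position earlier
        i += run + 1
    return order


def pts_ms(display_position: int, fps: float) -> float:
    """PTS in milliseconds: 1000 * display order / frame rate."""
    return 1000 * display_position / fps


order = display_order("IPBBIPBB")         # [1, 4, 2, 3, 5, 8, 6, 7]
pts_frame_8 = pts_ms(order[7], fps=25)    # 280.0 ms at an assumed 25 fps
```

The inner loop is the reason this method needs buffering: the display position of an I or P frame is unknown until the frames after it have been seen.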

Method two: every slice header records pic_order_cnt_lsb, which gives the display order of the current frame within its group. From pic_order_cnt_lsb the PTS of the current frame can be calculated directly. This method does not require frame buffering.

Calculation formula:

pts = 1000 * (i_frame_counter + pic_order_cnt_lsb) * (num_units_in_tick / time_scale)

i_frame_counter is the frame position of the most recent I frame. Adding the frame's order within the current group to this I-frame count gives the frame's actual position in the display sequence. Multiply that position by the tick duration (num_units_in_tick / time_scale, the reciprocal of the rate recorded in the SPS) and then by the base clock base_clock of 1000 (milliseconds) to obtain the PTS.

Frame type:           I  P  B  B  I  P  B  B  I  P   B   ...
Frame number:         1  2  3  4  5  6  7  8  9  10  11  ...
Frame display order:  1  4  2  3  5  8  6  7  9  12  10  ...
pic_order_cnt_lsb:    0  6  2  4  0  6  2  4  0  6   2   ...

Note that in the rows above, the pic_order_cnt_lsb in the slices increases in steps of 2.
The rate usually recorded in the SPS of an H.264 stream is likewise twice the actual frame rate: time_scale / num_units_in_tick = fps * 2

Therefore, the actual calculation formula should be
pts = 1000 * (i_frame_counter * 2 + pic_order_cnt_lsb) * (num_units_in_tick / time_scale)
or, equivalently,
pts = 1000 * (i_frame_counter + pic_order_cnt_lsb / 2) / (time_scale / num_units_in_tick / 2)

So the PTS of the 11th frame is calculated as
1000 * (9 * 2 + 2) * (num_units_in_tick / time_scale)
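As a check on method two, the following Python sketch applies the formula to the example sequence. It assumes a 25 fps stream whose SPS carries time_scale = 50 and num_units_in_tick = 1 (these values are not from the article), and it ignores the wraparound of pic_order_cnt_lsb that a real parser has to handle. The helper names are hypothetical.

```python
def pts_ms(i_frame_counter, pic_order_cnt_lsb, time_scale, num_units_in_tick):
    """PTS in milliseconds from the position of the last I frame and the POC LSB."""
    ticks = i_frame_counter * 2 + pic_order_cnt_lsb
    return 1000 * ticks * num_units_in_tick / time_scale


def pts_for_stream(frame_types, poc_lsbs, time_scale, num_units_in_tick):
    """Compute a PTS for every frame, in stream order, without buffering."""
    result = []
    i_frame_counter = 0
    for position, (frame_type, poc) in enumerate(zip(frame_types, poc_lsbs), start=1):
        if frame_type == "I":
            i_frame_counter = position     # 1-based position of the latest I frame
        result.append(pts_ms(i_frame_counter, poc, time_scale, num_units_in_tick))
    return result


frame_types = "IPBBIPBBIPB"
poc_lsbs = [0, 6, 2, 4, 0, 6, 2, 4, 0, 6, 2]
pts = pts_for_stream(frame_types, poc_lsbs, time_scale=50, num_units_in_tick=1)
# pts[10] is the 11th frame: 1000 * (9 * 2 + 2) * 1 / 50 = 400.0 ms
display_positions = [p / 40 for p in pts]  # 40 ms per frame at 25 fps
```

Dividing each PTS by the 40 ms frame duration reproduces the display-order row of the example, which is a quick way to confirm that the doubling of i_frame_counter and the SPS rate cancel out.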

When H.264 is packaged into RTP, the timestamp uses a 90 kHz clock, so it advances by 90000 / frame rate per frame. The PTS values here are calculated with a base_clock of 1000 (milliseconds). When multiplexing into TS, base_clock is 90 kHz, so the millisecond values should be multiplied by 90.
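The clock conversion is simple arithmetic; a small Python sketch under the same assumptions (a 90 kHz target clock, PTS values in milliseconds) is shown here. The function names are illustrative.

```python
CLOCK_90KHZ = 90_000   # ticks per second for TS and for RTP H.264 timestamps


def ms_to_90khz(pts_ms: float) -> int:
    """Convert a millisecond PTS to 90 kHz ticks (multiply by 90)."""
    return round(pts_ms * CLOCK_90KHZ / 1000)


def ticks_per_frame(fps: float) -> float:
    """Timestamp increment per frame on the 90 kHz clock: 90000 / frame rate."""
    return CLOCK_90KHZ / fps


pts_ticks = ms_to_90khz(400.0)   # 36000
increment = ticks_per_frame(25)  # 3600.0
```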
