There are two types of timestamps in FFmpeg: DTS (Decoding Time Stamp) and PTS (Presentation Time Stamp). As the names imply, the former is the time at which data is decoded and the latter is the time at which it is displayed. To understand these two concepts properly, you first need to understand the concepts of packet and frame in FFmpeg.
In FFmpeg, the AVPacket structure describes compressed data, that is, data before decoding or after encoding, and the AVFrame structure describes the uncompressed signal, that is, data after decoding or before encoding. For video, an AVFrame is one picture of the video, and its PTS determines when that picture is displayed to the user. DTS is a member of AVPacket that indicates when the packet should be decoded. If every frame were encoded in input order (that is, display order), the decoding order and the display order would be identical. In practice, most codec standards, such as H.264 and HEVC, encode frames in an order that differs from the input order, so two separate timestamps, PTS and DTS, are required.
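The difference between the two timestamps can be illustrated with a small model. This is only a sketch: the Packet class below is a hypothetical stand-in and not the FFmpeg AVPacket API, the timestamp values are made up, and the frame types I, P, and B are the ones explained in the next section.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical stand-in for a compressed packet (not FFmpeg's AVPacket)."""
    frame_type: str
    pts: int  # when the decoded picture should be displayed
    dts: int  # when the packet should be handed to the decoder


# Four pictures displayed as I, B, B, P but stored in the stream as I, P, B, B,
# because the B frames cannot be decoded until the P frame is available.
# PTS values are shifted by one tick so no picture is shown before it is decoded.
stream = [
    Packet("I", pts=1, dts=0),
    Packet("P", pts=4, dts=1),
    Packet("B", pts=2, dts=2),
    Packet("B", pts=3, dts=3),
]

decode_order = [p.frame_type for p in sorted(stream, key=lambda p: p.dts)]
display_order = [p.frame_type for p in sorted(stream, key=lambda p: p.pts)]

print(decode_order)   # ['I', 'P', 'B', 'B']
print(display_order)  # ['I', 'B', 'B', 'P']
```

Sorting by DTS gives the order in which the decoder consumes the packets, and sorting by PTS gives the order in which the viewer sees the pictures.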
I, P, and B frames and their relationship to PTS and DTS
Basic Concepts:
I frame: an intra-coded frame, also known as an intra picture. An I frame is usually the first frame of each GOP (group of pictures, a structure used in MPEG video compression). It is moderately compressed, serves as a reference point for random access, and can be decoded into a complete image on its own. An I frame can be thought of as a compressed still image.
P frame: a forward-predicted frame, also known as a predictive frame. It reduces the amount of data to transmit by exploiting temporal redundancy with frames encoded earlier in the image sequence.
B frame: a bidirectionally predicted frame, also known as a bi-directional interpolated prediction frame. It reduces the amount of data to transmit by exploiting temporal redundancy with both the encoded frames before it and the encoded frames after it in the source image sequence.
PTS: Presentation Time Stamp. The PTS indicates when a decoded video frame should be displayed.
DTS: Decoding Time Stamp. The DTS indicates when the bitstream data held in memory should be fed to the decoder.
When there are no B frames, the DTS order and the PTS order are the same.
How I, P, and B frames differ:
I frame: can be decompressed into a single complete picture on its own.
P frame: needs to reference an I frame or P frame before it to generate a complete picture.
B frame: needs to reference the I or P frame before it and a P frame after it to generate a complete picture.
The frames from one I frame up to the next I frame form a GOP. In x264, the bf parameter sets the number of B frames between an I frame and a P frame, or between two P frames.
It follows from the above that if a GOP contains B frames, its last frame must be a P frame, because every B frame needs a later P frame to reference.
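The GOP layout described above can be sketched as follows. The helper build_closed_gop and its policy of forcing the final frame to be a P frame are assumptions made for illustration; they are not taken from x264, and a real encoder chooses frame types adaptively.

```python
from typing import List


def build_closed_gop(gop_size: int, bf: int) -> List[str]:
    """Return frame types in display order for one GOP (hypothetical helper).

    The GOP starts with an I frame, places up to `bf` B frames before each
    P frame, and forces the last frame to be P so that no B frame is left
    without a later reference inside the GOP.
    """
    if gop_size < 1 or bf < 0:
        raise ValueError("gop_size must be >= 1 and bf must be >= 0")
    types = ["I"]
    run = 0  # B frames emitted since the last I or P frame
    while len(types) < gop_size:
        is_last = len(types) == gop_size - 1
        if run < bf and not is_last:
            types.append("B")
            run += 1
        else:
            types.append("P")
            run = 0
    return types


print("".join(build_closed_gop(15, 2)))  # IBBPBBPBBPBBPBP
print("".join(build_closed_gop(5, 0)))   # IPPPP
```

With bf set to 0 the GOP contains only I and P frames, which is the case where decoding order and display order coincide.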
How DTS and PTS differ:
DTS is used in the decoding phase, to decide when video data is decoded. PTS is used for synchronization and output, that is, to decide when frames are displayed. When there are no B frames, the DTS order and the PTS order are the same.
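To make the ordering claim concrete, here is a minimal sketch that converts display order (PTS order) into decoding order (DTS order). The function name to_decode_order is hypothetical, and the model assumes the simple rule from this article that a run of B frames is decoded right after the reference frame that follows it.

```python
def to_decode_order(display_types):
    """Return frame indices in decoding order, given frame types in display order.

    Each run of B frames is moved after the next I or P frame, because that
    reference frame has to be decoded before the B frames that depend on it.
    """
    order = []
    pending_b = []  # B frames waiting for their later reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # B frames with no later reference in this list are appended at the end.
    order.extend(pending_b)
    return order


with_b = list("IBBPBBP")
print([with_b[i] for i in to_decode_order(with_b)])
# ['I', 'P', 'B', 'B', 'P', 'B', 'B']

without_b = list("IPPP")
print(to_decode_order(without_b))  # [0, 1, 2, 3]
```

In the second call there are no B frames, so the decoding order is identical to the display order.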
Example:
Here is an example of a GOP of 15 frames, showing the reference frames used in decoding and the decoding order:
As shown above: decoding an I frame does not depend on any other frame. Decoding a P frame depends on the I frame or P frame before it. Decoding a B frame depends on the nearest I frame or P frame before it and the nearest P frame after it.
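These dependency rules can be written down as a short sketch. The function reference_frames is a hypothetical helper that follows the simplified model in this article; real codecs such as H.264 allow more flexible reference structures. The frame pattern used below is made up for illustration and is not the GOP of 15 from the example.

```python
def reference_frames(display_types):
    """Map each frame index (display order) to the frames it is predicted from.

    Simplified rules from the text:
      I frame: no references
      P frame: nearest I or P frame before it
      B frame: nearest I or P frame before it, and nearest P frame after it
    """
    refs = {}
    count = len(display_types)
    for i, frame_type in enumerate(display_types):
        prev_ref = next(
            (j for j in range(i - 1, -1, -1) if display_types[j] in ("I", "P")),
            None,
        )
        next_p = next(
            (j for j in range(i + 1, count) if display_types[j] == "P"),
            None,
        )
        if frame_type == "I":
            candidates = []
        elif frame_type == "P":
            candidates = [prev_ref]
        else:
            candidates = [prev_ref, next_p]
        # Drop missing references (for example a B frame at the very end).
        refs[i] = [r for r in candidates if r is not None]
    return refs


deps = reference_frames(list("IBBPBBP"))
for index, needed in deps.items():
    print(index, needed)
```

For the pattern I B B P B B P, frames 1 and 2 depend on frames 0 and 3, while frames 4 and 5 depend on frames 3 and 6, which is why the P frames have to be decoded ahead of the B frames that precede them on screen.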