Reprinted from http://blog.chinaunix.net/uid-9688646-id-1998407.html
TS stream decoding process:
1. Get the PAT from the TS.
2. Get the PMT from the TS.
3. From the PMT you learn the type of the video (audio) carried in the current stream (for example H.264), the corresponding PIDs, and the PCR PID.
4. Set the video filter of the Demux module to the PID and stream type of the corresponding video.
5. The payload of each TS packet delivered by the video Demux filter is a piece of a PES packet. The TS header carries information indicating which PES packet the payload belongs to (for example, the payload_unit_start_indicator flag marks the TS packet whose payload begins a new PES packet). The software should therefore copy the payload data into a PES buffer to splice together a complete PES packet.
6. The header of the assembled PES packet contains the PTS and DTS information; what follows the PES header is the ES.
7. Strip the PES header and send the ES data directly to the decoder. The decoded output is video data frame by frame, and each frame should be associated with the PTS from its PES packet, at least for video and audio synchronization.
8. The frame-type information (I, B, B, P) is in the ES.
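Steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not a real demux driver: the names `parse_ts_packet` and `reassemble_pes` are hypothetical, the input is assumed to be a sequence of already-aligned 188-byte packets, and continuity-counter checking and error recovery are left out.

```python
from collections import namedtuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

# Hypothetical container for the header fields used in this sketch.
TsHeader = namedtuple("TsHeader", "pid pusi afc continuity_counter payload")


def parse_ts_packet(pkt: bytes) -> TsHeader:
    """Parse the 4-byte TS header and return the payload bytes."""
    if len(pkt) != TS_PACKET_SIZE or pkt[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    pusi = bool(pkt[1] & 0x40)               # payload_unit_start_indicator
    pid = ((pkt[1] & 0x1F) << 8) | pkt[2]    # 13-bit PID
    afc = (pkt[3] >> 4) & 0x03               # adaptation_field_control
    cc = pkt[3] & 0x0F                       # continuity_counter
    offset = 4
    if afc & 0x02:                           # adaptation field present
        offset += 1 + pkt[4]                 # length byte + field body
    payload = pkt[offset:] if afc & 0x01 else b""
    return TsHeader(pid, pusi, afc, cc, payload)


def reassemble_pes(packets, video_pid):
    """Yield spliced PES packets for one PID.

    A new PES packet starts at every TS packet whose
    payload_unit_start_indicator is set; payloads seen before the
    first such packet are dropped.
    """
    buf = bytearray()
    started = False
    for pkt in packets:
        hdr = parse_ts_packet(pkt)
        if hdr.pid != video_pid or not hdr.payload:
            continue
        if hdr.pusi:
            if started and buf:
                yield bytes(buf)             # previous PES is complete
            buf = bytearray()
            started = True
        if started:
            buf += hdr.payload
    if started and buf:
        yield bytes(buf)                     # flush the last PES
```

The PID passed as `video_pid` would come from the PMT in step 3. A real implementation would also use the continuity counter to detect lost packets before trusting a spliced PES.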
An ES (elementary stream) is the data stream that comes directly out of the encoder. It is a collective term for an encoded video data stream, an encoded audio data stream, or any other encoded data stream. After passing through the PES packetizer, the ES is converted into PES packets. A PES packet consists of a packet header and a payload.
At the PES layer, the PTS (presentation time stamp) and DTS (decoding time stamp) are added to the PES header, mainly for video and audio synchronization. In fact, the MPEG-2 time labels used for audio/video synchronization and system clock recovery are distributed across three layers: ES, PES, and TS. At the ES layer, the synchronization-related mechanism is mainly the video buffer verifier (VBV), which prevents overflow or underflow of the decoder buffer. At the PES layer, it is the PTS and DTS in the PES header. At the TS layer, the TS header carries the program clock reference (PCR), which is used to recover a system time clock (STC) consistent with the one at the encoding end.
The basic process is as follows. MPEG-2 compression encoding first produces the elementary stream (ES). This data stream is very large and contains only the I, P, and B video frames or the audio sample information. Synchronization information is then added and the stream is packaged into variable-length PES packets. What was a continuous stream is now segmented into packets. Note that an ES contains only one type of content, either video only or audio only, so each packaged PES likewise carries only one type of ES: there is a PES for the video ES and a separate PES for the audio ES.
An ES is an encoded video or audio data stream. Each ES consists of a number of access units (AUs), and each video or audio AU has two parts: a header and the encoded data. One AU corresponds to one video picture or one audio frame. Put another way, each AU is the coded form of a presentation unit, that is, of one decoded video picture or one audio frame's worth of samples.
MPEG-2 video compression produces I, P, and B frames. An ES whose frames are in the order I1, P4, B2, B3, P7, B5, B6 becomes a PES by packetizing it and inserting PTS/DTS marks for each frame. DTS does not need to be inserted for B frames, because a B frame's PTS equals its DTS. For I and P frames, the order in which they are transmitted differs from the order in which they are displayed, so they must be held in the video decoder's reorder buffer and displayed only after reordering. Both PTS and DTS must therefore be inserted for them as the basis for the reordering.
The PTS/DTS marks are the key to synchronized display of video and audio and to preventing overflow or underflow of the decoder input buffer. PTS indicates the time at which a presentation unit appears in the system target decoder (STD). DTS indicates the time at which all bytes of an access unit are removed from the ES decoder buffer of the STD.
Suppose the video ES has the coded frame sequence I1, P4, B2, B3, P7, B5, B6, I10, B8, B9. After PTS/DTS are added, the frames are packaged into video PES packets. Each PES packet has a header that describes the data in the packet and provides timing data. The packet header of each I, P, and B frame can carry PTS and DTS, but for B frames PTS and DTS are identical, so the DTS of a B frame does not need to be marked. I and P frames must be stored in the video decoder's reorder buffer before display and are shown only after a delay (the reordering), so they must be marked with PTS and DTS separately. For example, the decoder receives the frames in the order I1, P4, B2, B3, P7, B5, B6, I10, B8, B9. P4 arrives earlier than B2 and B3 but must be displayed later than them. Guided by the time marks inserted into the data stream in advance, the reorder buffer puts P4 back in place, reconstructing the pre-encoding frame sequence I1, B2, B3, P4, B5, B6, P7, B8, B9, I10.
In short, the PTS/DTS marks provide a dedicated time scale for determining when an event occurs or when information is to be decoded. Relying on this time scale, the decoder knows when each marked unit is to be decoded or displayed. For example, PTS/DTS marks can be used to determine the times of encoding, multiplexing, decoding, and reconstruction.
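The reordering in this example can be sketched without any timestamps at all, as a one-slot reorder buffer. This is only an illustration under simplifying assumptions: frames are represented as labels such as "P4" whose first letter is the frame type, and the function name `display_order` is made up for this sketch.

```python
def display_order(decode_order):
    """Turn a decode-order frame list into display order.

    B frames are output as soon as they are decoded. An I or P frame
    is held in the reorder buffer and output only when the next I or P
    frame arrives (or at the end of the sequence).
    """
    held = None          # the I/P frame currently waiting in the buffer
    shown = []
    for frame in decode_order:
        if frame[0] == "B":
            shown.append(frame)
        else:
            if held is not None:
                shown.append(held)
            held = frame
    if held is not None:
        shown.append(held)
    return shown


decoded = ["I1", "P4", "B2", "B3", "P7", "B5", "B6", "I10", "B8", "B9"]
shown = display_order(decoded)
```

With the input above, `shown` comes out as I1, B2, B3, P4, B5, B6, P7, B8, B9, I10, the pre-encoding order given in the text. A real decoder relies on the PTS values rather than on this structural rule alone, since streams can drop frames or change GOP structure.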
PCR
The PCR is carried in the TS: the header of a TS packet may contain it (in the adaptation field). It specifies the expected time at which that TS packet arrives at the decoder. Its role is similar to that of the SCR.
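As a rough sketch of where the PCR sits, the following reads it from the adaptation field of a single 188-byte TS packet. The function name `extract_pcr` is hypothetical. The field layout used here (a 33-bit base counted at 90 kHz, 6 reserved bits, and a 9-bit extension, combined into a 27 MHz count) follows the MPEG-2 Systems definition; checking that the packet is on the PCR PID from the PMT is left to the caller.

```python
def extract_pcr(pkt: bytes):
    """Return the PCR as a 27 MHz tick count, or None if the packet has none."""
    if len(pkt) != 188 or pkt[0] != 0x47:
        raise ValueError("not a valid 188-byte TS packet")
    afc = (pkt[3] >> 4) & 0x03           # adaptation_field_control
    if not (afc & 0x02):
        return None                      # no adaptation field
    af_len = pkt[4]                      # adaptation_field_length
    if af_len < 7 or not (pkt[5] & 0x10):
        return None                      # too short, or PCR_flag not set
    b = pkt[6:12]                        # 6 bytes: base(33) reserved(6) ext(9)
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext


def pcr_to_seconds(pcr: int) -> float:
    """Convert a 27 MHz PCR count to seconds."""
    return pcr / 27_000_000
```

The decoder would compare successive PCR values against its local clock to steer the STC; that control loop is outside the scope of this sketch.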
DTS, PTS
A video ES contains many I, P, and B frames, and all P and B frames use I and P frames as references. Because a B frame is predicted from both a preceding and a following frame, the P or I frame that follows it in display order must be decoded before the B frame itself can be decoded. The decode time of a frame is therefore not the same as its actual presentation time. Frames are decoded one at a time according to DTS and then presented according to PTS.
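To make the relationship between DTS and PTS concrete, here is a small sketch that assigns both to frames listed in decode order. It rests on stated assumptions rather than on any particular encoder: one frame is decoded per frame period, a B frame is presented the moment it is decoded (PTS equals DTS), and an I or P frame is presented at the moment the next I or P frame is decoded. The name `assign_timestamps` and the default of 3600 ticks per frame (25 frames per second on a 90 kHz clock) are illustrative choices.

```python
def assign_timestamps(decode_order, frame_ticks=3600):
    """Return a list of (frame, dts, pts) tuples in decode order."""
    stamped = []
    n = len(decode_order)
    for i, frame in enumerate(decode_order):
        dts = i * frame_ticks
        if frame[0] == "B":
            pts = dts                    # B frame: shown as soon as decoded
        else:
            # I/P frame: shown when the next I/P frame is decoded
            j = i + 1
            while j < n and decode_order[j][0] == "B":
                j += 1
            pts = j * frame_ticks
        stamped.append((frame, dts, pts))
    return stamped


frames = ["I1", "P4", "B2", "B3", "P7", "B5", "B6"]
stamped = assign_timestamps(frames)
# Decoding follows DTS (the list order); presentation follows PTS.
presented = [f for f, _, _ in sorted(stamped, key=lambda s: s[2])]
```

For this input, sorting by PTS yields I1, B2, B3, P4, B5, B6, P7, while the DTS values simply increase along the transmitted order.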
The PES header sometimes contains a DTS and a PTS. The PTS is the presentation time of the first complete audio or video access unit in the PES packet payload. (Not every audio or video access unit carries its own PTS/DTS, so the value in the PES header serves as the starting reference.)
The DTS in the PES header follows the same principle. Note, however, that for video the DTS and PTS can differ, because the presence of B frames makes the decode order differ from the display order. Audio has no bidirectional prediction, so its decode order and presentation order are the same. Only one time stamp is needed for audio, and it is always the PTS.