Audio and video synchronization problems
Both streams carry playback-rate information: the audio stream through its sample rate, the video stream through its frame rate (FPS). But these two values alone are not enough to synchronize audio and video; we also need the DTS (decoding timestamp) and PTS (presentation timestamp). Video storage contains many kinds of frames, for example the I, B, and P frames used in MPEG, and the existence of B-frames makes PTS and DTS differ (see the appendix for why); Figure 1 shows a simple example. It is the PTS that actually matters for audio and video synchronization.
We can get the PTS of each packet from a movie file, but we cannot directly get the PTS of a decoded frame (which is what we really care about). The solution is to use the PTS of the first packet of a frame as that frame's PTS. This works because when a packet starts a frame, avcodec_decode_video() calls a function to allocate storage for the frame, and we can override that function to record the packet's DTS (which is easy). Because FFmpeg reorders packets, the DTS of the packet being processed by avcodec_decode_video() matches the PTS of the returned frame, so the frame's PTS can be obtained this way. Of course, sometimes no valid PTS is available, so we also maintain an internal video_clock that records how much time the video has played.
With this PTS we can queue our frames for display (queue_picture).
Next we need the audio playback time, i.e. the audio PTS.
get_audio_clock():
audio_clock serves as the audio PTS, but audio_decode_frame() sets audio_clock as if the decoded buffer had already been fully played, while in fact some of that data is still waiting in the hardware buffer. We must subtract the time corresponding to the unplayed part: pts -= (double)hw_buf_size / bytes_per_sec;
audio_decode_frame(): audio decoding
When a new packet is read, take its PTS and store it in audio_clock;
then, as data is decoded, advance audio_clock by the playback time computed from the decoded buffer size and the playback rate.
With these two clocks we have three options: sync video to audio (compare the video frame's PTS with the audio clock to decide whether to delay or hurry the video), sync audio to video (adjust the number of audio samples played, i.e. grow or shrink the audio buffer, according to the difference between the audio and video PTS), and sync both audio and video to an external clock (combining the two previous techniques against a shared reference).
Appendix:
I-frame: intra-coded frame, also known as an intra picture. An I-frame is usually the first frame of each GOP (group of pictures, the basic unit of MPEG video compression). It is only moderately compressed and serves as a reference point for random access; an I-frame can be viewed as a compressed still image.
P-frame: forward-predicted frame, also known as a predictive frame. It compresses the transmitted data by exploiting the temporal redundancy between the current image and earlier encoded frames in the sequence; hence the name prediction frame.
B-frame: bidirectionally predicted frame, also known as a bi-directional interpolated prediction frame. It exploits the temporal redundancy between the current image and both earlier and later encoded frames in the sequence to compress the transmitted data; hence the name bidirectional prediction frame.
I-frame: can be decompressed into a single complete picture by itself.
P-frame: needs an I-frame or P-frame before it to reconstruct a complete picture.
B-frame: needs the I-frame or P-frame before it and the P-frame after it to reconstruct a complete picture.
In short: decoding an I-frame depends on no other frame; decoding a P-frame depends on the nearest preceding I-frame or P-frame; decoding a B-frame depends on the nearest preceding I-frame or P-frame and the nearest following P-frame.