Tutorial 5: Video Synchronization
How to synchronize videos
So this whole time we have had an essentially useless movie player. It plays the video, and it plays the audio, but it is not quite what we would call a movie yet. So what do we do?
PTS and DTS
Fortunately, both the audio and video streams contain information about how fast and when they are supposed to be played: audio streams have a sample rate, and video streams have a frames-per-second value. However, if we simply synchronized the video by counting frames and multiplying by the frame rate, there is a good chance it would drift out of sync with the audio. Instead, packets from the stream can carry a DTS (decoding timestamp) and a PTS (presentation timestamp). To understand these two values, you need to know how movies are stored. Some formats, such as MPEG, use what are called B-frames ("B" for bidirectional). The other two kinds of frames are I-frames and P-frames ("I" for intra, the key frame, and "P" for predicted). An I-frame contains a complete image. A P-frame depends on the previous I- and P-frames and encodes only the differences from them. A B-frame is similar to a P-frame, but it depends on information in both the preceding and the following frames. This also explains why we might not get a finished image after calling avcodec_decode_video.
So say we had a movie whose frames are displayed in the order I B B P. We need to know the information in the P-frame before we can display either B-frame. Because of this, the frames might be stored in the order I P B B. This is why we have both a decoding timestamp and a presentation timestamp: the decoding timestamp tells us when we need to decode something, and the presentation timestamp tells us when we need to display it. So in this case, our stream might look like this:
   PTS: 1 4 2 3
   DTS: 1 2 3 4
Stream: I P B B
Generally, the PTS and DTS will differ only when the stream we are playing has B-frames in it.
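If you want to watch this reordering for yourself, here is a quick, hypothetical experiment (not part of the player we are building): it assumes an AVFormatContext named pFormatCtx has already been opened and the video stream's index stored in videoStream, as in the earlier tutorials, and simply prints each video packet's timestamps as it is read:

// Illustrative sketch only: dump each video packet's dts/pts to observe
// the reordering described above. Assumes pFormatCtx and videoStream
// were set up as in the earlier tutorials.
AVPacket packet;
while(av_read_frame(pFormatCtx, &packet) >= 0) {
  if(packet.stream_index == videoStream) {
    printf("dts=%lld pts=%lld\n",
           (long long)packet.dts, (long long)packet.pts);
  }
  av_free_packet(&packet);
}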
When we get a packet from av_read_frame(), it contains the PTS and DTS values for the information inside that packet. But what we really want is the PTS of the newly decoded raw frame, so we know when to display it. However, the frame we get from avcodec_decode_video() is just an AVFrame, which does not contain a useful PTS value. (Warning: AVFrame does contain a pts variable, but it will not always hold what we want when we get the frame.) That said, FFmpeg reorders the packets so that the DTS of the packet being processed by avcodec_decode_video() is always the same as the PTS of the frame it returns. But, another warning: we will not always get this information, either.
Not to worry, because there is another way to find the PTS of a frame: we can have our program reorder the packets by itself. We save the PTS of the first packet of a frame, and that becomes the PTS of the finished frame. We can figure out which packet is the first packet of a frame by letting avcodec_decode_video() tell us. How? Whenever a packet starts a frame, avcodec_decode_video() calls a function to allocate a buffer for that frame. And of course, FFmpeg allows us to redefine that allocation function. So we will write a new function that saves the timestamp of the packet.
Of course, even then, we might not get a proper timestamp. We will deal with that problem later.
Synchronization
So now we know when we are supposed to show a particular video frame, but how do we actually do it? Here is the idea: after we show a frame, we figure out when the next frame should be shown. Then we simply set a new timer to refresh the video again after that amount of time. As you might expect, we check the PTS of the next frame against the system clock to see how long the timeout should be. This approach works, but there are two issues that need to be dealt with.
First is the issue of knowing when the next PTS will be. You might think we could just add the frame duration to the current PTS, and you would be mostly right. However, some kinds of video call for frames to be repeated, which means the current frame is displayed again. This could cause the program to show the next frame too soon, so we need to account for it, as in the sketch below.
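As a rough sketch of the idea (the real handling is done in the synchronize_video function later in this tutorial, and the variables is, src_frame, and pts are assumed from that context), the predicted time of the next frame could be computed like this:

// Sketch only: predict the pts of the next frame from the current one.
// frame_delay is the nominal seconds-per-frame from the codec's time_base;
// each repeat_pict unit extends the frame by half a frame period.
double frame_delay = av_q2d(is->video_st->codec->time_base);
frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
double next_pts = pts + frame_delay;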
Second, as the program stands now, the video and the audio chug along happily without bothering to sync at all. We would not have to worry about that if everything worked perfectly. But your computer is not perfect, and neither are many video files. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (such as your computer's clock). From here on, we are going to sync the video to the audio.
Coding it: getting the frame timestamp
Now let's actually write the code. We will need to add some members to our big struct, but we will do that as we need them. First, let's look at our video thread. Remember, this is where we pick up the packets that our decode thread put on the queue. What we need to do here is get the timestamp of the frame given to us by avcodec_decode_video. The first way we talked about was getting the DTS of the last packet processed, which is pretty easy:
  double pts;

  for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
      // means we quit getting packets
      break;
    }
    pts = 0;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec,
                                pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts != AV_NOPTS_VALUE) {
      pts = packet->dts;
    } else {
      pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);
We set the PTS to 0 if we cannot figure out what it is.
Okay, that was easy. But we said before that if the packet's DTS does not help us, we need to use the PTS of the first packet of the frame. We do that by telling FFmpeg to use our own functions to allocate the frame buffers. Here are the formats of those functions:
int get_buffer(struct AVCodecContext *c, AVFrame *pic);
void release_buffer(struct AVCodecContext *c, AVFrame *pic);
The allocation function does not tell us anything about packets, so what we will do is save the PTS to a global variable every time we get a packet; our allocation function can then read it from there. We store the value in the AVFrame struct's opaque variable, an unused field meant for user data. So, to begin with, here are our functions:
uint64_t global_video_pkt_pts = AV_NOPTS_VALUE;

int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
  int ret = avcodec_default_get_buffer(c, pic);
  uint64_t *pts = av_malloc(sizeof(uint64_t));
  *pts = global_video_pkt_pts;
  pic->opaque = pts;
  return ret;
}
void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
  if(pic) av_freep(&pic->opaque);
  avcodec_default_release_buffer(c, pic);
}
avcodec_default_get_buffer and avcodec_default_release_buffer are the default buffer allocation functions that FFmpeg uses. av_freep is a memory-management function that not only frees the memory pointed to, but also sets the pointer to NULL.
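For clarity, av_freep behaves roughly like this sketch (an illustration of the idea, not FFmpeg's actual source):

// Illustrative only: free the memory a pointer points to, then NULL it.
void my_freep(void *arg) {
  void **ptr = (void **)arg;
  av_free(*ptr);
  *ptr = NULL;
}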
Now, in our stream_component_open function, we add these lines to tell FFmpeg to use our functions:
codecCtx->get_buffer = our_get_buffer;
codecCtx->release_buffer = our_release_buffer;
Now we have to add the code that saves the PTS to the global variable and then uses the stored timestamp when we need it. Our code now looks like this:
  for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
      // means we quit getting packets
      break;
    }
    pts = 0;
    // Save global pts to be stored in pFrame in first call
    global_video_pkt_pts = packet->pts;
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                                packet->data, packet->size);
    if(packet->dts == AV_NOPTS_VALUE
       && pFrame->opaque && *(uint64_t *)pFrame->opaque != AV_NOPTS_VALUE) {
      pts = *(uint64_t *)pFrame->opaque;
    } else if(packet->dts != AV_NOPTS_VALUE) {
      pts = packet->dts;
    } else {
      pts = 0;
    }
    pts *= av_q2d(is->video_st->time_base);
A technical note: you may have noticed that we use int64 for the PTS. This is because the PTS is stored as an integer. This value is a timestamp that corresponds to a measurement of time in the stream's time_base unit. For example, if a stream has 24 frames per second, a PTS of 42 indicates that the frame should go where the 42nd frame would if we had a frame every 1/24 of a second (which is not necessarily true).
We can convert this value to seconds by dividing by the frame rate. Since the time_base of the stream is 1/framerate (for fixed-fps content), to get the PTS in seconds we just multiply it by the time_base.
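For instance, under the fixed-frame-rate assumption, a 24 fps stream has a time_base of 1/24, so the PTS of 42 from the earlier example works out like this (made-up numbers, purely for illustration):

// Made-up illustration: a fixed 24 fps stream.
AVRational time_base = {1, 24};           // 1/24 second per pts tick
double seconds = 42 * av_q2d(time_base);  // 42 / 24 = 1.75 seconds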
Coding it: synchronizing using the PTS
So now we have our PTS all set, and we can take care of the two synchronization problems we talked about above. We are going to define a function called synchronize_video that updates the PTS to keep everything in sync. This function will also finally handle the case where we do not get a PTS for a frame. At the same time, we need to keep track of when the next frame is expected so we can set the refresh rate properly. We can accomplish this with an internal clock, video_clock, that keeps track of how far along the video playback is. We add this value to the big struct:
typedef struct VideoState {
  ...
  double video_clock; ///< pts of last decoded frame / predicted pts of next decoded frame
Here is the synchronize_video function, which is fairly self-explanatory:
double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {

  double frame_delay;

  if(pts != 0) {
    /* if we have pts, set video clock to it */
    is->video_clock = pts;
  } else {
    /* if we aren't given a pts, set it to the clock */
    pts = is->video_clock;
  }
  /* update the video clock */
  frame_delay = av_q2d(is->video_st->codec->time_base);
  /* if we are repeating a frame, adjust clock accordingly */
  frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
  is->video_clock += frame_delay;
  return pts;
}
Notice that we account for repeated frames in this function, too.
Now let's get our proper PTS and queue up the frame using queue_picture, adding a new timestamp argument, pts:
    // Did we get a video frame?
    if(frameFinished) {
      pts = synchronize_video(is, pFrame, pts);
      if(queue_picture(is, pFrame, pts) < 0) {
        break;
      }
    }
The only thing that changes in queue_picture is that we save the pts value in the VideoPicture struct we queue up. So we have to add a pts variable to the struct and add a line of code:
typedef struct VideoPicture {
  ...
  double pts;
}

int queue_picture(VideoState *is, AVFrame *pFrame, double pts) {
  ... stuff ...
  if(vp->bmp) {
    ... convert picture ...
    vp->pts = pts;
    ... alert queue ...
  }
Now all the pictures in our picture queue carry proper timestamp values, so let's look at the video refresh function. You may recall from last time that we faked it with an 80 ms refresh time. Now we are going to find out the actual value.
Our strategy is to predict the time of the next PTS by simply measuring the time between the previous timestamp and the current one. At the same time, we need to sync the video to the audio. We are going to keep an audio clock: an internal value that tracks the position of the audio we are playing, like the digital readout on any mp3 player. Since we are syncing the video to the audio, the video thread uses this value to figure out whether it is too far ahead or too far behind.
We will get to the implementation later; for now, assume we have a get_audio_clock function that gives us the time on the audio clock. Once we have that value, what do we do if the video and audio are out of sync? It would be silly to try to leap to the correct frame through seeking or some other drastic measure. Instead, we adjust the value we calculated for the next refresh: if the PTS lags too far behind the audio time, we double our calculated delay, and if the PTS is too far ahead of the audio time, we refresh as quickly as possible. Now that we have an adjusted refresh time, or delay, we compare it against our computer's clock via a running frame_timer. This frame timer sums up all of our calculated delays while playing the movie; in other words, frame_timer is the time at which we should display the next frame. We simply add the new delay to the frame timer, compare it to the time on the computer's clock, and use that value to schedule the next refresh. This may be a bit confusing, so study the code carefully:
void video_refresh_timer(void *userdata) {

  VideoState *is = (VideoState *)userdata;
  VideoPicture *vp;
  double actual_delay, delay, sync_threshold, ref_clock, diff;

  if(is->video_st) {
    if(is->pictq_size == 0) {
      schedule_refresh(is, 1);
    } else {
      vp = &is->pictq[is->pictq_rindex];

      delay = vp->pts - is->frame_last_pts; /* the pts from last time */
      if(delay <= 0 || delay >= 1.0) {
        /* if incorrect delay, use previous one */
        delay = is->frame_last_delay;
      }
      /* save for next time */
      is->frame_last_delay = delay;
      is->frame_last_pts = vp->pts;

      /* update delay to sync to audio */
      ref_clock = get_audio_clock(is);
      diff = vp->pts - ref_clock;

      /* Skip or repeat the frame. Take delay into account. */
      sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
      if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
        if(diff <= -sync_threshold) {
          delay = 0;
        } else if(diff >= sync_threshold) {
          delay = 2 * delay;
        }
      }
      is->frame_timer += delay;
      /* compute the actual delay against the system clock */
      actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
      if(actual_delay < 0.010) {
        /* Really it should skip the picture instead */
        actual_delay = 0.010;
      }
      schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
      /* show the picture! */
      video_display(is);

      /* update queue for next picture! */
      if(++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
        is->pictq_rindex = 0;
      }
      SDL_LockMutex(is->pictq_mutex);
      is->pictq_size--;
      SDL_CondSignal(is->pictq_cond);
      SDL_UnlockMutex(is->pictq_mutex);
    }
  } else {
    schedule_refresh(is, 100);
  }
}
We make several checks here. First, we make sure that the delay between the current PTS and the previous one makes sense; if it does not, we guess and reuse the last delay. Next, we apply a sync threshold, because things are never perfectly in sync (ffplay uses 0.01 for this value). We also make sure the threshold is never smaller than the gap between timestamps. Finally, we set the minimum refresh value to 10 milliseconds. (Really, we should skip the frame in this case instead, but we are not going to bother.)
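Those threshold names are just constants; definitions matching the values ffplay uses would look something like this (the 0.01 comes from ffplay as noted above, and the 10-second no-sync cutoff is likewise borrowed from ffplay):

#define AV_SYNC_THRESHOLD 0.01    /* minimum sync threshold, in seconds */
#define AV_NOSYNC_THRESHOLD 10.0  /* beyond this difference, don't try to adjust */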
We have added a bunch of variables to the big struct, so do not forget to check the code. Also, do not forget to initialize the frame timer frame_timer and the initial frame delay frame_last_delay in stream_component_open:
    is->frame_timer = (double)av_gettime() / 1000000.0;
    is->frame_last_delay = 40e-3;
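(The initial frame_last_delay of 40e-3 is 40 milliseconds, i.e., one frame period at 25 fps; it is just a reasonable first guess before any real inter-frame delays have been measured.)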
Synchronizing: the audio clock
Now let's look at how to get the audio clock. We can update the clock in the audio decoding function, audio_decode_frame. Remember that we do not process a new packet every time we call this function, so there are two places where we have to update the clock. The first is when we get a new packet: we simply set the audio clock to the packet's timestamp. Then, if a packet contains multiple frames, we keep the audio time up to date by counting the number of samples written and dividing by the sample rate. So, once we have the packet:
    if(pkt->pts != AV_NOPTS_VALUE) {
      is->audio_clock = av_q2d(is->audio_st->time_base) * pkt->pts;
    }
And then, once we are processing the packet:
      pts = is->audio_clock;
      *pts_ptr = pts;
      n = 2 * is->audio_st->codec->channels; /* 2 bytes per sample (16-bit) per channel */
      is->audio_clock += (double)data_size /
                         (double)(n * is->audio_st->codec->sample_rate);
A few fine details: the prototype of the function has changed to include pts_ptr, so make sure you change that. pts_ptr is a pointer we use to inform audio_callback of the timestamp of the current audio packet. This will be used next time for synchronizing the audio with the video.
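For reference, the changed prototype would look something like this (a sketch; the exact signature depends on your audio_decode_frame from the earlier tutorials):

// Sketch of the changed signature: audio_decode_frame now also
// reports the pts of the decoded audio through pts_ptr.
int audio_decode_frame(VideoState *is, uint8_t *audio_buf,
                       int buf_size, double *pts_ptr);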
Now we can finally implement our get_audio_clock function. It is not as simple as returning the is->audio_clock value, though. Notice that we set the audio timestamp every time we process a packet, but if you look at the audio_callback function, it takes time to move all the data from the audio packet into our output buffer. That means the value on our audio clock can run ahead of what is actually being played, so we have to check how much of the buffer is left to be written. Here is the complete code:
double get_audio_clock(VideoState *is) {
  double pts;
  int hw_buf_size, bytes_per_sec, n;

  pts = is->audio_clock; /* maintained in the audio thread */
  hw_buf_size = is->audio_buf_size - is->audio_buf_index;
  bytes_per_sec = 0;
  n = is->audio_st->codec->channels * 2;
  if(is->audio_st) {
    bytes_per_sec = is->audio_st->codec->sample_rate * n;
  }
  if(bytes_per_sec) {
    pts -= (double)hw_buf_size / bytes_per_sec;
  }
  return pts;
}
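To see why the subtraction matters, here is a made-up example: with 44100 Hz stereo 16-bit audio, bytes_per_sec = 44100 * 2 * 2 = 176400. If 8192 bytes of the buffer have not yet been played, the clock is pulled back by 8192 / 176400 ≈ 0.046 seconds, which is roughly one video frame at 24 fps.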
You should be able to tell why this function works by now. ;)
That's it! Let's compile it:
gcc -o tutorial05 tutorial05.c -lavutil -lavformat -lavcodec -lz -lm \
`sdl-config --cflags --libs`
And finally! You can watch a movie in our very own movie player. Next time we will look at synchronizing the audio, and in the tutorial after that, seeking.