Contents: Preface; Two modes of audio and video synchronization (based on the audio time stamp, based on the video time stamp, based on an external clock); Standard for audio and video synchronization; A digression: this post is not really about audio and video synchronization but about a neglected detail of it; Summary
Preface
Recently I have been working on screen-recording software. The H.264 files it generates play back in VLC with audio and video in sync, but in the player I wrote myself the sync is always slightly off. Judging by perception, the offset is probably between 0.2 s and 0.4 s.
The original video itself has audio and video in sync.
The test is whether, when a person speaks quickly, the mouth movements line up with the sound.
Two modes of audio and video synchronization
Prerequisite: Both the audio frame and the video frame must have their own time stamp.
No time stamp:
Without this prerequisite, audio is still fine: set the audio playback parameters and it plays continuously.
Video frames are the problem: they either play at a fixed frame rate or drift fast or slow.
The no-time-stamp case is not considered further here; it is a beginner's approach.
Based on the audio time stamp
That is, sync the video to audio. This way the sound plays smoothly.
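As a rough illustration of syncing video to audio, here is a minimal sketch of the decision a player might make for each video frame. The function name, the return values and the 0.04 s tolerance are my own assumptions for the example, not something taken from a particular player.

```python
def schedule_video_frame(video_pts, audio_clock, tolerance=0.04):
    """Decide what to do with the next video frame when audio is the master clock.

    video_pts   -- time stamp of the next video frame, in seconds
    audio_clock -- time stamp of the audio currently being heard, in seconds
    tolerance   -- assumed acceptable offset in seconds (hypothetical value)
    """
    diff = video_pts - audio_clock
    if diff > tolerance:
        # The frame is ahead of the sound: hold it for `diff` seconds.
        return ("wait", diff)
    if diff < -tolerance:
        # The frame is behind the sound: skip it so the video catches up.
        return ("drop", 0.0)
    # Close enough to the audio clock: display it now.
    return ("show", 0.0)
```

The audio is never touched in this scheme; only the video waits or drops frames, which is why the sound stays smooth.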
Application scenario: most video players.
Based on the video time stamp
Video frames play at their own pace (when many unplayed frames are waiting in the buffer, playback speeds up; when few are waiting, it slows down; or a fixed frame rate is used);
The disadvantage: to keep audio and video in sync, the sound will stutter.
Application scenario: receiver-side players for live streaming.
Based on an external clock
Select an external clock as the reference, and pace both video and audio playback against that clock.
Standard for audio and video synchronization
From the perspective of perception (this is my perception, not the standard).
Let's agree on some data:
TV0: the time stamp of the video frame currently displayed;
TA0: the time stamp of the audio frame that should be playing while that video frame is displayed;
TA1: the time stamp of the audio frame that is actually playing while that video frame is displayed;
If |TA0 - TA1| < 0.1 s, audio and video can be considered synchronized.
If |TA0 - TA1| >= 0.3 s, it is obvious that the speech does not match the picture.
Having written this far, I feel I have digressed: this post is not really about audio and video synchronization in general, but about one neglected detail of it
There are plenty of articles online about audio and video synchronization. Look them up yourself; I will not explain it here.
Here's a picture of the sound playing in the system
Explanation of the terms:
T0: the moment of the sound that is currently playing
T1-T0: the playing time of the audio data in the system cache that has not yet been played
T2-T1: the playing time of the data read by one call of the audio callback function
BUF0: data in the system that has not yet been played
BUF1: data read by each call of the audio callback function
If the player finishes playing BUF0 before BUF1 has been read, there is no sound at that moment.
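One way to read the terms above: the time stamp of the data the callback is about to read (T1) runs ahead of the sound actually being heard (T0) by the playing time of BUF0. The sketch below estimates T0 from T1 under that reading. The function and parameter names are hypothetical, and how to obtain the number of queued bytes depends on the audio API, so treat that input as an assumption.

```python
def playing_timestamp(next_read_ts, queued_bytes, sample_rate,
                      channels=2, bits_per_sample=16):
    """Estimate T0, the time stamp of the sound audible right now.

    next_read_ts -- T1: time stamp (seconds) of the first sample the next
                    callback will read
    queued_bytes -- size of BUF0: bytes handed to the system but not yet played
                    (assumed to be known; obtaining it is API-specific)
    """
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    # BUF0 still has to be played before the newly read data is heard,
    # so the audible position lags T1 by the duration of BUF0.
    return next_read_ts - queued_bytes / bytes_per_second
```

If the player uses T1 as its audio clock without this correction, the video is synchronized against sound that has not been heard yet, and the offset is roughly the duration of BUF0.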
The size of BUF1 presumably depends on the system. I use Qt on Windows, and it reads 16,384 bytes at a time.
For 16,384 bytes:
At 11025 Hz, two channels, 16 bit, the sound duration is about 0.371519 s.
At 44100 Hz, two channels, 16 bit, the sound duration is about 0.0928798 s.
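The two durations follow from dividing the buffer size by the byte rate (sample rate x channels x bytes per sample). A small sketch of that arithmetic, with a helper name of my own choosing:

```python
def buffer_duration(num_bytes, sample_rate, channels=2, bits_per_sample=16):
    """Return how many seconds of PCM audio `num_bytes` holds."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return num_bytes / bytes_per_second


for rate in (11025, 44100):
    print(f"{rate} Hz: {buffer_duration(16384, rate):.6f} s")
# 11025 Hz: 0.371519 s
# 44100 Hz: 0.092880 s
```

At 11025 Hz a single read of 16,384 bytes already covers more than 0.3 s, while at 44100 Hz it stays under 0.1 s.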
In general, the size of BUF0 <= BUF1, but this is not absolute; the system may simply want to read a little more data.
After I changed the playback sample rate, QAudioOutput::bufferSize() also changed, but I did not notice any effect from that change here.
After I changed the playback sample rate, the number of bytes per read stayed at 16,384.
In fact, I have always wanted to change the size of BUF1, but I have not found a way.
Summary
This detail is easy to overlook. Before writing a player I never paid attention to such a small gap; only when I had to handle it in my own code did the problem emerge.
I do not really want to use the 44100 Hz sample rate, because the amount of data is 4 times that of 11025 Hz.
But because of the time offset it causes in audio and video synchronization, I have to adopt the 44100 Hz sample rate.