I. Introduction to the principle of audio and video synchronization
In multimedia capture and encoding, audio and video each run at a fixed rate. For example, H.264 video might have a frame rate of 15 frames/s, while AAC audio sampled at 44100 Hz with 1024 samples per frame has a frame rate of about 43 frames/s (44100 / 1024 ≈ 43.07). In theory, the player must therefore render 15 video frames and about 43 audio frames every second. If either stream plays too fast or too slowly, the two drift apart and the user experience suffers, which is why audio and video need to be synchronized.
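The frame-rate arithmetic above can be checked with a short sketch. The function and constant names are hypothetical; the numbers are the ones from the example in the text.

```python
def frames_per_second(sample_rate_hz: int, samples_per_frame: int) -> float:
    """Return how many audio frames are produced per second."""
    return sample_rate_hz / samples_per_frame


AAC_SAMPLE_RATE = 44100        # samples per second, from the example above
AAC_SAMPLES_PER_FRAME = 1024   # samples per AAC frame, from the example above
VIDEO_FPS = 15                 # H.264 frame rate used in the example above

audio_fps = frames_per_second(AAC_SAMPLE_RATE, AAC_SAMPLES_PER_FRAME)
audio_frame_ms = 1000 * AAC_SAMPLES_PER_FRAME / AAC_SAMPLE_RATE
video_frame_ms = 1000 / VIDEO_FPS

print(f"audio: {audio_fps:.2f} frames/s, {audio_frame_ms:.2f} ms per frame")
print(f"video: {VIDEO_FPS} frames/s, {video_frame_ms:.2f} ms per frame")
```

With these values one audio frame lasts roughly 23.2 ms and one video frame roughly 66.7 ms, so the two streams cannot simply be played frame for frame in lockstep.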
II. Introduction to the implementation scheme
2.1 Two time reference points
To achieve audio and video synchronization, two time reference points are required:
(1) Encoding time reference point
The timestamp of the first video frame to arrive is used as the encoding reference timestamp, Enpretime.
(2) Decoding time reference point
The arrival time of the first video frame is used as the playback reference time, Playpretime.
2.2 Four cache queues
(1) Video receive cache queue
(2) Audio receive cache queue
(3) Video playback cache queue
(4) Audio playback cache queue
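The two reference points and the four queues could be held together in one small state object. The following is a minimal sketch, not a definitive layout: the class names `Frame` and `SyncState`, the field names, and the `set_reference` helper are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class Frame:
    kind: str          # "audio" or "video"
    timestamp: float   # encoder timestamp carried by the frame
    payload: bytes = b""


@dataclass
class SyncState:
    # Two time reference points (section 2.1); unset until the first video frame.
    enpretime: Optional[float] = None    # encoding reference timestamp
    playpretime: Optional[float] = None  # playback reference time (local clock)
    # Four cache queues (section 2.2).
    video_recv: Deque[Frame] = field(default_factory=deque)
    audio_recv: Deque[Frame] = field(default_factory=deque)
    video_play: Deque[Frame] = field(default_factory=deque)
    audio_play: Deque[Frame] = field(default_factory=deque)

    def set_reference(self, first_video: Frame, arrival_time: float) -> None:
        """Record both reference points from the first video frame, only once."""
        if self.enpretime is None:
            self.enpretime = first_video.timestamp
            self.playpretime = arrival_time


state = SyncState()
state.set_reference(Frame("video", timestamp=1000.0), arrival_time=5000.0)
```

Later video frames leave the reference points unchanged, since only the first arrival defines them.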
2.3 Synchronization implementation steps
The specific steps are as follows:
(1) Take the timestamp of the first video frame to arrive as the encoding reference timestamp, Enpretime, and its arrival time as the playback reference time, Playpretime. Audio frames that arrive before Playpretime are filtered by timestamp: those with timestamps greater than Enpretime are placed in the cache, and those with timestamps less than Enpretime are discarded.
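The filtering of early audio frames in step (1) might look like the sketch below. The function name `split_early_audio` is hypothetical, frames are represented only by their timestamps, and keeping a frame whose timestamp equals Enpretime is an assumption, since the text does not cover that case.

```python
from typing import List, Tuple


def split_early_audio(
    early_audio: List[float], enpretime: float
) -> Tuple[List[float], List[float]]:
    """Partition timestamps of audio frames that arrived before Playpretime.

    Returns (kept, dropped): kept frames go into the cache, dropped frames
    are older than the first video frame and are discarded.
    """
    # Assumption: a timestamp equal to Enpretime is kept.
    kept = [ts for ts in early_audio if ts >= enpretime]
    dropped = [ts for ts in early_audio if ts < enpretime]
    return kept, dropped


# Audio timestamps (ms) that arrived before the first video frame,
# whose timestamp (Enpretime) is 1000 ms in this made-up example.
kept, dropped = split_early_audio([954, 977, 1000, 1023, 1046], enpretime=1000)
print(kept)     # [1000, 1023, 1046]
print(dropped)  # [954, 977]
```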
(2) Periodically read audio and video data from the playback cache. Subtract Enpretime from the current frame's timestamp to obtain the encoding time difference, and take the difference between the read time and the time of the previous playback frame to obtain the wait time.
When the wait time >= the encoding time difference, the frame is decoded and played; otherwise, the data is put back into the playback cache. When the playback cache is empty, data is read from the receive cache queue and processed in the same way.
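The playback decision in step (2) can be sketched as a function that is called on every timer tick. Several things here are assumptions rather than part of the original scheme: the name `due_frames`, frames represented only by their timestamps, frame timestamps and the local clock sharing the same unit (milliseconds), and the wait time being measured from Playpretime so that both differences start from the same reference point.

```python
from collections import deque
from typing import Deque, List


def due_frames(
    play_cache: Deque[float],
    recv_queue: Deque[float],
    enpretime: float,
    playpretime: float,
    now: float,
) -> List[float]:
    """Return the timestamps of frames that should be decoded and played at `now`."""
    # When the playback cache is empty, refill it from the receive queue.
    if not play_cache:
        while recv_queue:
            play_cache.append(recv_queue.popleft())

    wait_time = now - playpretime  # assumption: measured from Playpretime
    ready: List[float] = []
    while play_cache:
        encoding_diff = play_cache[0] - enpretime
        if wait_time >= encoding_diff:
            ready.append(play_cache.popleft())  # due: decode and play
        else:
            break  # not due yet: the frame stays in the playback cache
    return ready


play: Deque[float] = deque()
recv: Deque[float] = deque([1000, 1066, 1133])  # made-up video timestamps (ms)
first = due_frames(play, recv, enpretime=1000, playpretime=5000, now=5070)
second = due_frames(play, recv, enpretime=1000, playpretime=5000, now=5100)
third = due_frames(play, recv, enpretime=1000, playpretime=5000, now=5140)
```

Peeking at the head of the queue and leaving it in place has the same effect as reading a frame and putting it back. The same function can be run once for the video queues and once for the audio queues, since both compare against the same two reference points.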
III. Similar implementation schemes
This scheme is video-driven. There are also audio-driven schemes and synchronization schemes with no driving stream, which are not covered here; interested readers can study their advantages, disadvantages, and application scenarios.
Audio-video synchronization of network media stream