Timestamping video and audio, and the principle of audio-video synchronization (playback)

Source: Internet
Author: User
How to timestamp video and audio: http://blog.csdn.net/wfqxx/article/details/5497138

1. Video time stamp

PTS = inc++ * (1000/fps) (in ms), where inc is a static counter with an initial value of 0 that is incremented by 1 each time a timestamp is assigned.

In FFmpeg, the code is:

pkt.pts= m_nvideotimestamp++ * (m_vctx->time_base.num * 1000/m_vctx->time_base.den);
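
The expression above divides before it multiplies, so with integer arithmetic the per-frame step is truncated (1000/30 becomes 33 ms, for example) and the error grows with every frame. Below is a minimal sketch that multiplies first. It assumes the frame rate is available as a rational fps_num/fps_den; the function name video_pts_ms is hypothetical and not part of FFmpeg.

```c
#include <stdint.h>

/* Millisecond PTS of the n-th video frame (frame_index starts at 0).
 * fps_num/fps_den is the frame rate, e.g. 25/1 or 30/1 (assumed inputs).
 * Multiplying before dividing keeps the rounding error below 1 ms
 * instead of letting it accumulate frame by frame. */
int64_t video_pts_ms(int64_t frame_index, int64_t fps_num, int64_t fps_den)
{
    return frame_index * 1000 * fps_den / fps_num;
}
```

At 30 fps, frame 3 gets a PTS of 100 ms here, whereas a truncated step of 33 ms would give 99 ms.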

2. Audio time stamp

PTS = inc++ * (frame_size * 1000/sample_rate)

The code in FFmpeg is

pkt.pts= m_naudiotimestamp++ * (m_actx->frame_size * 1000/m_actx->sample_rate);
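
The same truncation concern applies to audio: 1024 * 1000 / 44100 is 23 in integer arithmetic, so after 100 frames a truncated step gives 2300 ms while the real position is about 2322 ms. Here is a small sketch that multiplies by the frame index first; the name audio_pts_ms is hypothetical and the millisecond unit follows the formula above.

```c
#include <stdint.h>

/* Millisecond PTS of the n-th audio frame (frame_index starts at 0).
 * frame_size is the number of samples per frame (1024 for AAC),
 * sample_rate is in Hz. The division happens last so the error
 * stays below 1 ms no matter how many frames have been written. */
int64_t audio_pts_ms(int64_t frame_index, int64_t frame_size, int64_t sample_rate)
{
    return frame_index * frame_size * 1000 / sample_rate;
}
```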


The sampling frequency is the number of amplitude samples taken per second when an analog sound waveform is digitized.

Normal human hearing covers roughly 20 Hz to 20 kHz. According to the Nyquist sampling theorem, the sampling frequency must be at least twice the highest frequency, that is, about 40 kHz, for the sound to be reproduced without distortion. Commonly used audio sampling frequencies include 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 37.8 kHz, 44.1 kHz, and 48 kHz; higher sampling frequencies can reach DVD sound quality.

When decoding AAC audio with a sampling rate of 44.1 kHz, the decoding time of one frame must stay within 23.22 milliseconds.

Background knowledge:

(A raw AAC frame contains 1024 samples, covering a fixed span of time, plus related data.)

Analysis:

1 AAC

Playback time of an audio frame = number of samples in one AAC frame / sampling frequency (in seconds)

One frame has 1024 samples. With a sample rate of 44100 Hz, that is, 44,100 samples per second, the formula gives:

Playback time of the current AAC frame = 1024 * 1000 / 44100 = 23.22 ms

2 MP3

Each MP3 frame contains 1152 samples, so:

frame_duration = 1152 * 1000 / sample_rate (in ms)

For example, with sample_rate = 44100 Hz the result is 26.122 ms; this is the commonly quoted fixed playback time of about 26 ms per MP3 frame.
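
The two calculations above follow one formula, samples per frame divided by sample rate. As a quick check, here is a sketch in integer microseconds; the function name frame_duration_us is hypothetical.

```c
#include <stdint.h>

/* Playback duration of one audio frame in microseconds.
 * samples_per_frame: 1024 for AAC, 1152 for MP3 (as in the text).
 * sample_rate: samples per second, e.g. 44100. */
int64_t frame_duration_us(int64_t samples_per_frame, int64_t sample_rate)
{
    return samples_per_frame * 1000000 / sample_rate;
}
```

With a 44100 Hz sample rate this returns 23219 for AAC (about 23.22 ms) and 26122 for MP3 (about 26.12 ms).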




Audio and video synchronization (playback) principle

Each frame of audio or video has a duration:
1) AAC
One frame has 1024 samples, so at a sample rate of 44100 Hz the playback time of one frame is 1024 * 1000 / 44100 = 23.22 ms.
2) MP3
One frame has 1152 samples, so at a sample rate of 44100 Hz the playback time of one frame is 1152 * 1000 / 44100 = 26.122 ms.
3) H264
Video playback time depends on the frame rate: frame_duration = 1000 / fps (in ms)
For example, with fps = 25.00 the result is 40 ms; this is the commonly quoted 40 ms per frame of video data.

In theory, audio-video (playback) synchronization works like this:
Once the duration of each frame is known, audio and video are stored in the container along a single time axis (values in ms):
Time axis: 0       23.22   40      46.44   69.66   80      92.88   116.10  120     ...
Audio:     0       23.22           46.44   69.66           92.88   116.10          ...
Video:     0               40                      80                      120     ...
That is, the accumulated video duration is compared with the accumulated audio duration, and whichever is smaller is written first.
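
The "write whichever is smaller" rule can be sketched as a simple loop. This is only an illustration: the function name interleave_order is hypothetical, durations are in microseconds, and giving audio priority on a tie is an assumption the text does not specify.

```c
#include <stddef.h>

/* Produce the write order of audio ('A') and video ('V') frames by always
 * writing the stream whose accumulated duration (next timestamp) is smaller.
 * Stops once the smaller of the two next timestamps exceeds end_us, or when
 * max tags have been written. Ties go to audio (an assumption).
 * Returns the number of tags stored in out. */
size_t interleave_order(long long audio_dur_us, long long video_dur_us,
                        long long end_us, char *out, size_t max)
{
    long long audio_ts = 0, video_ts = 0;
    size_t n = 0;

    while (n < max) {
        long long next = audio_ts <= video_ts ? audio_ts : video_ts;
        if (next > end_us)
            break;
        if (audio_ts <= video_ts) {
            out[n++] = 'A';
            audio_ts += audio_dur_us;
        } else {
            out[n++] = 'V';
            video_ts += video_dur_us;
        }
    }
    return n;
}
```

Called with an audio duration of 23220, a video duration of 40000, and an end time of 120000, it yields the order A V A V A A V A A V, which matches the time axis above.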


In practice, however, this does not hold for playback.

1: First, settle one question

Why not simply play audio at the audio rate and video at the video rate, that is, play one audio frame every 23.22 ms and one video frame every 40 ms?

Because these 23.22 ms and 40 ms intervals cannot be timed accurately; put differently, the sound card's actual playback time does not match them. What you need to know is how long the sound card really takes to play one frame, or one buffer, of audio.

2: The sound card plays individual samples rather than one frame at a time. A lost audio sample is audible, whereas a lost video frame is not noticeable.


3: Audio-video synchronization, method 1: callback mode

Assume the sound card has two buffers, "A" and "B", that hold the PCM data to be played, and that buffer "B" is currently playing. First establish the following points:

(1) The buffer size is fixed, so the time needed to play one buffer is fixed; assume 30 ms.

(2) When buffer "B" has finished playing, playback moves on to buffer "A", so that PCM audio plays continuously.

(3) When one buffer finishes playing, 30 ms have passed as far as the system (sound card) is concerned. In real time it may have been 40 ms, but that does not matter here; the callback provides a 30 ms tick.

(4) Video is matched against this 30 ms audio tick, which serves as the accurate clock:

Time axis (ms): 0 30 60 90 120 ...
Audio (ms): 0 23.22 46.44 69.66 92.88 116.10 ...
Video (ms): 0 40 80 120 ...

(5) One open question is how to handle the 10 ms between the 30 ms tick and the 40 ms video frame. This can be ignored, because the eye cannot perceive a 10 ms difference.

That is, on the first 30 ms audio callback the second video frame can be shown, and so on:

First callback (30 ms): show the 40 ms video frame.

Second callback (60 ms): show the 80 ms video frame.

Third callback (90 ms): show no video frame.

Fourth callback (120 ms): show the 120 ms video frame.
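
The text does not spell out the rule that decides whether a callback shows a frame. One rule that reproduces the schedule above, offered here purely as an assumption, is to show the next video frame when its PTS falls before the end of the audio buffer that is about to play. The function name should_show_video is hypothetical.

```c
#include <stdint.h>

/* Intended to be called from the audio callback, with the audio clock
 * advanced by one buffer duration each time the sound card finishes a buffer.
 * Returns 1 if the next video frame should be shown now, i.e. its PTS lies
 * before audio_clock_ms + buf_ms (assumed rule, not stated in the text). */
int should_show_video(int64_t audio_clock_ms, int64_t buf_ms,
                      int64_t next_video_pts_ms)
{
    return next_video_pts_ms < audio_clock_ms + buf_ms;
}
```

With 30 ms buffers, the callbacks at 30, 60, and 120 ms show the frames at 40, 80, and 120 ms, while the callback at 90 ms shows nothing because the next frame (120 ms) is not yet inside the upcoming buffer.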

4: Audio-video synchronization, method 2: blocking mode

Consider the same two-buffer setup as above.

(1) While buffer "B" is playing, external data is written into buffer "A". The write does not return immediately; it returns only when buffer "B" has finished playing.

At that point, the time from entering the call to returning from it is one buffer duration, for example the 30 ms above.

(2) Then, while buffer "A" is playing, external data is written into buffer "B". Again the write does not return immediately; it returns only when buffer "A" has finished playing.

At that point, the time from entering the call to returning from it is again one buffer duration, for example the 30 ms above.

(3) Repeating (1) and (2) yields the same 30 ms tick as callback mode. The rest is the same as in callback mode; see points (4) and (5) there.
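
In blocking mode the audio clock can be derived by counting samples: each time the blocking write returns, one buffer's worth of samples has been played. Here is a minimal sketch of such a clock; the AudioClock type and both function names are hypothetical, and the actual device write call is left out.

```c
#include <stdint.h>

/* Audio clock driven by the number of samples the sound card has played. */
typedef struct {
    int64_t samples_played;  /* total samples played so far */
    int64_t sample_rate;     /* samples per second, e.g. 44100 */
} AudioClock;

/* Call once after each blocking write returns. */
void audio_clock_advance(AudioClock *clock, int64_t samples_in_buf)
{
    clock->samples_played += samples_in_buf;
}

/* Current audio time in milliseconds. */
int64_t audio_clock_ms(const AudioClock *clock)
{
    return clock->samples_played * 1000 / clock->sample_rate;
}
```

At 44100 Hz a 30 ms buffer holds 1323 samples, so the clock reads 30, 60, 90, 120 ms after successive writes return, giving the same tick as the callback method.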


Reposted from http://blog.csdn.net/zhuweigangzwg/article/details/25815851
