Audio Stream (proportion of redundant data):
Let's take a brief look at the structure of the ADTS header:
1) The ADTS header sits at the beginning of every AAC frame and is typically 7 bytes long (9 bytes when a CRC is present, which I have not seen in practice).
2) The length of each AAC frame is fixed at 1024 samples (it can be 1024*n, but I have not seen a case where n > 1).
3) Most of the information in the ADTS header is redundant; only the sample rate (4 bits), the number of channels (3 bits) and the frame size (13 bits) are useful, and together these three fields are only 20 bits.
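To make the field layout concrete, here is a minimal sketch (Python, for illustration only) of pulling exactly those three useful fields out of a standard 7-byte ADTS header; the bit offsets follow the ADTS layout described above.

```python
# Minimal sketch: extract the three useful ADTS fields
# (sampling-frequency index: 4 bits, channel configuration: 3 bits, frame length: 13 bits).

ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                     22050, 16000, 12000, 11025, 8000, 7350]

def parse_adts_header(hdr: bytes):
    """Return (sample_rate, channels, frame_length) from a 7-byte ADTS header."""
    if len(hdr) < 7 or hdr[0] != 0xFF or (hdr[1] & 0xF0) != 0xF0:
        raise ValueError("missing 0xFFF syncword, not an ADTS header")

    sf_index  = (hdr[2] >> 2) & 0x0F                    # 4-bit sampling_frequency_index
    channels  = ((hdr[2] & 0x01) << 2) | (hdr[3] >> 6)  # 3-bit channel_configuration
    frame_len = ((hdr[3] & 0x03) << 11) | (hdr[4] << 3) | (hdr[5] >> 5)  # 13-bit aac_frame_length

    return ADTS_SAMPLE_RATES[sf_index], channels, frame_len
```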
The MP4 format, by contrast, stores the index of each frame centrally, with each index taking about 4 bytes. But because MP4 has other boxes of its own, for short files its redundancy is still larger than that of ADTS.
For example, for a 20 kbps, 48 kHz HE-AAC voice stream stored as ADTS, the proportion of redundant data can be calculated as follows:
1) The amount of audio data per second is 20 * 1024 / 8 = 2560 bytes;
2) The number of audio frames per second is 24000 / 1024 = 23.4375 frames (because the codec is HE-AAC, which uses SBR, the core AAC encoder runs at half the output sample rate, i.e. 24 kHz; this still needs to be verified in practice);
3) The ADTS header overhead per second is 7 * 23.4375 = 164.0625 bytes;
4) The proportion of redundant data is 164.0625 / 2560 ≈ 6.4%.
As you can see, this overhead is still fairly significant.
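A quick arithmetic sketch that reproduces the figures above; the 20 kbps bitrate, 48 kHz output rate, 7-byte header and 1024 samples per frame are the assumptions from this example.

```python
# Quick check of the ADTS overhead example above.
bitrate_kbps = 20
core_rate_hz = 48000 // 2          # HE-AAC/SBR: the core AAC encoder runs at half the output rate
samples_per_frame = 1024
adts_header_bytes = 7

payload_bytes_per_s = bitrate_kbps * 1024 / 8             # 2560 bytes/s
frames_per_s = core_rate_hz / samples_per_frame           # 23.4375 frames/s
header_bytes_per_s = adts_header_bytes * frames_per_s     # 164.0625 bytes/s
print(f"overhead = {header_bytes_per_s / payload_bytes_per_s:.1%}")   # ≈ 6.4%
```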
===============================================================================================
The sampling frequency is the number of times per second that the amplitude of the analog sound waveform is sampled when it is digitized.
The frequency range of normal human hearing is roughly 20 Hz to 20 kHz, so according to the Nyquist sampling theorem the sampling frequency should be at least around 40 kHz to avoid audible distortion. Commonly used audio sampling frequencies are 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 37.8 kHz, 44.1 kHz, 48 kHz, etc.; using an even higher sampling frequency can reach DVD-quality sound.
When decoding AAC audio with a sampling rate of 44.1kHz, the decoding time of a frame must be controlled within 23.22 milliseconds.
Background knowledge:
(According to the AAC specification, one raw AAC frame contains 1024 samples and the associated data for that period of time.)
Analysis:
1 AAC
Playback duration of one AAC audio frame = number of samples in the frame / sample rate (in seconds).
One frame contains 1024 samples. With a sample rate of 44100 Hz, there are 44,100 samples per second,
so according to the formula:
playback duration of an audio frame = number of samples per AAC frame / sample rate
(Note: this duration can be used as a reference for the decoding deadline; the decoder should finish within this time with some margin, and if it does not, some exception handling should be done.)
For example, the playback duration of the current AAC frame is 1024 * 1,000,000 / 44100 ≈ 23,220 µs ≈ 23.22 ms,
or equivalently
1024 / 44100 ≈ 0.02322 s = 23.22 ms.
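A small sanity check of the numbers above, using the 1024-sample frame size and 44.1 kHz sample rate from this example:

```python
# Duration of one 1024-sample AAC frame at 44.1 kHz.
samples_per_frame = 1024
sample_rate_hz = 44100

duration_s = samples_per_frame / sample_rate_hz           # ≈ 0.02322 s
print(f"{duration_s * 1000:.2f} ms per AAC frame")        # ≈ 23.22 ms
```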
Conversely, when you want to buffer a given number of milliseconds of audio, you can calculate how many audio frames actually need to be buffered.
For example, how many frames are needed to buffer 300 ms at 48 kHz?
Frames to buffer = frames per second (48000/1024) multiplied by the buffer duration in seconds (300/1000) = (48000 * 300) / (1024 * 1000) = 14.0625 frames, so about 15 frames after rounding up.
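The same reverse calculation as a short sketch, using the assumed values from this example (48 kHz, 1024 samples per frame, 300 ms of buffering):

```python
import math

# How many AAC frames cover the desired buffer duration.
sample_rate_hz = 48000
samples_per_frame = 1024
buffer_ms = 300

frames_per_second = sample_rate_hz / samples_per_frame        # 46.875 frames/s
frames_needed = frames_per_second * buffer_ms / 1000          # 14.0625 frames
print(f"{frames_needed} frames, i.e. buffer {math.ceil(frames_needed)} frames")
```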
2 MP3
Each MP3 frame contains 1152 samples, so:
frame_duration = 1152 * 1,000,000 / sample_rate (in µs)
For example, with sample_rate = 44100 Hz the calculated duration is about 26.122 ms, which is why you often hear that each MP3 frame has a fixed playback time of 26 ms.
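The same formula in a couple of lines, assuming the 44.1 kHz example above:

```python
# MP3: 1152 samples per frame at 44.1 kHz.
frame_duration_us = 1152 * 1_000_000 / 44100
print(f"{frame_duration_us / 1000:.3f} ms per MP3 frame")    # ≈ 26.122 ms
```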
===============================================================================================
Transmission of audio and video streams
Question:
So far I have only implemented RTP packet transmission and reception for MPEG-4 video streams.
Now I am adding audio capture, so I also need to send an audio stream.
I would like to ask: is the usual practice to send the audio and video streams separately,
or to multiplex audio and video into a single stream and send that?
Answer:
For live streaming, real-time performance matters; you may want to drop video packets while protecting the audio packets, so transmitting the two streams separately is better.
For on-demand playback, real-time performance is less critical, synchronization is required, and buffering is possible, so multiplexed transmission is better.
Nowadays it is usually not necessary to pack them into a single multiplexed stream.
At the receiving end, the RTP packets are demultiplexed in a filter and the audio and video streams are decoded separately.
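As an illustration of that last point, here is a hypothetical sketch of demultiplexing incoming RTP packets by payload type before handing each elementary stream to its own decoder. The payload-type numbers 96 and 97 are dynamic values assumed for this example, and handle_video/handle_audio are placeholder hooks, not part of any real API.

```python
# Hypothetical sketch: route RTP packets to the right decoder by payload type.
# PT values 96/97 are assumed dynamic payload types, not fixed by any standard.
VIDEO_PT, AUDIO_PT = 96, 97

def handle_video(payload: bytes):
    pass  # feed the video payload to the video decoder

def handle_audio(payload: bytes):
    pass  # feed the audio payload to the audio decoder

def route_rtp_packet(packet: bytes):
    if len(packet) < 12:
        return                          # shorter than the fixed 12-byte RTP header
    payload_type = packet[1] & 0x7F     # low 7 bits of the second header byte
    payload = packet[12:]               # ignores CSRC list and header extensions for brevity
    if payload_type == VIDEO_PT:
        handle_video(payload)
    elif payload_type == AUDIO_PT:
        handle_audio(payload)
```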