In Flash projects with audio/video interaction, Speex is in practice the only usable audio encoding.
This article is divided into three parts: the audio interfaces provided in Flex, the Speex data carried in RTMP, and how to convert that data to an RTP stream.
I. Audio interfaces provided in Flex
The client is written in Flex. The interfaces Flex provides are encapsulated, so to the client calling them is like using a black box. The settings that matter for the audio data are analyzed below.
The microphone audio interfaces are provided by the Microphone class. Most of its members already have Chinese documentation, so I will not repeat them one by one and will only pick out a few to explain in my own words.
Property | Description
codec | Encoding format. Only Nellymoser and Speex are supported. Nellymoser is mostly used in game development and has many restrictions on commercial use.
rate | Sets the microphone sampling frequency. Note that the microphone sampling rate is not the encoding sampling rate.
framesPerPacket | Number of audio frames contained in one audio packet (more details are given later).
encodeQuality | Encoding quality. At the same encoding sampling rate, higher quality sounds better but makes each frame larger. Once this value is set, the size of the data in each frame is fixed.
enableVAD | Whether to enable voice activity detection (its purpose is easy to look up online). When enabled, the Speex encoder continuously emits 10-byte audio frames while the input is silent.
Three modes are available for Speex encoding:
Mode | Encoding sampling rate | Frame duration | Samples per frame
Narrowband | 8 kHz | 20 ms | 160
Wideband | 16 kHz | 20 ms | 320
Ultra-wideband | 32 kHz | 20 ms | 640
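The sample counts above follow directly from the sampling rate and the fixed 20 ms frame duration. A minimal Python sketch of that arithmetic (the function and constant names are illustrative, not part of any Speex API):

```python
FRAME_DURATION_MS = 20  # every Speex mode listed above uses 20 ms frames


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of PCM samples covered by one encoded frame."""
    return sample_rate_hz * FRAME_DURATION_MS // 1000


if __name__ == "__main__":
    for mode, rate in (("narrowband", 8000),
                       ("wideband", 16000),
                       ("ultra-wideband", 32000)):
        print(mode, samples_per_frame(rate))
```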
II. Speex data in RTMP
In each audio packet (equivalent to an audio tag in FLV), the high four bits of the first byte indicate the encoding format. A value of 11 means Speex. The low four bits indicate the encoding sampling rate, mono or stereo, and whether each sample is 8-bit or 16-bit. With Speex, however, these fields are fixed and carry no meaning in the protocol: the audio is always encoded at 16 kHz, single channel, 16 bits per sample.
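The byte layout just described can be sketched in Python as follows. This is only an illustration under the assumption that `packet` holds the raw body of one RTMP audio message; the function names are hypothetical.

```python
SPEEX_FORMAT_ID = 11  # value of the high four bits for Speex, as described above


def parse_audio_header(packet: bytes):
    """Split an audio packet into (format_id, flags, payload)."""
    if not packet:
        raise ValueError("empty audio packet")
    first = packet[0]
    format_id = first >> 4   # high four bits: encoding format
    flags = first & 0x0F     # low four bits: rate/size/channel, not meaningful for Speex
    return format_id, flags, packet[1:]


def speex_payload(packet: bytes) -> bytes:
    """Return the Speex frame data, rejecting packets in any other format."""
    format_id, _flags, payload = parse_audio_header(packet)
    if format_id != SPEEX_FORMAT_ID:
        raise ValueError(f"not a Speex packet (format id {format_id})")
    return payload
```

The returned payload is the concatenation of the encoded frames; the flags are returned only for inspection and should not be used to derive the sampling rate.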
The remaining data is the audio frame data. It can contain several frames, depending on the framesPerPacket setting mentioned above. The default value in Flex is 2, so each audio packet carries two frames. Note: when VAD is enabled, the two frames can be two real data frames, two 10-byte silence frames, or one of each.
III. How to convert to an RTP stream
With the two parts above understood, this step is not difficult: prepend an RTP header to the audio frame data. See RFC 5574 (RTP Payload Format for the Speex Codec).
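A minimal Python sketch of this step, building the fixed 12-byte RTP header with no CSRC list, extension, or padding. The payload type 97 is an assumed dynamic value that would normally be negotiated (for example via SDP), and the function name is hypothetical.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 97, marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP header to already-encoded Speex frame data."""
    byte0 = RTP_VERSION << 6                            # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,                  # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,        # 32-bit timestamp in samples
                         ssrc & 0xFFFFFFFF)             # 32-bit stream identifier
    return header + payload
```

The sequence number increases by one per packet sent, while the timestamp advances by the number of samples the packet covers.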
The one thing to watch is that RTMP measures the interval between audio packets in units of time (milliseconds), while the RTP timestamp counts samples. So when two RTMP audio packets are 20 ms apart, the RTP timestamp must increase by 320 (320 because the encoding sampling rate is always 16 kHz: 16000 × 0.020 = 320).
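The conversion can be sketched in Python like this, assuming RTMP timestamps in milliseconds and the fixed 16 kHz clock described above (the names are illustrative):

```python
SPEEX_CLOCK_RATE_HZ = 16000  # Flash always encodes Speex at 16 kHz


def rtp_timestamp_increment(rtmp_delta_ms: int,
                            clock_rate_hz: int = SPEEX_CLOCK_RATE_HZ) -> int:
    """Convert an RTMP time difference in milliseconds to a sample count."""
    return rtmp_delta_ms * clock_rate_hz // 1000


def next_rtp_timestamp(prev_rtp_ts: int, prev_rtmp_ms: int, cur_rtmp_ms: int) -> int:
    """Advance the RTP timestamp, wrapping at 32 bits as RTP requires."""
    delta_ms = cur_rtmp_ms - prev_rtmp_ms
    return (prev_rtp_ts + rtp_timestamp_increment(delta_ms)) & 0xFFFFFFFF
```

With the default of two frames per packet, consecutive RTMP packets are 40 ms apart, so the RTP timestamp advances by 640 per packet.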