In an earlier article, "Converting the FLV stream in RTMP to standard H264 and AAC" (Http://www.cnblogs.com/chef/archive/2012/07/18/2597279.html), I analyzed how to extract H264 from RTMP.
In Flash projects with audio/video interaction, the audio codec is in practice limited to Speex.
This article is divided into three parts: the audio interfaces provided by Flex, the Speex data in RTMP, and how to convert that data to an RTP stream.
I. Audio interfaces provided by Flex
The client is written in Flex. The audio interface Flex provides is encapsulated, so from the client's point of view calling it is a black box; below I point out where its parameters differ from what actually happens in the encoder.
The microphone interface is provided by the Microphone class. Most of its properties are already documented with Chinese comments, so I will not go through them one by one; I will only pick out a few and explain them.
Property | Description
codec | Encoding format. Only Nellymoser and Speex are supported; Nellymoser is mostly used in game development and has many restrictions on commercial use.
rate | Sampling frequency. Note that this is the microphone's capture sampling rate, not the encoding sampling rate.
framesPerPacket | Number of audio frames contained in one audio packet (more details later).
encodeQuality | Encoding quality. At the same encoding sampling rate, a higher quality gives better sound, but each frame contains more data. Once the value is set, the size of each frame in bytes is fixed as well.
enableVAD | Whether to enable voice activity detection (VAD), which detects whether the input contains speech. When enabled, the Speex encoder keeps encoding during silence, producing 10-byte audio frames.
Speex encoding has three modes:
Mode | Encoding sampling rate | Duration of one frame | Samples per frame
Narrowband | 8 kHz | 20 ms | 160
Wideband | 16 kHz | 20 ms | 320
Ultra-wideband | 32 kHz | 20 ms | 640
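The sample counts in the table all follow from one relation: samples per frame = encoding sampling rate × frame duration. A minimal Python sketch of that arithmetic, assuming the fixed 20 ms Speex frame listed above (the names `SPEEX_MODES` and `samples_per_frame` are hypothetical, chosen for illustration):

```python
# Hypothetical helper: number of PCM samples in one Speex frame per mode.
SPEEX_FRAME_MS = 20  # all three Speex modes use 20 ms frames

SPEEX_MODES = {
    "narrowband": 8000,       # encoding sampling rate in Hz
    "wideband": 16000,
    "ultra-wideband": 32000,
}

def samples_per_frame(rate_hz: int, frame_ms: int = SPEEX_FRAME_MS) -> int:
    """Return the number of samples encoded in one frame."""
    return rate_hz * frame_ms // 1000

for mode, rate in SPEEX_MODES.items():
    print(f"{mode}: {rate} Hz -> {samples_per_frame(rate)} samples/frame")
```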
II. Speex data in RTMP
In each audio packet (equivalent to an audio tag in FLV), the upper four bits of the first byte give the encoding format; a value of 11 means Speex. The lower four bits give the sampling rate, mono or stereo, and whether each sample is 8-bit or 16-bit. With Speex, however, these bits are fixed by the protocol and carry no useful information: the encoding sampling rate is always 16 kHz, mono, 16 bits per sample.
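The header byte described above can be decoded with a few shifts. A minimal Python sketch, assuming the FLV audio tag bit layout (SoundFormat 4 bits, SoundRate 2 bits, SoundSize 1 bit, SoundType 1 bit, most significant bit first); the class and function names are hypothetical. As noted above, the low four bits are meaningless for Speex, so they are decoded here only for completeness. For example, a first byte of 0xB2 decodes to format 11, rate 0, size 1, type 0.

```python
from dataclasses import dataclass

SOUND_FORMAT_SPEEX = 11  # SoundFormat value meaning Speex

@dataclass
class AudioTagHeader:
    sound_format: int  # upper 4 bits: codec id
    sound_rate: int    # 2 bits: 0=5.5 kHz, 1=11 kHz, 2=22 kHz, 3=44 kHz
    sound_size: int    # 1 bit: 0=8-bit, 1=16-bit samples
    sound_type: int    # 1 bit: 0=mono, 1=stereo

def parse_audio_header(first_byte: int) -> AudioTagHeader:
    """Split the first byte of an RTMP audio message into its four fields."""
    return AudioTagHeader(
        sound_format=(first_byte >> 4) & 0x0F,
        sound_rate=(first_byte >> 2) & 0x03,
        sound_size=(first_byte >> 1) & 0x01,
        sound_type=first_byte & 0x01,
    )

def is_speex(packet: bytes) -> bool:
    """True if a non-empty RTMP audio message carries Speex data."""
    return len(packet) > 0 and parse_audio_header(packet[0]).sound_format == SOUND_FORMAT_SPEEX
```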
The rest of the packet is audio frame data, which may hold several frames depending on framesPerPacket. The default in Flex is 2, so each audio packet carries two frames. Note: when VAD is enabled, those two frames may be two real data frames or two 10-byte frames.
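Splitting the frame data can be sketched as follows. This assumes, as the text describes, that all frames in one packet have the same byte length (a fixed encodeQuality gives a fixed frame size, and silence packets contain only 10-byte frames), and that framesPerPacket is known from the client configuration. The function names are hypothetical; a packet mixing frame sizes would need a real Speex bitstream parser instead.

```python
VAD_FRAME_BYTES = 10  # size of a silence frame when enableVAD is on

def split_speex_frames(packet: bytes, frames_per_packet: int = 2) -> list[bytes]:
    """Split one RTMP Speex audio message into equal-sized frames.

    Assumes every frame in the packet has the same length.
    """
    payload = packet[1:]  # skip the 1-byte audio tag header
    if frames_per_packet <= 0 or not payload or len(payload) % frames_per_packet:
        raise ValueError("payload length is not a positive multiple of framesPerPacket")
    size = len(payload) // frames_per_packet
    return [payload[i:i + size] for i in range(0, len(payload), size)]

def is_silence(frame: bytes) -> bool:
    """Heuristic: treat a 10-byte frame as a VAD silence frame."""
    return len(frame) == VAD_FRAME_BYTES
```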
III. How to convert to an RTP stream
Once the previous two parts are done, this step is not difficult: prepend an RTP header to the audio frame data, following RFC 5574 (RTP Payload Format for the Speex Codec).
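A minimal Python sketch of that packetization, assuming the fixed 12-byte RTP header from RFC 3550 (no CSRCs, no extension) and that each Flash frame is already a whole number of bytes, so the frames can simply be concatenated. The payload type 97 is a hypothetical choice: Speex uses a dynamic payload type (96 to 127) that must match whatever the SDP announces. The marker bit is left as a caller-controlled flag rather than tied to any particular rule.

```python
import struct

RTP_VERSION = 2
SPEEX_PAYLOAD_TYPE = 97  # hypothetical dynamic payload type, must match the SDP

def build_rtp_packet(frames: list[bytes], seq: int, timestamp: int, ssrc: int,
                     payload_type: int = SPEEX_PAYLOAD_TYPE,
                     marker: bool = False) -> bytes:
    """Prepend a 12-byte RTP header to the concatenated Speex frames."""
    first = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second,
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit sample-count timestamp
                         ssrc & 0xFFFFFFFF)       # 32-bit stream identifier
    return header + b"".join(frames)
```

The sequence number should increase by one per RTP packet, while the timestamp advances by the number of samples, as explained next.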
One thing to watch: in RTMP the spacing between audio packets is expressed in time (milliseconds), while an RTP timestamp counts samples. So when two RTMP audio packets are 20 ms apart, the RTP timestamp must advance by 320 (320 because the encoding sampling rate is always 16 kHz, and 16,000 samples/s × 0.02 s = 320).
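The conversion amounts to multiplying elapsed milliseconds by 16 samples per millisecond. A minimal Python sketch, assuming the 16 kHz Speex clock described above; the class name `RtpClock` is hypothetical, and RTMP timestamp wraparound is not handled. RFC 3550 recommends a random initial RTP timestamp, which is why the base is a parameter.

```python
SPEEX_CLOCK_HZ = 16000  # Speex from Flash is always encoded at 16 kHz

class RtpClock:
    """Map RTMP millisecond timestamps to RTP sample-count timestamps."""

    def __init__(self, rtp_base: int = 0):
        self.rtp_base = rtp_base  # RTP timestamp for the first packet
        self.first_ms = None      # RTMP timestamp of the first packet seen

    def to_rtp(self, rtmp_ms: int) -> int:
        if self.first_ms is None:
            self.first_ms = rtmp_ms
        elapsed_ms = rtmp_ms - self.first_ms
        samples = elapsed_ms * SPEEX_CLOCK_HZ // 1000  # 16 samples per ms
        return (self.rtp_base + samples) & 0xFFFFFFFF
```

For example, with a base of 1000, RTMP timestamps 500, 520 and 540 map to RTP timestamps 1000, 1320 and 1640.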
For more information, see www.cnblogs.com/chef.