1. RTP Speex Header
The RTP header is defined in [rfc3550]. This document defines how Speex uses the fields of the RTP header.
Payload type (PT): the payload type number assigned to this format.
Marker (M) bit: marks the beginning of a talkspurt. It is set on the first packet of audio data sent after a silent period. Speex supports voice activity detection and generates no frame data during silence, so packets may be transmitted discontinuously.
Extension (X) bit: see the RTP rules.
Timestamp: a 32-bit integer giving the sampling instant of the first frame in the packet.
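As an illustration of these fields, here is a minimal Python sketch that packs the 12-byte fixed RTP header from [rfc3550] and computes how far the timestamp advances per packet. The function names, the payload type 97, and the SSRC value are hypothetical choices for this example, not values taken from the specification. The sketch assumes no CSRC entries, no header extension, and no RTP-level padding.

```python
import struct

FRAME_MS = 20  # Speex frame duration used throughout this document


def build_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Pack a 12-byte RTP fixed header (version 2, P=0, X=0, CC=0)."""
    first = 2 << 6  # version 2 in the top two bits, everything else 0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        first,
        second,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def timestamp_step(rate_hz, frames_in_packet):
    """Timestamp increment for a packet: the RTP clock runs at the sampling rate."""
    return rate_hz * FRAME_MS // 1000 * frames_in_packet


# First packet after silence: marker bit set (97 is an assumed dynamic PT).
header = build_rtp_header(97, sequence=1, timestamp=0, ssrc=0x12345678, marker=True)
```

With two narrowband frames per packet, `timestamp_step(8000, 2)` gives the amount to add to the timestamp before sending the next packet.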
2. RTP payload format for Speex
The RTP payload format for Speex is shown in Figure 1. The format has no additional header: the standard RTP header is followed directly by one or more payload data blocks (Speex frames). Padding may be needed at the end of the packet.
Figure 1: RTP payload for Speex
3. Speex payload
To packetize the encoded data in RTP, the only requirement is that the bit stream output by the Speex encoder reach the decoder in the same order. The payload format described here preserves this order.
A typical Speex frame encoded at the maximum bit rate is about 110 bytes. The total size of all Speex frames in a packet should be smaller than the MTU so that the packet is not fragmented. A Speex frame must not be split across packets.
Frames must be placed in the packet in chronological order, oldest first.
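The packing rules above (whole frames only, chronological order, stay under a size limit) can be sketched as a simple greedy grouping. This is an illustrative sketch, not part of the specification: `group_frames` and `max_payload_bytes` are hypothetical names, and the caller is assumed to have already subtracted the IP, UDP, and RTP header sizes from the MTU to get the payload budget.

```python
def group_frames(frame_bits, max_payload_bytes):
    """Group frames, given as sizes in bits and in chronological order, into packets.

    Returns a list of packets, each a list of frame indices. A frame is never
    split: if it does not fit in the current packet, a new packet is started.
    """
    packets = []
    current = []
    current_bits = 0
    for index, bits in enumerate(frame_bits):
        if (bits + 7) // 8 > max_payload_bytes:
            raise ValueError("frame %d does not fit in a single packet" % index)
        # Payload size in bytes if this frame were appended (rounded up for padding).
        if current and (current_bits + bits + 7) // 8 > max_payload_bytes:
            packets.append(current)
            current, current_bits = [], 0
        current.append(index)
        current_bits += bits
    if current:
        packets.append(current)
    return packets
```

Because frames are only ever appended in input order, each packet holds consecutive frames with the oldest first.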
An RTP packet may contain frames of the same bit rate or of different bit rates. The bit rate is carried in-band, in each frame, so the packetizer does not need to track it.
The codec can change the bit rate at any 20 ms frame boundary, and the change is signaled in-band. Each frame carries its sampling rate (narrowband, wideband, or ultra-wideband) and its "mode" (bit rate). No out-of-band notification is therefore needed for the decoder to handle these changes.
The sampling rate must be 8000 Hz, 16000 Hz, or 32000 Hz.
The RTP payload must be padded to a whole number of bytes. The padding bits are LSB-aligned (least significant bits) in network byte order and consist of a 0 followed by all 1s up to the end of the byte. Padding is needed only after the last frame in the packet, and only so that the packet contents end on a byte boundary.
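The padding rule can be illustrated with a short Python sketch. For readability it represents the concatenated frame bits as a string of '0' and '1' characters, which is an assumption of this example (a real packetizer would work on the encoder's byte buffer); `pad_to_octet` is a hypothetical helper name.

```python
def pad_to_octet(bits):
    """Pad a bit string to a whole number of bytes with a 0 followed by 1s.

    `bits` is the concatenation of all Speex frames in the packet, written
    as '0'/'1' characters with the first transmitted bit first.
    """
    if not bits:
        return b""
    remainder = len(bits) % 8
    if remainder:
        pad_len = 8 - remainder
        # One 0 bit, then 1 bits until the end of the last byte.
        bits += "0" + "1" * (pad_len - 1)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```

For a 3-bit input such as "101", five padding bits (01111) are appended, so the single output byte is 10101111. Input that is already byte-aligned is returned unchanged.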
4. Example Speex RTP packet
In the following example, the packet holds a single Speex frame followed by 5 padding bits so that the packet size is byte-aligned.
5. RTP packets with multiple Speex frames
The following example shows an RTP packet that contains two Speex frames. The frame length in this example falls on a byte boundary, so no padding is needed.
The Speex decoder detects the bit rate from the payload and is responsible for finding the 20 ms boundaries between frames.
6. Media type
Media type name: audio
Media subtype name: speex
Required parameters:
Rate: the RTP timestamp clock rate, which equals the sampling rate in Hz. The sampling rate must be 8000, 16000, or 32000.
Optional parameters:
Ptime: should be a multiple of 20 ms [rfc4566]
Maxptime: should be a multiple of 20 ms [rfc4566]
VBR: variable bit rate, one of 'on', 'off', or 'vad' (default: 'off'). If 'on', variable bit rate is used. If 'off', it is not used. If 'vad', a constant bit rate is used, but silent periods are encoded as special short frames that indicate there is no voice during that time. This parameter is a preference for the encoder.
CNG: comfort noise generation, either 'on' or 'off' (default: 'off'). If 'off', silent frames are silent. If 'on', those frames are filled with comfort noise. This parameter is a preference for the encoder.
Mode: a comma-separated list of the supported Speex decoding modes, in order of preference. The first has the highest preference and the rest follow in descending order. The valid modes differ for narrowband and wideband, as defined below:
* {1, 2, 3, 4, 5, 6, 7, 8, any} for narrowband
* {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, any} for wideband
The mode parameter may contain multiple values. In that case, the remote party should configure its encoder with the first mode in the list that it supports. 'any' indicates that all decoding modes are supported. The value of the 'mode' parameter must always be quoted. If 'mode' is absent, the value is treated as 'mode="3,any"' for narrowband and 'mode="8,any"' for wideband. Note that each Speex frame carries the mode (or bit rate) needed to decode it. An application must therefore be able to decode any Speex frame unless its SDP explicitly indicates that some modes are not supported (for example, by not including 'mode="any"').
Declaring support for a set of decoding modes also implies support for the same encoding modes.
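To make the mode rules concrete, here is a minimal Python sketch that turns a mode value into an ordered preference list and applies the defaults described above. The names `DEFAULT_MODE` and `preferred_modes` are hypothetical. Treating 32000 Hz the same as wideband for the default is an assumption of this sketch, and the sketch does not check the values against the valid mode sets.

```python
# Default mode value per sampling rate when 'mode' is absent.
# The 32000 Hz entry is an assumption: it reuses the wideband default.
DEFAULT_MODE = {8000: "3,any", 16000: "8,any", 32000: "8,any"}


def preferred_modes(rate_hz, mode_param=None):
    """Return the decoding modes in order of preference, most preferred first."""
    if rate_hz not in DEFAULT_MODE:
        raise ValueError("sampling rate must be 8000, 16000, or 32000")
    raw = DEFAULT_MODE[rate_hz] if mode_param is None else mode_param
    # Drop the surrounding quotes, split on commas, ignore stray spaces.
    return [item.strip() for item in raw.strip('"').split(",") if item.strip()]
```

An encoder on the remote side would then walk the returned list in order and pick the first entry it can produce, treating 'any' as "every mode is acceptable".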
Next: using Speex with SDP.