I have summarized the commonly used ITU speech codecs for your reference:
1. G.711: Pulse code modulation (PCM) of voice frequencies
Encoding type: Companded PCM
Encoding rate: 64 kbps
Theoretical latency: the time to process one sample (1/8000 s = 0.125 ms);
Sound quality: toll (long-distance) quality;
Advantages: low algorithmic complexity, low compression ratio (CD-quality audio needs > 400 kbps), and the lowest codec latency of the technologies listed here;
Disadvantage: high bandwidth usage.
Application fields: VoIP and PSTN telephone networks
Royalty: free
Note:
G.711 (64 kbit/s pulse code modulation, PCM) was released by CCITT in the 1970s.
G.711 is the most basic encoding method, commonly known simply as PCM. It compresses speech with nonlinear (logarithmic) quantization, using μ-law (mainly in North America) or A-law (in other regions). It is "basic" because the PCM algorithm is very simple and much ADC hardware supports PCM input and output directly. PCM usually has to be compressed further in a communication system, so it also serves as the input to other speech coding algorithms.
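To make the nonlinear quantization idea concrete, here is a minimal sketch of μ-law companding using the continuous textbook formula. It is not the bit-exact segmented encoder from the G.711 tables; the function names and the uniform 8-bit quantizer are assumptions made for illustration.

```python
import math

MU = 255  # mu-law parameter used for 8-bit telephony companding

def mu_law_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value in [-1, 1] to a code in 0..255 (illustrative)."""
    return min(255, int((y + 1.0) / 2.0 * 256))

# 8000 samples per second, 8 bits per sample
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bit/s

for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"x={x} companded={y:.3f} code={quantize_8bit(y)} restored={mu_law_expand(y):.3f}")
```

Small amplitudes get a much larger share of the code range than large ones, which is why 8 bits per sample are enough for telephone-quality speech.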
2. G.722.1: Low-complexity coding at 24 and 32 kbit/s for hands-free operation in systems with low frame loss
Encoding type: Transform-domain audio coding
Encoding rate: 24 kbps and 32 kbps (in practice more bit rates are available in 8 kbps steps, and there is a higher bit rate version with 32 kHz sampling);
Theoretical latency: 40 ms (20 ms frame + 20 ms look-ahead)
Sound quality: better than audio algorithms such as MP3 and WMA at low bit rates. For details, see the Polycom website;
Advantages: low algorithmic complexity (less than 5.5 WMOPS); sound quality better than MP3, WMA, and similar algorithms at low bit rates;
Disadvantage: for speech, the sound quality is not as good as that of CELP-based coders. In addition, the sound quality hardly improves above 48 kbps;
Application fields: video conferencing, teleconferencing, and Internet streaming applications;
Royalty: free
Note:
G.722.1 is a low-bit-rate, low-complexity wideband speech coding algorithm proposed by Polycom. It mainly uses transform-domain coding, so it can encode both speech (up to about 4000 Hz) and music below 7 kHz. The sampling rate is 16 kHz, and the reconstructed speech quality at 32 kbps is equivalent to that of 64 kbps G.722 SB-ADPCM. In practice, its audio quality is higher than that of MP3 and other audio algorithms at low bit rates. Its low complexity also makes it well suited to communication and storage applications on embedded platforms. Finally, Annex C adds a 14 kHz bandwidth audio coding scheme with 32 kHz sampling, which further improves the sound quality.
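The figures above (16 kHz sampling, 20 ms frames, 24 and 32 kbps) translate into per-frame sizes as follows. This is only arithmetic on the numbers quoted in the text; the helper names are made up for the example.

```python
SAMPLE_RATE_HZ = 16_000  # wideband sampling rate quoted above
FRAME_MS = 20            # frame size quoted above
PCM_BITS = 16            # assumption: 16-bit linear PCM as the uncompressed reference

def samples_per_frame(sample_rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Number of input samples consumed per frame."""
    return sample_rate_hz * frame_ms // 1000

def bits_per_frame(bit_rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of coded bits produced per frame at the given bit rate."""
    return bit_rate_bps * frame_ms // 1000

def compression_ratio(bit_rate_bps: int) -> float:
    """Ratio of the uncompressed PCM rate to the coded rate."""
    return SAMPLE_RATE_HZ * PCM_BITS / bit_rate_bps

for rate in (24_000, 32_000):
    bits = bits_per_frame(rate)
    print(f"{rate // 1000} kbps: {samples_per_frame()} samples -> {bits} bits "
          f"({bits // 8} bytes), ratio {compression_ratio(rate):.1f}:1")
```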
3. G.722.2: Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)
Encoding type: ACELP
Encoding rate: 6.6 kbps to 23.85 kbps;
Theoretical latency: 25 ms (20 ms frame + 5 ms look-ahead, i.e. 1/4 of a frame)
Sound quality: higher than narrowband speech;
Advantages: high quality, multiple bit rates, and rate adaptation;
Disadvantage: high complexity;
Application: 3GPP wireless communication;
Royalty: licensed (single authorization)
Note:
AMR Wideband (AMR-WB) is so far the only voice codec standardized for both wireless (3GPP) and wireline (ITU-T Recommendation G.722.2) applications. It is therefore an ideal codec for wideband speech applications, because it ensures compatibility across converged wireline/wireless networks. AMR-WB is the only wideband voice standard adopted by 3GPP and, with its support for wideband speech (16 kHz sampling), it is the designated codec for wideband voice and multimedia services in GSM and WCDMA networks. These services include Multimedia Messaging Service (MMS), IMS messaging and presence services, Packet-switched Streaming Service (PSS), Multimedia Broadcast/Multicast Service (MBMS), and Push-to-talk over Cellular (PoC). Other applications include VoIP, conferencing, Wi-Fi phones, satellite phones, video phones, and Internet streaming audio.
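As a rough illustration of the multi-rate idea, the sketch below lists the nine AMR-WB speech modes between 6.6 and 23.85 kbps and picks the highest one that fits a bandwidth budget. The `pick_mode` policy is a hypothetical simplification; real networks adapt the mode from channel quality measurements, not from a single number.

```python
# AMR-WB speech modes in bit/s, spanning the 6.6 to 23.85 kbps range quoted above.
AMR_WB_MODES_BPS = (6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850)
FRAME_MS = 20  # frame size quoted above

def bits_per_frame(mode_bps: int) -> int:
    """Speech bits carried in one 20 ms frame for the given mode."""
    return mode_bps * FRAME_MS // 1000

def pick_mode(available_bps: int) -> int:
    """Hypothetical policy: highest mode within the budget, else the lowest mode."""
    fitting = [m for m in AMR_WB_MODES_BPS if m <= available_bps]
    return max(fitting) if fitting else AMR_WB_MODES_BPS[0]

for mode in AMR_WB_MODES_BPS:
    print(f"{mode / 1000:.2f} kbps -> {bits_per_frame(mode)} bits per frame")
```

For example, `pick_mode(13_000)` selects the 12.65 kbps mode under this policy.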
4. G.723.1: Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s
Encoding type: ACELP, MP-MLQ
Encoding rate: 5.3 kbps and 6.3 kbps;
Theoretical latency: 37.5 ms (30 ms frame + 7.5 ms look-ahead, i.e. 1/4 of a frame)
Sound quality: below toll (long-distance) quality, MOS 3.7;
Advantages: low bit rate and low bandwidth requirements; meets the ITU-T G.723 speech quality requirements with stable performance; avoids intermittent interruption of the carrier signal.
Disadvantage: moderate sound quality;
Application field: VoIP;
Royalty: free
Note:
G.723.1 is a dual-rate speech coder, a compression algorithm recommended by ITU-T for speech and other audio signals in low-bit-rate multimedia services.
Its target applications include multimedia communication systems such as H.323 and H.324, and it has become one of the mandatory algorithms in IP phone systems. The input signal is first band-limited to the traditional telephone bandwidth (per G.712), sampled at the traditional 8000 Hz rate (per G.711), and converted to 16-bit linear PCM as the encoder input. The decoder reverses these steps on its output to reconstruct the speech signal. The high-rate coder uses multi-pulse maximum likelihood quantization (MP-MLQ), and the low-rate coder uses ACELP. Both encoder and decoder must support the two rates and may switch between them at frame boundaries.
The coder can also compress and decompress music and other audio signals, but it is optimized for speech. It uses silence compression for discontinuous transmission, which means that artificial (comfort) noise information is inserted into the bitstream during silence. Besides saving bandwidth, this keeps the sender's modem working continuously and avoids intermittent interruption of the carrier signal.
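The input framing described above can be sketched as follows: 8 kHz linear PCM is cut into 30 ms frames, and each frame may be coded at either rate. The helper names are hypothetical, the zero-padding of the last frame is an assumption of this example, and the payload sizes are nominal values rounded up to whole bytes.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 30
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 240 samples per frame

def split_into_frames(pcm: list[int]) -> list[list[int]]:
    """Split 16-bit linear PCM samples into 240-sample frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(pcm), FRAME_SAMPLES):
        frame = pcm[start:start + FRAME_SAMPLES]
        frame += [0] * (FRAME_SAMPLES - len(frame))
        frames.append(frame)
    return frames

def nominal_frame_bytes(bit_rate_bps: int) -> int:
    """Coded bits per 30 ms frame, rounded up to whole bytes."""
    bits = bit_rate_bps * FRAME_MS // 1000
    return (bits + 7) // 8

# The rate may change from one frame to the next.
rates_per_frame = [6300, 6300, 5300]
frames = split_into_frames([0] * 700)
for frame, rate in zip(frames, rates_per_frame):
    print(f"{len(frame)} samples at {rate} bit/s -> {nominal_frame_bytes(rate)} bytes")
```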
5. G.726: 40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)
Encoding type: ADPCM
Encoding rate: 40 kbps, 32 kbps, 24 kbps, 16 kbps;
Theoretical latency: 0.125 ms (one sample at 8 kHz)
Sound quality: toll (long-distance) quality at 32 kbps;
Advantages: simple computation; at 32 kbps, half the bandwidth of G.711 with similar sound quality;
Disadvantage: relatively high bandwidth usage;
Application fields: VoIP and telephone communication networks;
Royalty: free
Note:
G.726 combines G.721 and G.723 and adds a 16 kbps ADPCM rate; the most commonly used rate is 32 kbit/s. At that rate G.726 needs half the bandwidth of G.711, so the usable network capacity doubles. G.726 specifies how a 64 kbps A-law or μ-law PCM signal is converted to an ADPCM channel of 40, 32, 24, or 16 kbps. Of these, the 24 and 16 kbps channels are used for voice transmission in digital circuit multiplication equipment (DCME), and the 40 kbps channel is used for data modem signals in DCME (especially modems running at 4800 bps or higher).
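Because G.726 keeps the 8 kHz sampling rate, each bit rate corresponds to a fixed number of bits per sample, and the capacity gain over a 64 kbps G.711 channel follows directly. A small sketch of that arithmetic, with helper names chosen for the example:

```python
SAMPLE_RATE_HZ = 8000
G711_BPS = 64_000
G726_RATES_BPS = (40_000, 32_000, 24_000, 16_000)

def bits_per_sample(bit_rate_bps: int) -> int:
    """ADPCM code width per sample at the given bit rate."""
    return bit_rate_bps // SAMPLE_RATE_HZ

def channels_per_g711_slot(bit_rate_bps: int) -> float:
    """How many ADPCM channels fit into the bandwidth of one 64 kbps PCM channel."""
    return G711_BPS / bit_rate_bps

for rate in G726_RATES_BPS:
    print(f"{rate // 1000} kbps: {bits_per_sample(rate)} bits/sample, "
          f"{channels_per_g711_slot(rate):.1f} channels per 64 kbps slot")
```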
6. G.728: Coding of speech at 16 kbit/s using low-delay code excited linear prediction
Encoding type: CELP
Encoding rate: 16 kbps;
Theoretical latency: 0.625 ms (five samples at 8 kHz)
Sound quality: toll (long-distance) quality;
Advantages: low latency and strong robustness to bit errors;
Disadvantage: more complex than other coders;
Application fields: IP phones, digital mobile communications, and satellite communications;
Royalty: free
Note:
G.728 low-delay code-excited linear prediction (LD-CELP) is the world's first standardized parametric speech codec. It is based on the CELP algorithm and achieves low delay by using backward-adaptive linear prediction, a 50th-order synthesis filter, and short excitation vectors.
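The link between the short excitation vector, the 0.625 ms delay, and the 16 kbps rate can be checked with a few lines of arithmetic. The vector length of five samples and the 10-bit codebook index per vector are stated here as assumptions, since the text above does not spell them out.

```python
SAMPLE_RATE_HZ = 8000
VECTOR_SAMPLES = 5    # assumption: samples per excitation vector
BITS_PER_VECTOR = 10  # assumption: codebook index bits sent per vector

def vector_duration_ms(samples: int = VECTOR_SAMPLES) -> float:
    """Time covered by one excitation vector, which sets the buffering delay."""
    return 1000 * samples / SAMPLE_RATE_HZ

def bit_rate_bps(bits: int = BITS_PER_VECTOR, samples: int = VECTOR_SAMPLES) -> int:
    """Bit rate when `bits` are sent for every `samples` input samples."""
    return bits * SAMPLE_RATE_HZ // samples

print(f"vector duration: {vector_duration_ms()} ms")
print(f"bit rate: {bit_rate_bps()} bit/s")
```

Because the predictor is backward-adaptive, it is updated from previously decoded speech, so the encoder does not need to buffer a longer frame to compute it.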
7. G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)
Encoding type: CS-ACELP
Encoding rate: 8 kbps;
Theoretical latency: 15 ms (10 ms frame + 5 ms look-ahead, i.e. 1/2 of a frame)
Sound quality: toll (long-distance) quality;
Advantages: low bit rate, high sound quality, and wide application;
Disadvantage: high algorithmic complexity;
Application field: VoIP;
Royalty: free
Note:
The International Telecommunication Union (ITU-T) formally adopted G.729 in November 1995. ITU-T Recommendation G.729, also known as CS-ACELP, was a new speech compression standard at the time, developed jointly by several well-known telecommunications organizations in the United States, France, Japan, and Canada. The algorithm uses conjugate-structure algebraic code-excited linear prediction (CS-ACELP), a coding scheme based on code-excited linear prediction. It combines the advantages of waveform coding and parametric coding: building on adaptive predictive coding, it uses techniques such as vector quantization, analysis-by-synthesis, and perceptual weighting. The G.729 coder is designed for low-delay applications. Its frame length is only 10 ms and its processing delay is also 10 ms, so, together with the 5 ms look-ahead, the point-to-point delay introduced by G.729 is 25 ms at a bit rate of 8 kbps.
In 1996, ITU-T produced the simplified G.729A, which mainly reduces computational complexity to make real-time implementation easier. G.729A is therefore the version in common use today.
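The delay and frame figures for G.729 fit together as shown in this small sketch. Splitting the 25 ms into frame, look-ahead, and processing time is one reading of the numbers quoted above, not a statement from the recommendation itself.

```python
SAMPLE_RATE_HZ = 8000
BIT_RATE_BPS = 8000
FRAME_MS = 10       # frame length quoted above
LOOKAHEAD_MS = 5    # look-ahead of half a frame
PROCESSING_MS = 10  # processing delay quoted above

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000     # input samples per frame
bits_per_frame = BIT_RATE_BPS * FRAME_MS // 1000          # coded bits per frame
algorithmic_delay_ms = FRAME_MS + LOOKAHEAD_MS            # buffering before coding can start
one_way_delay_ms = algorithmic_delay_ms + PROCESSING_MS   # buffering plus processing

print(f"{samples_per_frame} samples -> {bits_per_frame} bits ({bits_per_frame // 8} bytes) per frame")
print(f"algorithmic delay {algorithmic_delay_ms} ms, one-way codec delay {one_way_delay_ms} ms")
```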
8. G.729.1: G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729.
Encoding type: CELP, TDBWE
Encoding rate: 8 kbps to 32 kbps, 12 bit rates;
Theoretical latency: 48.9375 ms;
Sound quality: unknown;
Advantages: multiple bit rates; fully interoperable with G.729, G.729A, and G.729B;
Disadvantage: not yet widely deployed;
Application field: VoIP;
Royalty: patent-licensed
Note:
G.729.1 is an 8-32 kbit/s scalable (layered) wideband speech and audio coding algorithm that interoperates with G.729, G.729A, and G.729B. The output of the G.729EV codec has a bandwidth of 50-4000 Hz at bit rates of 8 and 12 kbit/s, and 50-7000 Hz at bit rates of 14-32 kbit/s. At 8 kbit/s, G.729EV is fully interoperable with G.729, G.729 Annex A, and G.729 Annex B, so it can be expected to deploy effectively on the existing G.729-based VoIP infrastructure. The codec works on 20 ms frames, and the algorithmic delay is 48.9375 ms. By default, the encoder input and decoder output are sampled at 16 kHz. The encoder generates an embedded bitstream divided into 12 layers, corresponding to the 12 available bit rates from 8 to 32 kbit/s. The bitstream can be truncated at the decoder or at any point in the communication system, so the bit rate can be adjusted to the desired value in real time without out-of-band signaling.
The underlying algorithm is a three-stage coding structure: an embedded code-excited linear prediction (CELP) codec for the low band (50-4000 Hz), a time-domain bandwidth extension (TDBWE) parametric codec for the high band (4000-7000 Hz), and full-band enhancement by a predictive transform codec known as time-domain aliasing cancellation (TDAC).
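Bitstream truncation, as described above, can be sketched with a toy model in which each 20 ms frame stores its layers in order from the 8 kbit/s core upward. The layer rates below (8 and 12 kbit/s, then 14 to 32 kbit/s in 2 kbit/s steps) are an assumption consistent with the 12 rates mentioned in the text, and the flat byte layout is a simplification, not a real packet format.

```python
FRAME_MS = 20
# Assumed layer bit rates: 8 and 12 kbit/s, then 14..32 kbit/s in 2 kbit/s steps.
LAYER_RATES_BPS = (8000, 12000) + tuple(range(14000, 32001, 2000))

def frame_bytes(rate_bps: int) -> int:
    """Bytes needed for one 20 ms frame at the given cumulative layer rate."""
    return rate_bps * FRAME_MS // 1000 // 8

def truncate_frame(frame: bytes, target_bps: int) -> bytes:
    """Drop the upper layers of an embedded frame so that it fits target_bps."""
    usable = [r for r in LAYER_RATES_BPS if r <= target_bps]
    if not usable:
        raise ValueError("target rate is below the 8 kbit/s core layer")
    return frame[:frame_bytes(max(usable))]

full_frame = bytes(frame_bytes(32000))  # placeholder payload for a 32 kbit/s frame
for target in (32000, 20000, 13000, 8000):
    print(f"target {target} bit/s -> keep {len(truncate_frame(full_frame, target))} bytes")
```

Any node on the path can apply this kind of truncation, and the decoder only needs the number of bytes it actually received to know which layers are present.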