G.729 SPIRIT DSP definition:
Audio compression encoding
1. What is speech encoding technology, and what is the state of its development?
A: Digital transmission of voice signals has long been one of the main directions of communications development. Transmitting speech with low-rate speech coding has many advantages over transmitting analog speech signals. The development trend of modern communications highlights two major advantages of speech coding technology:
- Greatly reduced bandwidth. From the original 64 kbps PCM encoding to today's standard voice compression protocols: G.723 encodes at 5.3 or 6.3 kbps, and G.729 at 8 kbps. There are also mature algorithms that have not yet become protocol standards but reach even lower rates, such as AMBE, CELP, RELP, VSELP, MELP, MP-MLQ, and LPC-10, with minimum rates down to 2.4 kbps; some of these have already been applied in many fields, including third-generation mobile communication systems (3G).
- Ease of integration with IP networks. The success of the Internet makes integration with IP an inevitable trend. Packet voice combines the idea of packet switching with voice transmission, making it easier to carry voice information over IP networks, and speech coding is one of its key technologies: low-rate speech coding helps keep voice transmission real-time. Voice carried by a packet-voice network is already packetized, so it is very convenient to connect to the Internet.
Speech coding can be implemented in software or in hardware. A software implementation realizes the compression algorithm purely in code; it is cheap and easy to modify, but slower, so real-time processing is harder to guarantee. A hardware implementation burns the compression algorithm into a dedicated DSP chip, which is fast and easily meets real-time requirements.
2. What is G.711 encoding?
A: The G.711 recommendation describes a typical codec that uses PCM waveform encoding; it achieves high speech quality, but its compression ratio is low.
G.711 specifies A-law and μ-law logarithmic (companded) PCM. The signal is sampled at 8 kHz and converted by a linear A/D converter (13 bits of linear resolution for A-law, 14 bits for μ-law); logarithmic PCM then compresses each sample to 8 bits, giving a voice bit rate of 64 kbit/s.
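As an illustration, here is a minimal sketch of G.711 A-law compression in C, following the widely circulated public-domain g711.c logic (the function and table names are illustrative, not taken from the recommendation itself):

#include <stdint.h>

/* Segment endpoints of the A-law logarithmic curve (13-bit magnitude). */
static const int16_t seg_aend[8] = {
    0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF
};

/* Compress one 16-bit linear PCM sample to an 8-bit A-law code word. */
static uint8_t linear_to_alaw(int16_t pcm)
{
    int16_t mask, seg;
    uint8_t aval;

    pcm >>= 3;                       /* 16-bit range -> 13-bit magnitude */
    if (pcm >= 0) {
        mask = 0xD5;                 /* sign bit = 1, even bits inverted */
    } else {
        mask = 0x55;                 /* sign bit = 0 */
        pcm = -pcm - 1;
    }
    for (seg = 0; seg < 8; seg++)    /* find the logarithmic segment */
        if (pcm <= seg_aend[seg])
            break;
    if (seg >= 8)                    /* out of range: clip to maximum */
        return (uint8_t)(0x7F ^ mask);
    aval = (uint8_t)(seg << 4);      /* 3 segment bits ... */
    aval |= (pcm >> ((seg < 2) ? 1 : seg)) & 0x0F;  /* ... 4 step bits */
    return aval ^ mask;
}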
Audio Compression Technology
1. What are the main indicators of an audio signal?
A: 1) Bandwidth: the wider the frequency band of an audio signal, the richer its components and the better the sound quality.
2) Dynamic range: the larger the dynamic range, the wider the relative variation in signal strength and the better the audio effect.
3) Signal-to-noise ratio (SNR): the ratio of useful signal power to noise power; noise includes environmental noise and equipment noise. The higher the SNR, the better the sound quality (see the formula sketch after this list).
4) Subjective measurement: human hearing ultimately determines how sound is judged, so subjective listening tests are an indispensable part of evaluating sound quality, although reliable subjective measurements are hard to obtain.
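For reference, the signal-to-noise ratio is usually expressed in decibels; here is a minimal sketch in C, assuming the signal power Ps and noise power Pn have already been measured (names are illustrative):

#include <math.h>

/* SNR in decibels from signal power Ps and noise power Pn. */
static double snr_db(double Ps, double Pn)
{
    return 10.0 * log10(Ps / Pn);    /* higher is better */
}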
2. What is the principle of digitizing audio?
A: An audio signal is a continuously varying analog signal, while a computer can only process and record binary digital signals, so audio from a natural source must be converted into binary data before the computer can further edit and store it.
PCM (Pulse Code Modulation) is the most basic coding method for analog-to-digital (A/D) conversion. Converting an analog signal into a digital signal mainly involves:
- Sampling: digitizing the signal along the time axis;
- Quantization: digitizing the signal along the amplitude axis;
- Encoding: recording the sampled and quantized digital data in a certain format.
First, a pulse sampling clock is multiplied with the input analog audio signal; the product is the input signal digitized along the time axis. The amplitude of each sample is then quantized; the simplest scheme is uniform quantization, performed by a quantizer. Finally, the quantizer output is encoded, that is, each quantized level is converted into a binary code word, yielding the discrete binary output sequence x(n), where n is the quantized time index and x(n) is the quantized amplitude recorded in binary form.
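A minimal sketch of this sample/quantize/encode chain, assuming a 1 kHz test tone sampled at 8 kHz and encoded as 16-bit two's-complement words (all names and parameters are illustrative):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define PI 3.14159265358979323846

int main(void)
{
    const double fs = 8000.0, f = 1000.0;   /* sampling rate, tone frequency */
    int16_t x[80];                          /* one 10 ms frame of codes x(n) */
    for (int n = 0; n < 80; n++) {
        double s = sin(2.0 * PI * f * n / fs);  /* sampling on the time axis */
        double q = s * 32767.0;                 /* quantizing the amplitude  */
        x[n] = (int16_t)lround(q);              /* encoding as a binary word */
    }
    printf("x(1) = %d\n", x[1]);
    return 0;
}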
3. What are the technical indicators of digital audio?
A:
1) Sampling frequency: the number of samples taken per second. The choice of sampling frequency follows the Nyquist sampling theorem: the highest signal frequency that can be recovered after sampling is only half of the sampling frequency; equivalently, as long as the sampling frequency is at least twice the highest frequency in the input signal, the original signal can be reconstructed from the sample sequence.
According to the sampling theorem, the 44.1 kHz sampling frequency of the CD records audio up to 22.05 kHz, giving sound quality almost indistinguishable from the original, which is what we usually call super-high-fidelity sound quality. In communication systems, digital telephony usually samples at 8 kHz, matching the nominal 4 kHz bandwidth of a telephone channel.
2) Quantization bits: the number of bits used to digitize the amplitude axis of the analog audio signal; it determines the dynamic range of the digitized signal. Quantization is usually 8 or 16 bits, i.e. whole bytes. The more quantization bits, the larger the dynamic range of the signal (roughly 6 dB per bit), the closer the digital audio is to the original, and the more storage space required.
3) Number of audio channels: there are mono and stereo. Stereo (two channels) occupies two lines in hardware and gives better timbre and sound quality, but digitized stereo takes twice the space of mono.
4) Encoding algorithm: encoding records the digital data in a certain format and compresses it with some algorithm to reduce storage space and improve transmission efficiency. Compression algorithms may be lossy or lossless; with lossy compression the data cannot be completely restored after decompression, and some information is permanently lost. A basic indicator of compression coding is the compression ratio (compressed size over original size), which is usually less than 1. The stronger the compression, the more information is lost and the greater the distortion after reconstruction, so the compression algorithm should be chosen according to the application.
5) Data rate and data file format: the data rate is the number of bits per second; it directly determines whether the information can be transmitted in real time, and the total data volume directly determines the storage space required (see the worked example below).
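As a worked example of this data-rate arithmetic (CD-quality parameters are assumed purely for illustration):

#include <stdio.h>

int main(void)
{
    long fs = 44100, bits = 16, channels = 2;   /* CD-quality audio */
    long bps = fs * bits * channels;            /* data rate, bit/s */
    double mib_per_min = bps / 8.0 * 60.0 / (1024.0 * 1024.0);
    printf("%ld bit/s, about %.1f MiB per minute\n", bps, mib_per_min);
    return 0;
}

Uncompressed CD audio thus needs about 1.4 Mbit/s, which is why the compression encodings discussed above matter.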
Below are some of my own notes on audio encoding.
Speech coding is divided into waveform coding, parameter coding, and hybrid coding. Waveform coding processes the waveform of the speech signal directly. Parameter coding builds a digital model of speech production, extracts the model parameters, and then synthesizes speech by restoring the model from those parameters. Hybrid coding combines the strengths of waveform coding and parameter coding, achieving high-quality speech synthesis at 4-16 kbps; multi-pulse-excited linear prediction coding (MPLPC) and code-excited linear prediction coding (CELP) are examples of hybrid coding.
Human hearing ranges from about 20 Hz to 20 kHz, so to keep sound undistorted the sampling frequency should be around 40 kHz; but the voice band is only about 300-3400 Hz, so 8 kHz sampling is used for the human voice.
In waveform coding, the analog speech signal is first sampled, the samples are then quantized in amplitude, and finally binary-encoded.
Parameter coding estimates the parameters of a digital model of speech production. The commonly used model today is the lossless acoustic tube discrete-time model, which combines the three most important factors of speech: the vocal tract, the excitation, and the radiation, all of which can be expressed as mathematical functions. In addition, speech signals are non-uniform and correlated: non-uniformity shows up as the higher probability of small-amplitude samples, and correlation shows up both between adjacent samples and between samples one pitch period apart. Linear prediction exploits this correlation to predict upcoming samples from the past speech signal. Furthermore, in a normal conversation a person speaks only about 50% of the time and merely listens to the other party the rest; voice activity detection (VAD) decides whether a segment is silence, and comfort noise generation (CNG) synthesizes a "pleasant" background noise to send to the other party.
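To make the role of correlation concrete, here is a minimal sketch of a first-order linear predictor in C, with the coefficient estimated from the frame's autocorrelation (illustrative only, not the reference code of any particular standard):

#include <stdint.h>

/* Estimate a for the predictor x^(n) = a * x(n-1) from one frame. */
static double predictor_coeff(const int16_t *x, int n)
{
    double r0 = 0.0, r1 = 0.0;
    for (int i = 1; i < n; i++) {
        r0 += (double)x[i] * x[i];        /* autocorrelation at lag 0 */
        r1 += (double)x[i] * x[i - 1];    /* autocorrelation at lag 1 */
    }
    return (r0 > 0.0) ? r1 / r0 : 0.0;    /* a = R(1) / R(0) */
}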
G.729 and G.723 are speech coding standards developed by the ITU that are well suited to IP telephony; they are widely used thanks to their high quality and low bit rates. They are described below:
G.729 is an 8 kbps speech coding standard developed by the ITU, i.e. 1 kB of data per second. It uses Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP).
G.723 is also developed by the ITU, but it is a dual-rate speech coder. It works at 5.3 kbps or 6.3 kbps, using Algebraic-Code-Excited Linear Prediction (ACELP) at the lower rate and Multi-Pulse Maximum Likelihood Quantization (MP-MLQ) at the higher rate.
Register an account with the ITU and you can download the reference C source code and the description documents. I compiled and ran the code under VC 6 and found the encoding efficiency far too low: even with compiler optimization enabled, encoding took roughly 5-6 times real time. The ITU source uses only the most basic operations and has not been optimized at all, so its efficiency is extremely low. I recommend a debugging tool, DevPartner Studio Professional, for error analysis and performance analysis; it is powerful and easy to use.
DevPartner's source-level profiling showed that for both G.729 and G.723 most of the computation is concentrated in basic_op.c, which contains the basic fixed-point operations; L_mac(), L_mult(), L_add(), and sature() account for the vast majority of the time spent there, so optimization should focus on these functions. basic_op.c maintains a global overflow flag, Overflow, and many of the basic functions waste a lot of time updating it; it can simply be removed, since the caller can tell whether a result overflowed by checking whether it equals the saturation maximum. The basic_op.c functions can also be rewritten with MMX instructions, whose built-in saturation eliminates many of the explicit overflow checks.
The following is an example of addition.
Word16 add(Word16 var1, Word16 var2)
{
    __asm {
        movsx  eax, var1      // sign-extend the 16-bit operands to 32 bits
        movd   mm0, eax
        movsx  eax, var2
        movd   mm1, eax
        paddsw mm0, mm1       // MMX saturating signed 16-bit add
        movd   eax, mm0       // result left in eax
        emms                  // clear MMX state before returning
    }
    /* return value is taken from eax (MSVC __asm idiom) */
}
There are also too many function calls in the source code. For example, L_mac() calls L_mult() and L_add(); writing the bodies of L_mult() and L_add() directly into L_mac() saves a great deal of call overhead (see the sketch below). In the ITU code, many similar loops are kept separate so that the algorithm reads clearly; merging them also improves efficiency. Some operations inside loop bodies can be hoisted outside the loop, and it is best to use constant loop counts in for() statements so that the compiler can unroll the loops.
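As a sketch of what such inlining looks like (assuming the usual ITU Word16/Word32 widths map to int16_t/int32_t; this is illustrative, not the reference code), L_mult and L_add can be folded directly into L_mac, with the global Overflow flag replaced by in-place saturation:

#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* L_mac with L_mult and L_add folded in: multiply-accumulate with
   saturation, no function calls and no global Overflow flag. */
static inline Word32 L_mac_inlined(Word32 acc, Word16 v1, Word16 v2)
{
    Word32 prod = (Word32)v1 * (Word32)v2;       /* at most 2^30 */
    prod = (prod == (Word32)0x40000000L)
               ? INT32_MAX                       /* the -32768 * -32768 case */
               : prod * 2;                       /* Q31 product */
    int64_t sum = (int64_t)acc + prod;           /* widen, then saturate */
    if (sum > INT32_MAX) sum = INT32_MAX;
    if (sum < INT32_MIN) sum = INT32_MIN;
    return (Word32)sum;
}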
Limited by my ability and time, I optimized G.729 and G.723 only at the source-code level, yet the results were considerable: efficiency roughly doubled compared with the code before optimization. Deeper, algorithm-level optimization should improve it further still. The analysis above was done with DevPartner; the same profiling can also be done with VTune, and both tools are worth trying when time allows.
G.729A Program Structure Analysis
Algorithm: G.729 is an 8 kbps Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) speech compression algorithm approved by the ITU-T. G.729 Annex A is a reduced-complexity version of the G.729 coder, and the G.729AB speech coder was developed for multimedia simultaneous voice and data applications. The coder processes signals in 10 ms frames with a 5 ms look-ahead, for a total algorithmic delay of 15 ms. The input/output of the algorithm is 16-bit linear PCM samples. Forward error correction (FEC) can be incorporated into the algorithm, including control bits in the speech frames to make the data stream more robust to noise. A corresponding solution (G.729AB + FEC) was developed for the Intel x86 platform (fixed-point C code) and can be ported to DSP or RISC platforms on request.
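A quick sketch of the frame arithmetic implied by these figures (not the ITU API): at 8 kHz, a 10 ms frame is 80 PCM samples in, and at 8 kbit/s it is 80 bits (10 bytes) of payload out.

#include <stdio.h>

int main(void)
{
    int fs_hz = 8000, frame_ms = 10, rate_bps = 8000;
    int samples_in = fs_hz * frame_ms / 1000;      /* 80 PCM samples in  */
    int bits_out   = rate_bps * frame_ms / 1000;   /* 80 bits (10 B) out */
    printf("%d samples -> %d bits (%d bytes) per 10 ms frame\n",
           samples_in, bits_out, bits_out / 8);
    return 0;
}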
Features:
- Fully bit-exact with ITU-T G.729AB
- 8 kbps encoded bit-stream rate
- Discontinuous transmission (DTX) support using Voice Activity Detection (VAD) and Comfort Noise Generation (CNG)
- Direct interface with 8 kHz sampled PCM data; both sample-by-sample and block-based processing supported
- Very simple application interface
- Compliant with TI's eXpressDSP standard; code is reentrant, supports multithreading and dynamic memory allocation
- Can be easily ported to any DSP or RISC platform