I. Features of Echo in Internet Voice Communication
Compared with the traditional telephone network, real-time voice transmission over the Internet has one serious weakness: poor voice quality. Many factors affect voice quality over the Internet, and one of the most critical is echo. To improve voice quality, echo cancellation must therefore be performed during transmission. In other words, the IP telephony gateway, which serves as the voice access device to the Internet, must provide echo cancellation. Voice transmission over the Internet is a new telecom service built on packet switching, so the voice signal must be encoded, compressed, packetized, and otherwise processed. This causes not only a large latency in the echo path but also large latency jitter. The echo problem is therefore particularly prominent in Internet voice transmission, and it has the following features.
1. The echo source is complex.
Traditional telephone systems exhibit so-called "circuit echo". Its main source is the 2-wire to 4-wire conversion in the system: the hybrid that performs this conversion leaks signal because of impedance mismatch, which produces the circuit echo. As the connection arrangement of an Internet IP phone gateway shows, one end of the gateway is connected to the PSTN and the other end to the Internet.
Although circuit echo is generated in the PSTN, it is passed on to the IP phone gateway, which makes it the first echo source in Internet voice transmission. The second echo source is the so-called "acoustic echo". Acoustic echo means that sound played by the loudspeaker is picked up by the microphone and sent back to the remote end, so that the remote talker hears his or her own voice. Acoustic echo is divided into direct echo and indirect echo. Direct echo is sound from the loudspeaker that enters the microphone directly, without any reflection. It has the shortest delay, and its strength depends on the speech energy of the remote talker, the distance and angle between the loudspeaker and the microphone, the playback volume of the loudspeaker, and the pickup sensitivity of the microphone. Indirect echo is the set of echoes produced when sound from the loudspeaker enters the microphone after one or more reflections along different paths. Any change in the surroundings, such as a person moving, changes the echo return paths, so the echo is both multipath and time-varying. In addition, background noise is one of the contributing factors.
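The direct and indirect echo described above can be pictured as a sum of delayed, attenuated copies of the loudspeaker signal. The following is a minimal sketch of that idea, not a model of any real room: the function name simulate_echo and the delays and gains in paths are hypothetical values chosen only for illustration.

```python
def simulate_echo(far_end, paths):
    """Return the echo picked up by the microphone.

    far_end: list of loudspeaker samples.
    paths:   list of (delay_in_samples, gain) pairs. The pair with the
             smallest delay plays the role of the direct echo; the others
             stand for reflections (indirect echo). All values here are
             hypothetical.
    """
    echo = [0.0] * len(far_end)
    for delay, gain in paths:
        for n in range(delay, len(far_end)):
            # Each path contributes a delayed, scaled copy of the signal.
            echo[n] += gain * far_end[n - delay]
    return echo


far_end = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # a single click from the loudspeaker
paths = [(1, 0.5), (3, 0.25), (4, 0.125)]   # direct path plus two reflections
echo = simulate_echo(far_end, paths)
print(echo)  # [0.0, 0.5, 0.0, 0.25, 0.125, 0.0]
```

If a person moves in the room, the entries of paths change, which is what makes the echo time-varying.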
2. Large echo path latency
There are three sources of latency in voice transmission over the Internet: compression latency, packet transmission latency, and processing latency. Voice compression latency is the main contributor to echo path latency. For example, in the G.723.1 standard, compressing one 30 ms frame takes up to 37.5 ms. Packet transmission latency is another important source; tests show that the maximum end-to-end transmission latency is over … ms. Processing latency refers to the encapsulation latency and buffering latency of voice packets.
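The three latency sources simply add up, and the echo has to travel the path in both directions before the talker hears it. The sketch below works through that arithmetic. The 37.5 ms figure for G.723.1 is the 30 ms frame plus the codec's 7.5 ms look-ahead; the network and processing values are hypothetical placeholders, not measurements, and the assumption that both directions have the same delay is also only for illustration.

```python
FRAME_MS = 30.0       # G.723.1 frame length
LOOKAHEAD_MS = 7.5    # G.723.1 look-ahead


def one_way_latency_ms(network_ms, processing_ms):
    """Sum of compression, packet transmission, and processing latency."""
    return FRAME_MS + LOOKAHEAD_MS + network_ms + processing_ms


codec_ms = FRAME_MS + LOOKAHEAD_MS
# Hypothetical placeholder values for the network and for buffering.
one_way_ms = one_way_latency_ms(network_ms=100.0, processing_ms=40.0)
# Assumes the return direction has the same latency as the forward one.
round_trip_ms = 2 * one_way_ms

print(codec_ms)       # 37.5
print(one_way_ms)     # 177.5
print(round_trip_ms)  # 355.0
```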
3. Large latency jitter of echo paths
During voice transmission over the Internet, the echo path, the voice compression latency, and the packet transmission route all involve many uncertainties, and the resulting fluctuation is large, generally in the range of 20 to 50 ms.
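To see what this jitter means for a digital echo canceller, it helps to express it in samples. The sketch below assumes an 8000 Hz sampling rate, which is common for narrowband telephony but is not stated in the text; the helper name ms_to_samples is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony sampling rate


def ms_to_samples(ms, rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * rate_hz / 1000))


jitter_low = ms_to_samples(20)
jitter_high = ms_to_samples(50)
print(jitter_low, jitter_high)  # 160 400
```

Under this assumption, the position of the echo in the signal can shift by a few hundred samples from one moment to the next, which any echo estimate has to tolerate.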
II. Structure and Related Algorithms of the Acoustic Echo Canceller
With the development of echo cancellation technology, the focus of current research has shifted from cancelling circuit echo to cancelling acoustic echo.
1. Methods of handling acoustic echo
(1) Treating the surrounding environment
Analysis of how acoustic echo arises shows that the simplest way to control it is to improve the environment around the loudspeaker so as to minimize reflections of the sound it plays. For example, a layer of sound-absorbing material can be applied to the surrounding walls, or a lining can be added to increase scattering. In an ideal environment, the reverberation time, or RT-60 (the time needed for the sound to decay by 60 dB), lies between … ms and … ms. In such an environment reflections are under control and the talker does not feel uncomfortable. Improving the environment can effectively suppress indirect acoustic echo, but it does nothing about direct acoustic echo.
(2) Echo suppression
Echo suppression is an early echo control method, and it is non-linear. A simple comparator compares the level of the sound about to be played by the loudspeaker with the level of the sound currently picked up by the microphone. If the former exceeds a threshold, the sound is passed to the loudspeaker and the microphone is switched off, so that it cannot pick up the loudspeaker's output and cause an echo at the remote end. If the level picked up by the microphone exceeds a threshold, the loudspeaker is switched off instead. Because echo suppression is non-linear, it can make loudspeaker playback discontinuous, which degrades the result. With the emergence of high-performance echo cancellers, echo suppressors are now rarely used.
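The comparator logic of an echo suppressor can be sketched in a few lines. This is only an illustration of the switching rule described above. The function name suppress, the single shared threshold, the rule that the far end wins when both sides are loud, and the rule that both devices stay on when both sides are quiet are all assumptions that the text does not specify.

```python
def suppress(far_level, near_level, threshold):
    """Return (speaker_on, mic_on) for one frame of audio.

    far_level:  level of the sound about to be played by the loudspeaker.
    near_level: level of the sound currently picked up by the microphone.
    """
    if far_level > threshold:
        return True, False    # far end is talking: play it, mute the microphone
    if near_level > threshold:
        return False, True    # near end is talking: mute the loudspeaker
    return True, True         # assumed behaviour when both sides are quiet


# Hypothetical (far_level, near_level) pairs, one per frame.
frames = [(0.8, 0.1), (0.1, 0.7), (0.9, 0.9), (0.0, 0.0)]
decisions = [suppress(far, near, threshold=0.3) for far, near in frames]
print(decisions)
# [(True, False), (False, True), (True, False), (True, True)]
```

The third frame shows the weakness mentioned above: when both parties talk at once, one direction is cut off entirely, so playback becomes discontinuous.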
(3) Acoustic echo cancellation
Another method is to use an acoustic echo canceller (AEC). Based on the correlation between the loudspeaker signal and the multipath echo it produces, the AEC builds a model from the far-end speech signal, uses it to estimate the echo, and continually updates the filter coefficients so that the estimate approaches the real echo. The echo estimate is then subtracted from the microphone input signal to cancel the echo. The AEC also compares the microphone input with past values of the loudspeaker signal, which lets it cancel acoustic echo that has been delayed by multiple reflections. Depending on how many past loudspeaker output values are stored in its memory, the AEC can cancel echoes with various latencies.
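The estimate, subtract, and update loop of an AEC can be sketched with an adaptive FIR filter. The text does not name a particular adaptation rule, so the sketch below uses normalized least mean squares (NLMS) as one common choice. The function name nlms_cancel, the step size mu, the number of taps, and the echo path in true_path are all hypothetical, and the microphone signal here contains only echo (no near-end speech or noise).

```python
import random


def nlms_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Cancel echo in `mic` using the loudspeaker signal `far_end`.

    Returns (residual, weights). `taps` is how many past loudspeaker
    samples are remembered, which bounds the echo delay that can be
    cancelled.
    """
    weights = [0.0] * taps    # adaptive filter coefficients
    history = [0.0] * taps    # past loudspeaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # microphone input minus the echo estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the coefficients toward a better echo estimate.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.0, 0.5, 0.0, 0.25]   # hypothetical echo path (delays 1 and 3)
mic = [
    sum(g * far_end[n - k] for k, g in enumerate(true_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual, weights = nlms_cancel(far_end, mic)

echo_energy = sum(v * v for v in mic[-100:])
residual_energy = sum(v * v for v in residual[-100:])
print(residual_energy < echo_energy)
```

In this idealized setting the learned weights approach true_path. An echo whose delay exceeds taps samples would fall outside the filter's memory and could not be cancelled, which is why the number of stored loudspeaker samples matters.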