I. Preface
To understand echo cancellation, we first have to mention digital signal processing, the theoretical basis of modern communication technology. Digital signal processing has an important branch called adaptive signal processing, and classic textbooks have always treated echo cancellation as a standard case study of adaptive signal processing. Since echo cancellation is such a classic, concrete textbook application, there is nothing mysterious about its theory. Why, then, are the companies that supply echo cancellation (whether as chips or as algorithms) all based abroad, and where do the difficulties and mysteries of echo cancellation technology actually lie?
II. Principle of Echo Cancellation
By cause, communication echo can be divided into acoustic echo and line echo, and the corresponding techniques are called acoustic echo cancellation (AEC) and line echo cancellation (LEC). Acoustic echo arises in hands-free or conferencing applications when the sound from the loudspeaker is fed back into the microphone, possibly many times over; line echo is caused by the 2-4 wire matching coupling of physical electronic circuits.
There are two main causes of echo:
1. Acoustic echo due to spatial acoustic reflection (see figure): the man's voice signal (Speech1) is transmitted to the room where the woman is. Because of reflections in that room, it re-enters the microphone as an echo of Speech1 (Echo) and is superimposed on the woman's voice signal (Speech2). When this mixed signal is sent back to the man's room and played, the man hears the woman's voice mixed with his own, which degrades call quality. If an echo cancellation module is applied in the woman's room, it cancels the echo of the man's voice and keeps Speech1 from being superimposed on the woman's voice signal.
2. Line echo introduced by 2-4 wire conversion (see figure): ADSL modems and telephone switches contain 2-4 wire conversion circuits, and because of impedance mismatch in these circuits, part of the signal is reflected back and forms an echo. If the switch side performs no echo cancellation, the caller hears his or her own voice on the phone. Whatever the cause, the task is the same for a voice communication terminal or a voice relay switch: remove the unwanted echo from the voice stream before sending it.
Imagine a voice stream in which at least two sounds are mixed: separating them and removing one of them is a hard problem. It is like pouring a bottle of blue ink and a bottle of red ink together and then trying to take the red ink back out, which is probably impossible. So it is not surprising that echo cancellation is seen as a mysterious, hard-to-grasp technique. Admittedly, if all we had was the single mixed voice signal containing the echo, removing the echo would be impossible (even the most advanced blind source separation techniques cannot do it). In practice, however, besides this mixed signal we also have access to the original signal that produces the echo, even though it differs from the echo signal itself.
Let's look at the following AEC (acoustic echo cancellation) block diagram (see figure):
In it, we have access to two signals. One is mixed signal 1 (blue and red), the voice stream containing both the speech that actually needs to be sent and the echo that should not be sent; the other is dashed-line signal 2, the original voice stream that causes the echo.
At this point you might say: so echo cancellation is that simple, just subtract dashed-line signal 2 from mixed signal 1? Note, however, that dashed-line signal 2 is not the same as the echo Echo, and subtracting it directly would distort the speech beyond recognition. We call mixed signal 1 the near-end signal NE and dashed-line signal 2 the far-end reference signal FE. Without the FE signal, echo cancellation could not be done at all. Although the reference signal FE and Echo are not identical, the two are highly correlated, which is exactly why Echo is called an echo.
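To see why direct subtraction fails, here is a small numerical sketch under assumed conditions: the echo path is modeled as a single delayed, attenuated copy of FE (delay 40 samples and gain 0.6, both hypothetical), FE is white noise, and a sine wave stands in for the near-end speech. The error relative to the true speech comes out larger after subtracting FE than with no processing at all, because FE and Echo differ in delay and amplitude.

```python
import math
import random

def delayed_echo(fe, delay, gain):
    """Hypothetical echo path: one delayed, attenuated copy of FE."""
    return [gain * fe[n - delay] if n >= delay else 0.0 for n in range(len(fe))]

def error_energy(sig, ref):
    """Energy of (sig - ref), i.e. how far sig is from the true speech."""
    return sum((a - b) ** 2 for a, b in zip(sig, ref))

random.seed(0)
N = 2000
fe = [random.gauss(0, 1) for _ in range(N)]                          # far-end reference FE
speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]  # near-end speech stand-in
echo = delayed_echo(fe, delay=40, gain=0.6)
ne = [s + e for s, e in zip(speech, echo)]                           # mixed signal 1 (NE)
naive = [a - b for a, b in zip(ne, fe)]                              # NE - FE

err_ne = error_energy(ne, speech)        # leave the echo in
err_naive = error_energy(naive, speech)  # subtract FE directly
print(f"no processing: {err_ne:.0f}, NE - FE: {err_naive:.0f}")
```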
Since FE is highly correlated with Echo, and Echo is itself produced by FE, we can express Echo as a mathematical function of FE:
Echo = f(FE)
The function f is called the echo path.
In acoustic echo cancellation, f represents the physical process in which sound is reflected many times by walls, ceilings, and so on.
In line echo cancellation, f represents the 2-4 wire matching coupling process of the electronic circuit.
Clearly, the next task is to solve for the function f. Once f is known, we can compute Echo from FE and subtract it from mixed signal 1 to cancel the echo.
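As a minimal sketch of Echo = f(FE), assume f is a linear FIR filter (an assumption; real echo paths are only approximately linear) with a hypothetical impulse response `h_true`. If f were known exactly, recomputing Echo from FE and subtracting it from NE would leave only the near-end speech, up to floating-point rounding. Everything that follows in the article is about estimating f when it is not known.

```python
import math
import random

def fir(x, h):
    """Apply an FIR echo path h to x: y(n) = sum_k h[k] * x(n - k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

random.seed(1)
N = 1000
h_true = [0.0] * 8 + [0.5, 0.0, -0.3, 0.0, 0.15]   # hypothetical echo path f
fe = [random.gauss(0, 1) for _ in range(N)]         # far-end reference FE
speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
echo = fir(fe, h_true)                              # Echo = f(FE)
ne = [s + e for s, e in zip(speech, echo)]          # near-end signal NE

# With f known exactly, computing Echo from FE and subtracting it from NE
# leaves only the near-end speech.
recovered = [a - b for a, b in zip(ne, fir(fe, h_true))]
max_err = max(abs(r - s) for r, s in zip(recovered, speech))
print(f"max deviation from speech: {max_err:.2e}")
```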
Although echo cancellation is a very complex technique, the approach can be described simply:
1. The audio conference system in room A receives the sound from room B.
2. The sound is sampled; this sample is called the echo cancellation reference.
3. The sound is then sent to room A's loudspeaker and to the acoustic echo canceller.
4. Room B's sound (played through the loudspeaker) and room A's voice are both picked up by room A's microphone.
5. The microphone signal is sent to the acoustic echo canceller, compared with the original sample, and room B's sound is removed.
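The five steps above can be sketched as a frame-by-frame loop. This is a minimal illustration, not a real conferencing API: `FrameCanceller`, the frame length of 160 samples, and the room path `h_room` are all hypothetical, and the echo path estimate is simply assumed to be known already (estimating it is the job of the adaptive filter described next). The one design point the sketch shows is that the canceller must keep the tail of earlier reference frames, because the echo at the start of a frame depends on sound played during the previous frame.

```python
import math
import random

def fir(x, h):
    """Whole-signal FIR: y(n) = sum_k h[k] * x(n - k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

class FrameCanceller:
    """Hypothetical frame-based canceller with a fixed echo-path estimate."""

    def __init__(self, h_est):
        self.h = list(h_est)
        self.history = [0.0] * (len(self.h) - 1)   # tail of earlier reference frames

    def process(self, ref_frame, mic_frame):
        L = len(self.h)
        buf = self.history + list(ref_frame)        # step 2: keep the reference
        out = []
        for i, mic in enumerate(mic_frame):
            cur = i + L - 1                          # position of the current sample in buf
            est = sum(self.h[k] * buf[cur - k] for k in range(L))
            out.append(mic - est)                    # step 5: remove room B's sound
        self.history = buf[len(buf) - (L - 1):]
        return out

random.seed(2)
FRAME = 160                                          # assumed frame length
h_room = [0.0] * 20 + [0.4, -0.2, 0.1]               # hypothetical room A echo path
far = [random.gauss(0, 1) for _ in range(10 * FRAME)]                    # step 1: sound from room B
near = [0.3 * math.sin(2 * math.pi * 0.02 * n) for n in range(len(far))]  # room A talker
mic = [s + e for s, e in zip(near, fir(far, h_room))]                    # steps 3-4: speaker to mic

aec = FrameCanceller(h_room)   # assumes the echo path estimate is already available
sent = []
for start in range(0, len(far), FRAME):
    sent += aec.process(far[start:start + FRAME], mic[start:start + FRAME])
max_dev = max(abs(a - b) for a, b in zip(sent, near))
print(f"max deviation from room A's voice: {max_dev:.2e}")
```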
The process of solving for the echo path function f is probably harder to put into words than into mathematical formulas, and since explaining formulas in plain language is even harder than deriving them, I will not attempt it here. The following passage describes how f is solved using the principle of the adaptive filter.
Adaptive filter
An adaptive filter is an algorithm or device that, based on estimates of the statistical characteristics of its input and output signals, automatically adjusts its filter coefficients to achieve optimal filtering characteristics. Adaptive filters can operate in the continuous or the discrete domain. A discrete-domain adaptive filter consists of a tapped delay line, variable weighting coefficients, and a mechanism that adjusts those coefficients automatically. The figure shows the signal-flow graph of a discrete-domain adaptive filter used to model an unknown discrete system. For each value of the input sequence x(n), the adaptive filter updates its weighting coefficients according to a specific algorithm so that the mean square error between the output sequence y(n) and the desired output sequence d(n) is minimized, i.e., so that y(n) approximates d(n). The coefficients of an adaptive filter designed by the minimum mean square error criterion can be obtained by solving the Wiener-Hopf equation.
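The Wiener-Hopf equation can be written R W = p, where R = E[X(n)X(n)^T] is the autocorrelation matrix of the tapped-delay-line input vector X(n) = [x(n), x(n-1), ..., x(n-L+1)] and p = E[X(n)d(n)] is the cross-correlation between that input and the desired signal. The sketch below assumes a hypothetical 4-tap unknown system and white-noise input, estimates R and p by time averages, and solves the linear system with plain Gaussian elimination. Unlike the adaptive methods that follow, this is a batch computation that needs all the data up front.

```python
import random

def solve(A, b):
    """Solve A w = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][j] * w[j] for j in range(r + 1, n))) / M[r][r]
    return w

def wiener_coefficients(x, d, L):
    """Estimate R = E[X X^T] and p = E[X d] by time averages, then solve R w = p."""
    N = len(x)
    R = [[0.0] * L for _ in range(L)]
    p = [0.0] * L
    for n in range(N):
        X = [x[n - k] if n - k >= 0 else 0.0 for k in range(L)]   # tapped delay line
        for i in range(L):
            p[i] += X[i] * d[n] / N
            for j in range(L):
                R[i][j] += X[i] * X[j] / N
    return solve(R, p)

random.seed(6)
h_unknown = [0.6, -0.3, 0.2, 0.1]                    # hypothetical unknown system
x = [random.gauss(0, 1) for _ in range(3000)]
d = [sum(h_unknown[k] * x[n - k] for k in range(4) if n - k >= 0) + random.gauss(0, 0.01)
     for n in range(len(x))]
w_opt = wiener_coefficients(x, d, 4)
print([round(v, 3) for v in w_opt])
```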
The method proposed by B. Widrow solves for the adaptive filter coefficients in real time, and its result approaches the solution of the Wiener-Hopf equation. This algorithm is called the least mean square (LMS) algorithm. It uses the method of steepest descent: the coefficient vector for the next time instant is computed from the current coefficient vector and an estimate of the gradient of the mean square error:
$$W(n+1) = W(n) + \mu_s \hat{\nabla}\left[\varepsilon^2(n)\right]$$
In the formula, μs is a negative number whose value determines the convergence of the algorithm, ε(n) = d(n) - y(n) is the error between the desired and actual outputs, and ∇̂[ε²(n)] is the estimate of the mean square error gradient.
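A minimal sketch of LMS in plain Python. With the instantaneous gradient estimate ∇̂[ε²(n)] = -2ε(n)X(n), where X(n) = [x(n), x(n-1), ..., x(n-L+1)] and ε(n) = d(n) - y(n), the update W(n+1) = W(n) + μs∇̂[ε²(n)] with negative μs becomes W(n+1) = W(n) + μ ε(n) X(n), where μ = -2μs > 0. The unknown system `h_unknown`, the step size μ = 0.01, and the signal length are illustrative assumptions.

```python
import random

def lms(x, d, L, mu):
    """Plain LMS: returns the final weights and the error sequence eps(n)."""
    w = [0.0] * L
    X = [0.0] * L                        # X(n) = [x(n), x(n-1), ..., x(n-L+1)]
    errors = []
    for n in range(len(x)):
        X = [x[n]] + X[:-1]              # shift the tapped delay line
        y = sum(wi * xi for wi, xi in zip(w, X))
        e = d[n] - y                     # eps(n) = d(n) - y(n)
        w = [wi + mu * e * xi for wi, xi in zip(w, X)]   # W(n+1) = W(n) + mu*eps(n)*X(n)
        errors.append(e)
    return w, errors

random.seed(7)
h_unknown = [0.6, -0.3, 0.2, 0.1]        # hypothetical unknown system
x = [random.gauss(0, 1) for _ in range(5000)]
d = [sum(h_unknown[k] * x[n - k] for k in range(4) if n - k >= 0) for n in range(len(x))]
w, errors = lms(x, d, 4, mu=0.01)
print([round(v, 4) for v in w])
```

The step size must stay small relative to the input power (roughly μ < 2 / (L·σx²) for white input) or the iteration diverges.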
Adaptive filters are used in automatic equalization, echo cancellation, antenna array beamforming, and in other signal processing areas such as parameter identification, noise cancellation, and spectral estimation. Across these applications only the input signal and the desired signal differ; the basic principle is the same.
The passage above shows that solving for the echo path function f is the convergence process of an adaptive filter W(n). The input signal x(n) is FE, the desired signal is Echo, and the converged adaptive filter W(n) is the echo path function f. After convergence, when an actual echo occurs, passing FE through W(n) yields a very accurate estimate of Echo; subtracting it directly from the mixed signal gives the speech Speech that actually needs to be sent, completing the echo cancellation task.
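Putting the pieces together for acoustic echo cancellation, here is a sketch under assumed conditions: a hypothetical 16-tap room echo path `h_room`, white noise standing in for far-end speech, a sine wave standing in for near-end speech, a small assumed measurement noise, and LMS with μ = 0.01. The filter converges on a segment where the near end is silent, so the desired signal is essentially pure echo. Its coefficients are then frozen, and FE is passed through W to estimate and subtract the echo on a later segment that contains near-end speech. The result is scored with ERLE (echo return loss enhancement), the ratio of echo energy to residual echo energy in dB.

```python
import math
import random

def fir(x, h):
    """y(n) = sum_k h[k] * x(n - k)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def lms_train(x, d, L, mu):
    """Converge W on an echo-only segment (input FE, desired signal = echo)."""
    w, buf = [0.0] * L, [0.0] * L
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, buf))
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]
    return w

random.seed(3)
h_room = [0.0] * 4 + [0.5 * (-0.7) ** k for k in range(12)]   # hypothetical echo path f
L = len(h_room)

# Convergence phase: the far end talks, the near end is silent.
fe_train = [random.gauss(0, 1) for _ in range(4000)]
d_train = [e + random.gauss(0, 0.001) for e in fir(fe_train, h_room)]  # assumed small noise
w = lms_train(fe_train, d_train, L, mu=0.01)

# Later segment: near-end speech is present; W is frozen and used to estimate Echo.
N = 2000
fe = [random.gauss(0, 1) for _ in range(N)]
speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
echo = fir(fe, h_room)
ne = [s + e for s, e in zip(speech, echo)]
sent = [a - b for a, b in zip(ne, fir(fe, w))]       # NE minus estimated Echo

residual = [o - s for o, s in zip(sent, speech)]     # echo left in the sent signal
erle_db = 10 * math.log10(sum(e * e for e in echo) / sum(r * r for r in residual))
print(f"ERLE: {erle_db:.1f} dB")
```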
Two points are worth noting. 1. During the convergence stage of the adaptive filter, the desired signal must be the pure echo and must not contain Speech. Because Speech is uncorrelated with FE, it would disrupt the convergence of W(n). In other words, the echo cancellation algorithm must converge very quickly, ideally before the near-end talker even has time to speak; once convergence is complete and the near-end talker starts speaking, that is, once Speech is mixed in, the coefficients of W(n) must stop changing and remain stable. 2. The echo path may change, and when it does, the echo cancellation algorithm must be able to detect the change, because the adaptive filter has to start learning again; that is, W(n) needs a new convergence process to approximate the new echo path function f. These two requirements are essentially contradictory: one requires the adaptive filter to keep its coefficients stable after convergence so that they are not disturbed by Speech, while the other requires the filter to stay ready to update at all times so that it can track changes in the echo path.
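Both requirements can be seen in one small simulation. This sketch swaps in NLMS (normalized LMS, a common LMS variant whose step is scaled by the input power) and assumes a hypothetical 8-tap echo path that changes to a shifted version `h_b` in the last segment. The signal has three segments: echo only, double talk (near-end speech modeled as unit-variance noise), and echo only with the changed path. Real systems use a double-talk detector (the Geigel detector is a classic example) to decide when to freeze adaptation; here the double-talk flag is taken directly from the simulation as an oracle, which is an assumption. Three strategies are compared: always adapt, adapt only when there is no double talk, and adapt once then freeze forever.

```python
import math
import random

def nlms(x, d, L, mu, adapt, checkpoints, eps=1e-6):
    """NLMS; adapt[n] False holds W fixed. Returns W at each checkpoint index."""
    w, buf, snaps = [0.0] * L, [0.0] * L, {}
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, buf))
        if adapt[n]:
            g = mu * e / (eps + sum(xi * xi for xi in buf))
            w = [wi + g * xi for wi, xi in zip(w, buf)]
        if n in checkpoints:
            snaps[n] = list(w)
    return snaps

def dist(w, h):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, h)))

random.seed(4)
h_a = [0.5, -0.3, 0.2, 0.1, -0.05, 0.0, 0.0, 0.0]   # hypothetical echo path
h_b = [0.0, 0.0, 0.5, -0.3, 0.2, 0.1, -0.05, 0.0]   # the same path after a change
SEG = 1000
N = 3 * SEG
x = [random.gauss(0, 1) for _ in range(N)]           # far-end FE
dt = [SEG <= n < 2 * SEG for n in range(N)]          # double talk in segment 2
speech = [random.gauss(0, 1) if dt[n] else 0.0 for n in range(N)]
d = []
for n in range(N):
    h = h_a if n < 2 * SEG else h_b
    echo = sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
    d.append(echo + speech[n] + random.gauss(0, 0.001))

ends = {2 * SEG - 1, N - 1}
always = nlms(x, d, 8, 0.5, [True] * N, ends)
gated = nlms(x, d, 8, 0.5, [not f for f in dt], ends)           # oracle double-talk flag
frozen = nlms(x, d, 8, 0.5, [n < SEG for n in range(N)], ends)  # never re-adapts

err_always_dt = dist(always[2 * SEG - 1], h_a)   # disturbed by near-end speech
err_gated_dt = dist(gated[2 * SEG - 1], h_a)     # protected by freezing
err_gated_end = dist(gated[N - 1], h_b)          # re-converged after the path change
err_frozen_end = dist(frozen[N - 1], h_b)        # stuck on the old path
print(err_always_dt, err_gated_dt, err_gated_end, err_frozen_end)
```

Only the gated strategy does well on both counts, and it does so only because it is told exactly when double talk occurs; building a detector that makes that decision reliably is where the real difficulty lies.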
Thus, even at the level of the mathematical algorithm alone, echo cancellation is difficult!
Simply put, an adaptive filter designed for echo cancellation must have two contradictory characteristics, fast convergence and high stability, and achieving both at once is the main design challenge.
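The step size is the most direct knob in this trade-off. The sketch below again swaps in NLMS and runs it with a large and a small step size on the same hypothetical 8-tap echo path, with an assumed observation noise of standard deviation 0.1. It records the misalignment ||W(n) - h||² at every sample, then reports how many samples each run needs to fall below an arbitrary threshold of 0.05 (convergence speed) and its average misalignment over the last 2000 samples (steady-state stability).

```python
import random

def nlms_msd(x, d, h_true, mu, eps=1e-6):
    """Run NLMS and return the misalignment ||W(n) - h||^2 at every sample."""
    L = len(h_true)
    w, buf, msd = [0.0] * L, [0.0] * L, []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        e = d[n] - sum(wi * xi for wi, xi in zip(w, buf))
        g = mu * e / (eps + sum(xi * xi for xi in buf))
        w = [wi + g * xi for wi, xi in zip(w, buf)]
        msd.append(sum((wi - hi) ** 2 for wi, hi in zip(w, h_true)))
    return msd

random.seed(5)
h = [0.5, -0.3, 0.2, 0.1, -0.05, 0.0, 0.0, 0.0]    # hypothetical echo path
N = 5000
x = [random.gauss(0, 1) for _ in range(N)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) + random.gauss(0, 0.1)
     for n in range(N)]

results = {}
for mu in (1.0, 0.1):
    msd = nlms_msd(x, d, h, mu)
    first_below = next(n for n, m in enumerate(msd) if m < 0.05)   # convergence speed
    steady = sum(msd[-2000:]) / 2000                                # steady-state error
    results[mu] = (first_below, steady)
    print(f"mu={mu}: below 0.05 after {first_below} samples, steady MSD {steady:.5f}")
```

On runs like this the larger step size should reach the threshold sooner but settle at a noticeably higher misalignment floor; the exact numbers depend on the random seed and the assumed noise level.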
After the analysis above, I believe readers now have a deeper understanding of the principle of echo cancellation: a technology that is easy to understand but difficult to implement.
Original link: http://silversand.blog.51cto.com/820613/166095
Echo Cancellation: Theory