This article assumes that the reader has a basic knowledge of adaptive filters. The AEC in Speex is based on NLMS, implemented as an MDF frequency-domain filter, and ultimately derives an optimal step-size estimate: the ratio of the residual echo to the error. In other words, the optimal step size equals the ratio of the residual echo variance to the error signal variance. Note this conclusion, as it will be used below.
For an NLMS filter of length N, the error signal is defined as the difference between the desired signal and the estimated signal:
\[e(n) = d(n) - \hat{y}(n) = d(n) - \sum\limits_{k = 0}^{N-1} \hat{w}_k(n)\, x(n-k)\]
Then, the coefficients of the filter are updated with the equation:
\[\hat{w}_k(n+1) = \hat{w}_k(n) + \mu \frac{e(n)\, x^*(n-k)}{\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2} = \hat{w}_k(n) + \mu \frac{\left(d(n) - \sum\nolimits_i \hat{w}_i(n)\, x(n-i)\right) x^*(n-k)}{\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2}\]
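For illustration, here is a minimal NumPy sketch of this NLMS update for a real-valued filter (the variable names and the `eps` regularization are my own, not taken from the Speex code):

```python
import numpy as np

def nlms_update(w_hat, x_buf, d, mu=0.5, eps=1e-8):
    """One NLMS iteration for a real-valued length-N filter.

    w_hat : current coefficient estimates w_k(n), shape (N,)
    x_buf : far-end samples x(n), x(n-1), ..., x(n-N+1), shape (N,)
    d     : desired (microphone) sample d(n)
    """
    y_hat = np.dot(w_hat, x_buf)            # echo estimate  sum_k w_k(n) x(n-k)
    e = d - y_hat                           # error signal e(n)
    norm = np.dot(x_buf, x_buf) + eps       # sum_i |x(n-i)|^2, regularized
    w_hat = w_hat + mu * e * x_buf / norm   # normalized coefficient update
    return w_hat, e
```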
The coefficient error of the filter is:
\[\delta_k(n) = \hat{w}_k(n) - w_k\]
The desired (microphone) signal is the near-end speech plus the echo, i.e., the far-end signal filtered by the echo path:
\[d(n) = v(n) + \sum\nolimits_k w_k\, x(n-k)\]
The filter update equation can then be rewritten in terms of the coefficient error:
\[\delta_k(n+1) = \delta_k(n) + \mu \frac{\left(v(n) - \sum\nolimits_i \delta_i(n)\, x(n-i)\right) x^*(n-k)}{\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2}\]
If the filter misalignment at time n is defined as:
\[\Lambda(n) = \sum\nolimits_k \delta_k^*(n)\, \delta_k(n)\]
then, after one iteration of the update, the misalignment of the filter can be expressed as:
\[\Lambda(n+1) = \sum\limits_{k = 0}^{N-1} \left| \delta_k(n) + \mu \frac{\left(v(n) - \sum\nolimits_i \delta_i(n)\, x(n-i)\right) x^*(n-k)}{\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2} \right|^2\]
Assume the far-end signal and the near-end signal are white and uncorrelated with each other, and let
\[\sigma_v^2 = E\{|v(n)|^2\}\]
denote the variance of the near-end speech signal. Taking the expectation, the misalignment after an update becomes
\[E\{\Lambda(n+1)\,|\,\Lambda(n), x(n)\} = \Lambda(n)\left[1 - \frac{2\mu}{N} + \frac{\mu^2}{N} + \frac{\mu^2 \sigma_v^2}{\Lambda(n)\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2}\right]\]
Since the expected misalignment
\[E\{\Lambda(n+1)\,|\,\Lambda(n), x(n)\}\]
is a convex function of the step size μ, differentiating with respect to μ, dividing out the common factor Λ(n), and setting the derivative to zero gives:
\[\frac{1}{\Lambda(n)}\,\frac{\partial E\{\Lambda(n+1)\}}{\partial \mu} = \frac{-2}{N} + \frac{2\mu}{N} + \frac{2\mu \sigma_v^2}{\Lambda(n)\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2} = 0\]
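Collecting the terms in μ gives
\[\mu \left(\frac{1}{N} + \frac{\sigma_v^2}{\Lambda(n)\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2}\right) = \frac{1}{N}\]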
Solving for μ then yields the optimal step size:
\[\mu_{opt}(n) = \frac{1}{1 + \dfrac{\sigma_v^2}{\frac{\Lambda(n)}{N}\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2}}\]
The term in the denominator,
\[\frac{\Lambda(n)}{N}\sum\nolimits_{i = 0}^{N-1} |x(n-i)|^2\]
looks unwieldy, but its meaning is straightforward: it can be approximated as the variance of the residual echo. The variance of the error (output) signal is therefore the variance of the near-end speech plus the variance of the residual echo:
\[\sigma _e^2 (n) = \sigma _v^2 (n) + \sigma _r^2 (n) \]
Finally, the optimal step size is derived:
\[\mu_{opt}(n) = \frac{1}{1 + \dfrac{\sigma_v^2(n)}{\sigma_r^2(n)}} = \frac{1}{\dfrac{\sigma_r^2(n) + \sigma_v^2(n)}{\sigma_r^2(n)}} = \frac{\sigma_r^2(n)}{\sigma_e^2(n)}\]
In practice, estimated variances are used and the step size is limited to at most 1:
\[\mu_{opt}(n) = \min \left(\frac{\hat\sigma_r^2(n)}{\hat\sigma_e^2(n)},\, 1\right)\]
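As a minimal sketch of this rule (assuming the variance estimates `sigma_r2` and `sigma_e2` come from some running estimator that is not shown here):

```python
def optimal_step(sigma_r2, sigma_e2, eps=1e-12):
    """Optimal step size: ratio of residual-echo variance to error variance, capped at 1."""
    return min(sigma_r2 / (sigma_e2 + eps), 1.0)
```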
The analysis above is in the time domain and based on NLMS, and it shows that the optimal step size equals the ratio of the residual echo variance to the error signal variance. The error variance is easy to estimate; the residual echo variance is the harder one to obtain. Next we look at how to apply this conclusion in the frequency domain. The frequency-domain adaptive algorithm used in Speex is MDF (multidelay block frequency-domain) adaptive filtering.
In the frequency domain, let k denote the frequency index and ℓ the frame index. Converting the conclusion above to the frequency domain gives:
\[\mu_{opt}(k,\ell) \approx \frac{\sigma_r^2(k,\ell)}{\sigma_e^2(k,\ell)}\]
So how do we find the residual echo variance in the frequency domain? We can define a leakage coefficient that indicates how much echo leaks through relative to the far-end (echo estimate) signal; the residual echo is then expressed as
\[\sigma_r^2(k,\ell) = \hat\eta(\ell)\, \hat\sigma_{\hat y}^2(k,\ell)\]
Based on this leakage coefficient, the residual echo, and hence the optimal step size, can be obtained:
\[\mu_{opt}(k,\ell) = \min \left(\hat\eta(\ell)\, \frac{|\hat Y(k,\ell)|^2}{|E(k,\ell)|^2},\, \mu_{\max}\right)\]
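A minimal NumPy sketch of this per-bin learning rate (the function name, the default `mu_max`, and the `eps` regularization are assumptions for illustration, not taken from the Speex code):

```python
import numpy as np

def learning_rate(eta_hat, Y_hat, E, mu_max=0.5, eps=1e-12):
    """Per-bin step size mu_opt(k, l) from the leakage coefficient.

    eta_hat : estimated leakage coefficient for the current frame
    Y_hat   : echo-estimate spectrum Y(k, l), complex array over bins k
    E       : error spectrum E(k, l), complex array over bins k
    """
    ratio = eta_hat * np.abs(Y_hat) ** 2 / (np.abs(E) ** 2 + eps)
    return np.minimum(ratio, mu_max)
```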
In other words, from the leakage coefficient we can estimate the residual echo of the far-end signal, and from that the optimal step size. This leaves one more question: how do we estimate the leakage coefficient? Readers familiar with the orthogonality principle will find the problem easy to solve; the answer is as follows:
\[\hat\eta(\ell) = \frac{\sum\nolimits_k R_{EY}(k,\ell)}{\sum\nolimits_k R_{YY}(k,\ell)}\]
\[R_{EY}(k,\ell) = (1-\beta(\ell))\, R_{EY}(k,\ell-1) + \beta(\ell)\, P_Y(k)\, P_E(k)\]
\[R_{YY}(k,\ell) = (1-\beta(\ell))\, R_{YY}(k,\ell-1) + \beta(\ell)\, \left(P_Y(k)\right)^2\]
where P_Y(k) and P_E(k) are the power spectra of the echo estimate and of the error signal at frame ℓ, and the averaging factor is
\[\beta(\ell) = \beta_0 \min \left(\frac{\hat\sigma_{\hat y}^2(\ell)}{\hat\sigma_e^2(\ell)},\, 1\right)\]
Here, the autocorrelation of the echo-estimate power at each frequency bin and its cross-correlation with the error-signal power are obtained by recursive averaging, which finally yields the leakage coefficient. For the concrete implementation, see the Speex source code; for the parameter settings, see the reference paper below.
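As an illustrative sketch of this recursive averaging (this is not the actual Speex implementation; the variable names, the default `beta0`, and the `eps` regularization are my own), following the formulas above literally:

```python
import numpy as np

def update_leakage(R_ey, R_yy, Y_hat, E, sigma_y2, sigma_e2, beta0=0.1, eps=1e-12):
    """One-frame update of the leakage-coefficient estimate (per the formulas above).

    R_ey, R_yy : recursive averages carried over from the previous frame (arrays over bins k)
    Y_hat, E   : echo-estimate and error spectra of the current frame
    sigma_y2   : variance of the echo estimate for this frame
    sigma_e2   : variance of the error signal for this frame
    """
    p_y = np.abs(Y_hat) ** 2                      # power spectrum of the echo estimate
    p_e = np.abs(E) ** 2                          # power spectrum of the error
    beta = beta0 * min(sigma_y2 / (sigma_e2 + eps), 1.0)
    R_ey = (1.0 - beta) * R_ey + beta * p_y * p_e
    R_yy = (1.0 - beta) * R_yy + beta * p_y ** 2
    eta_hat = np.sum(R_ey) / (np.sum(R_yy) + eps)  # leakage coefficient for this frame
    return R_ey, R_yy, eta_hat
```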
This completes the analysis of Speex's echo cancellation principle. The final conclusion: the parts of the code that involve the leakage coefficient have the greatest impact on performance, because the leakage coefficient ultimately determines the estimated optimal step size of the filter.
This article is based on the paper "On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk", reorganized in my own words by icoolmedia. Friends interested in the algorithm are welcome to join the audio and video algorithm discussion group (374737122) to discuss together!