An introduction to WebRTC's echo cancellation (AEC, AECM) algorithm



Reproduced from the original: http://blog.csdn.net/u012931018/article/details/17045077. Thanks to the original blogger.






WebRTC's echo cancellation (acoustic echo cancellation, AEC; acoustic echo cancellation for mobile, AECM) algorithm mainly comprises the following modules: echo delay estimation, NLMS (normalized least mean squares adaptive algorithm), NLP (nonlinear processing), and CNG (comfort noise generation). A classic AEC algorithm would generally also include double-talk detection (DT).



Since the NLMS, NLP, and CNG used by WebRTC are classical algorithms, this article concentrates on WebRTC's echo delay estimation algorithm, which is the feature that distinguishes WebRTC's echo cancellation from general-purpose algorithms (such as those used in video conferencing).



1, Echo delay estimation


The length of the echo delay has a relatively large impact on the performance of the echo canceller (thread synchronization problems on the PC are not considered here). Because a filter with a very large number of taps cannot be used in practice, the delay estimation algorithm matters all the more. The most common and most intuitive estimators are based on correlation (which should be familiar from communication theory). Correlation is also widely used in speech coding, for example in the AMR series, the G.729 series, and G.718, where the pitch period of the speech signal is found by autocorrelation. Because an encoder generally works frame by frame, with a frame length of 10 or 20 ms, the lag range searched for the pitch period is small. In echo cancellation, however, the delay search range is much larger, which makes the computation expensive. On handheld devices the situation is more complicated still: the algorithm has to cope with a changing mobile environment, for example whether the delay changes randomly, whether the echo path is linear or nonlinear, and whether the amount of computation (battery) is acceptable.
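To make the cost of a correlation-based search concrete, here is a minimal brute-force sketch. It is not WebRTC code; the function name `xcorr_delay`, the toy signals, and the `max_lag` parameter are all assumptions made for illustration.

```python
import random


def xcorr_delay(far, near, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate near[n] with far[n - lag] over the overlapping samples.
        score = sum(near[n] * far[n - lag] for n in range(lag, len(near)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals: the near end is the far end delayed by 7 samples and halved.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(200)]
near = [0.0] * 7 + [0.5 * v for v in far[:-7]]
estimated = xcorr_delay(far, near, max_lag=40)
```

Each call costs roughly `max_lag * len(near)` multiply-adds, so widening the search range from a pitch-sized window to a full echo delay range multiplies the work accordingly.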

Back to WebRTC's echo delay estimation: it uses an algorithm by GIPS chief scientist Bastiaan. The main idea of the algorithm is as follows.
Let 1 indicate that speech is present and 0 that it is absent (silence or a very weak sound). The reference (far-end) signal x(t) and the received (near-end) signal y(t) can then be in one of four combinations: (0,0), (0,1), (1,0), (1,1).
(0,0) means that the far end and the near end are both relatively weak, (1,1) means that both are relatively strong, and the WebRTC C code assumes by default that the other two cases do not occur. For time interval p, p = 1, 2, ..., P, and frequency band q, q = 1, 2, ..., Q, let Xw(p,q) denote the power spectrum of the input signal x after windowing (for example with a Hanning window), and set a threshold Xw(p,q)_threshold for the power spectrum in each frequency band.
If Xw(p,q) >= Xw(p,q)_threshold, then Xw(p,q) = 1;
If Xw(p,q) < Xw(p,q)_threshold, then Xw(p,q) = 0.
Similarly, for the signal y(t), with windowed power spectrum Yw(p,q) and threshold Yw(p,q)_threshold:
If Yw(p,q) >= Yw(p,q)_threshold, then Yw(p,q) = 1;
If Yw(p,q) < Yw(p,q)_threshold, then Yw(p,q) = 0.
For convenience in the actual processing, the WebRTC C code divides the frequency-domain power spectrum after the FFT into 32 subbands, so that the value Xw(p,q) of each subband fits in 1 bit. 32 bits are needed in total, which fit in a single 32-bit data type.
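The thresholding and bit packing described above can be sketched as follows. This is an illustration only: the function name `binary_spectrum`, the mapping of band q to bit q, and the fixed threshold values are assumptions, and how WebRTC actually derives each threshold is not covered here.

```python
NUM_BANDS = 32  # one bit per subband, as described in the text


def binary_spectrum(band_power, thresholds):
    """Pack 32 per-band threshold decisions into one 32-bit integer."""
    assert len(band_power) == len(thresholds) == NUM_BANDS
    word = 0
    for q in range(NUM_BANDS):
        if band_power[q] >= thresholds[q]:
            word |= 1 << q  # assumption: bit q stands for band q
    return word


# Made-up band powers: only bands 0, 3 and 31 reach the threshold of 1.0.
power = [0.0] * NUM_BANDS
power[0], power[3], power[31] = 2.0, 5.0, 1.0
thresholds = [1.0] * NUM_BANDS
word = binary_spectrum(power, thresholds)
```

With the whole frame reduced to one integer, comparing two frames becomes a single XOR followed by a bit count.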
WebRTC defines a binary_far_history array of 75 32-bit words to store the history of the far-end reference signal, and a binary_near_history array of 16 32-bit words to store the history of the near-end signal. The most recent value is placed at subscript 0. The 32-bit word binary_near_history[15] is XORed with each of the 75 32-bit words in binary_far_history, giving 75 32-bit results. The physical meaning of each result is the correlation between two frames, measured on the binarized power spectrum. The number of 1 bits in each result is stored in bit_counts. bit_counts is then smoothed to prevent sudden jumps in the delay, giving mean_bit_count. The smaller the mean_bit_count, the better the near-end data matches the far-end data of that frame, and the closer the offset between the two is to the required delay value; the best (smallest) value is recorded as value_best_candidate.
The remaining work is guarding the boundary cases. If value_best_candidate is close to the worst-case value (preset), the estimate is unreliable and the delay is not updated. If the data is reliable, a first-order Markov model that takes the previous delay into account determines the final updated delay, last_delay.
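The XOR, bit count, smoothing, and minimum search can be sketched as below. This is a simplified illustration, not the WebRTC implementation: it uses floating point, a hypothetical smoothing constant `alpha`, a hypothetical return name `best_index`, feeds the newest near-end word directly instead of reading binary_near_history[15], and omits the reliability checks and the Markov step.

```python
import random


def bit_count(word):
    """Number of 1 bits in a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1")


def update_delay_estimate(near_word, far_history, mean_bit_count, alpha=0.5):
    """XOR one near-end word against every far-end word and smooth the counts.

    Returns (best_index, value_best_candidate).
    """
    for i, far_word in enumerate(far_history):
        mismatches = bit_count(near_word ^ far_word)
        # First-order recursive smoothing; alpha is a hypothetical constant.
        mean_bit_count[i] += alpha * (mismatches - mean_bit_count[i])
    best_index = min(range(len(far_history)), key=lambda i: mean_bit_count[i])
    return best_index, mean_bit_count[best_index]


HISTORY = 75
TRUE_DELAY = 20  # in frames; chosen for this demo

rng = random.Random(1)
far_history = [0] * HISTORY         # index 0 holds the most recent frame
mean_bit_count = [16.0] * HISTORY   # start at half of 32 bits
stream = [rng.getrandbits(32) for _ in range(200)]

delay, best = -1, 32.0
for t, far_word in enumerate(stream):
    far_history.insert(0, far_word)
    far_history.pop()
    if t >= TRUE_DELAY:
        near_word = stream[t - TRUE_DELAY]  # echo of an older far-end frame
        delay, best = update_delay_estimate(near_word, far_history, mean_bit_count)
```

In this toy setup the near-end word is an exact copy of a far-end word, so the smoothed count at the true delay decays toward 0 while the others hover around 16.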

Bastiaan's patent itself is more complicated than the existing C implementation. For example, for the four combinations (0,0), (0,1), (1,0), (1,1) compared by the XOR, the patent allows a cost function that attaches a weight to each combination, whereas the C code is equivalent to giving (0,0) and (1,1) a weight of 1 and giving (0,1) and (1,0) a weight of 0.
In addition, the C code XORs the far-end and near-end arrays frame by frame. In a real application the XOR could also be performed at intervals of 1 or 2 frames, which expands the search range.
In general, WebRTC's delay estimation algorithm is far less complex than correlation-based methods, which makes it well suited to mobile terminals and other settings that are sensitive to computational cost. For practical applications, the algorithm still has room for improvement.


2, NLMS (normalized least mean squares adaptive algorithm)



LMS, NLMS, AP, and RLS are classic adaptive filtering algorithms; only the NLMS algorithm used in WebRTC is introduced briefly here. Let the far-end signal be x(n), the near-end signal d(n), and the filter coefficients w(n). The error signal is then e(n) = d(n) - w'(n) x(n) (here ' denotes the transpose). NLMS updates the filter coefficients with a variable step size, u = u0 / (gamma + x'(n) x(n)), where u0 is the update step factor and gamma is a stabilizing factor. The filter coefficient update equation is then w(n+1) = w(n) + u e(n) x(n). NLMS is slightly more complex than the traditional LMS algorithm, but it converges significantly faster. The performance of LMS and NLMS is inferior to that of the AP and RLS algorithms.
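The update equations above translate almost line for line into a small time-domain sketch. This is an illustration under simplifying assumptions (a made-up 3-tap echo path, no near-end speech or noise, hypothetical values for u0 and gamma), not the frequency-domain filter WebRTC uses.

```python
import random


def nlms(x, d, num_taps, u0=0.5, gamma=1e-6):
    """Adapt an FIR filter w so that w'x(n) tracks d(n); return (w, errors)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # x(n), x(n-1), ..., x(n-num_taps+1)
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = dn - y                                        # e(n) = d(n) - w'(n) x(n)
        u = u0 / (gamma + sum(xi * xi for xi in buf))     # u = u0 / (gamma + x'(n) x(n))
        w = [wi + u * e * xi for wi, xi in zip(w, buf)]   # w(n+1) = w(n) + u e(n) x(n)
        errors.append(e)
    return w, errors


# Toy echo path (hypothetical): a 3-tap FIR filter applied to white noise.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
w, errors = nlms(x, d, num_taps=3)
```

Dividing by the input energy is what makes the step size independent of the signal level; gamma only keeps the division safe when the buffer is nearly silent.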

It is also worth mentioning that WebRTC uses the partitioned block frequency-domain adaptive filter (PBFDAF) algorithm, which is also a common form of adaptive filter. More information on adaptive filtering can be found in Simon Haykin's Adaptive Filter Theory.



3, NLP (nonlinear processing)



WebRTC uses a Wiener filter. Only the expression of the transfer function is given here: if the estimated power spectrum of the speech signal is Ps(w) and the power spectrum of the noise signal is Pn(w), the transfer function of the filter is H(w) = Ps(w) / (Ps(w) + Pn(w)).
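As a small worked example of the transfer function, the gain can be evaluated per frequency bin. The function name `wiener_gain` and the power values are made up for illustration, and the guard for an all-zero bin is an added assumption.

```python
def wiener_gain(ps, pn):
    """Per-bin gain H(w) = Ps(w) / (Ps(w) + Pn(w))."""
    gains = []
    for s, n in zip(ps, pn):
        total = s + n
        gains.append(s / total if total > 0.0 else 0.0)
    return gains


ps = [9.0, 1.0, 0.0]  # estimated speech power per bin (made-up values)
pn = [1.0, 1.0, 1.0]  # noise power per bin (made-up values)
gains = wiener_gain(ps, pn)
```

Bins dominated by speech get a gain near 1, bins with equal speech and noise power get 0.5, and bins with no speech are fully suppressed.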



4, CNG (comfort noise generation)



The comfort noise generator used in WebRTC is simple: it generates an array of random numbers uniformly distributed on [0,1], and then uses the power spectrum of the noise to set the amplitude of the generated noise.
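One way to read this description is sketched below. It rests on an assumption that is not stated in the text: the uniform random number is used as a phase, and the magnitude of each bin is the square root of the estimated noise power. The function name `comfort_noise` and the power values are hypothetical.

```python
import cmath
import math
import random


def comfort_noise(noise_power, rng):
    """Return one complex spectrum whose bin magnitudes follow noise_power."""
    spectrum = []
    for p in noise_power:
        u = rng.random()             # uniform on [0, 1)
        phase = 2.0 * math.pi * u    # assumption: u is used as a random phase
        spectrum.append(cmath.rect(math.sqrt(p), phase))
    return spectrum


noise_power = [4.0, 1.0, 0.25]  # made-up noise power per bin
noise = comfort_noise(noise_power, random.Random(0))
```

The result has the spectral shape of the background noise but a random phase, so it fills suppressed regions without sounding like a repeat of the original signal.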

In general, WebRTC's AEC algorithm is simple, practical, and easy to commercialize; on the other hand, the C code appears to hold some things back.

I recently studied the AEC algorithm in WebRTC for work. Judging from the fullaec.m file in the source code, I think the AEC algorithm is essentially a partitioned block frequency-domain adaptive filter (PBFDAF). For details, refer to Paez Borrallo J M and Otero M G.

There are two points to note when using the AEC algorithm:


1) The delay must be small. By default the filter is divided into 12 blocks of 64 points each, which at a sampling rate of 8000 Hz is 12*8 ms = 96 ms of data; echo beyond this length is not processed.

2) The delay jitter must be small. By default the algorithm recomputes the position of the reference data (that is, the filter block with the largest energy) once every 10 blocks, so if the jitter is large the reference position becomes inaccurate and the echo cannot be removed.
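The 96 ms figure in the first point is plain arithmetic, which the following sketch reproduces. The helper name `filter_span_ms` is hypothetical, and the 16000 Hz line simply assumes the same 12 blocks of 64 points.

```python
def filter_span_ms(num_blocks, block_len, sample_rate_hz):
    """Length of echo path the partitioned filter can cover, in milliseconds."""
    return 1000.0 * num_blocks * block_len / sample_rate_hz


span_8k = filter_span_ms(12, 64, 8000)    # 12 blocks * 8 ms
span_16k = filter_span_ms(12, 64, 16000)  # same blocks at twice the rate
```

Under that assumption the covered span halves when the sampling rate doubles, which is why the delay has to be compensated before the adaptive filter rather than absorbed by it.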




