WebRTC Echo Cancellation (AEC, AECM) Algorithm Introduction (Repost)

Source: Internet
Author: User

The WebRTC echo cancellation (AEC, AECM) algorithm consists mainly of the following modules: 1. echo delay estimation; 2. NLMS (normalized least mean squares adaptive algorithm); 3. NLP (nonlinear filtering); 4. CNG (comfort noise generation). A classic AEC algorithm would generally also include double-talk (DT) detection. Since the NLMS, NLP and CNG used by WebRTC all belong to the classical category, this article mainly introduces WebRTC's echo delay estimation algorithm, which is what distinguishes WebRTC echo cancellation from general algorithms (such as those used in video conferencing).

1) Echo delay estimation

The echo delay has a fairly large impact on the performance of the echo canceller (leaving aside thread synchronization issues on the PC). A filter with too many taps is not practical, so the delay estimation algorithm becomes all the more important. The common and obvious choice is delay estimation based on correlation, which will be familiar to anyone who has studied communication principles. Correlation algorithms are also widely used in speech coding, for example in the AMR series, the G.729 series, G.718 and other codecs, where the autocorrelation of the speech signal is used to find the pitch period. Because codecs generally process frame by frame, with a frame length of 10 or 20 ms, the pitch search range within that span is small. For echo cancellation, however, the delay search range is much larger, which leads to high computational complexity.
On handheld terminals, we also need to consider how changes in the mobile environment affect the algorithm, such as whether the delay is random, whether the reflection path is linear or nonlinear, and whether the computational capacity (battery) is sufficient, which makes matters more complex.

Back to WebRTC's echo delay estimation, which is based on an algorithm by GIPS chief scientist Bastiaan. The main idea is as follows. Let 1 denote that voice is present and 0 that it is absent (silence or a very weak sound). The reference (far-end) signal x(t) and the received (near-end) signal y(t) can then combine in the following ways: (0,0), (0,1), (1,0), (1,1). (0,0) means that both the far end and the near end are relatively weak, and (1,1) means that both are relatively strong; the WebRTC C code assumes by default that the other two cases do not occur.

For time interval p, p = 1, 2, ..., P, and frequency band q, q = 1, 2, ..., Q, let Xw(p,q) denote the power spectrum of the windowed (e.g. Hanning window) input signal x, and set a threshold Xw(p,q)_threshold for the power spectrum in each band:

if Xw(p,q) >= Xw(p,q)_threshold, then Xw(p,q) = 1;
if Xw(p,q) < Xw(p,q)_threshold, then Xw(p,q) = 0.

Similarly, for signal y(t), with windowed power spectrum Yw(p,q) and threshold Yw(p,q)_threshold:

if Yw(p,q) >= Yw(p,q)_threshold, then Yw(p,q) = 1;
if Yw(p,q) < Yw(p,q)_threshold, then Yw(p,q) = 0.

For convenience in practice, the WebRTC C code divides the FFT power spectrum into 32 sub-bands, so that the value Xw(p,q) of each sub-band takes 1 bit, 32 bits in total, which fits in a single 32-bit data type.
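The thresholding and bit-packing step described above can be sketched as follows. This is only an illustration, not the WebRTC code: the function name binary_spectrum and the example power and threshold values are assumptions made for the sketch.

```python
def binary_spectrum(band_power, thresholds):
    """Pack 32 per-band threshold decisions into one 32-bit integer.

    Bit q is 1 when band_power[q] >= thresholds[q], otherwise 0.
    (Hypothetical helper; the name is not taken from WebRTC.)
    """
    if len(band_power) != 32 or len(thresholds) != 32:
        raise ValueError("expected exactly 32 sub-bands")
    word = 0
    for q, (power, threshold) in enumerate(zip(band_power, thresholds)):
        if power >= threshold:
            word |= 1 << q
    return word


# Made-up example: only sub-bands 0 and 3 exceed their thresholds.
band_power = [0.0] * 32
band_power[0] = 5.0
band_power[3] = 2.0
thresholds = [1.0] * 32

word = binary_spectrum(band_power, thresholds)
print(f"{word:032b}")
```

One integer per frame is all that has to be stored in the history buffers, which is what makes the later comparison cheap.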
WebRTC defines binary_far_history, an array of 75 32-bit words, to hold the history of the far-end reference signal, and binary_near_history, an array of 16 32-bit words, to hold the history of the near-end signal; the most recent value is stored at index 0. The 32-bit word binary_near_history[15] is XORed bitwise with each of the 75 32-bit words in binary_far_history, giving 75 32-bit results. Physically, each 32-bit result approximates the correlation between two frames using their power spectra. The number of 1 bits in each 32-bit result is counted into bit_counts, and bit_counts is then smoothed to prevent abrupt changes in the delay, giving mean_bit_count. The smaller mean_bit_count is, the better the near-end data matches the far-end data of that frame, and the closer the corresponding delay is to the required delay value; the smallest value is denoted value_best_candidate. The remaining work is boundary protection: if value_best_candidate is close to the preset worst-case value, the estimate is unreliable and the delay is not updated this time; if the estimate is reliable, a first-order Markov model is applied, comparing against the previous delay, to produce the final updated delay, last_delay.
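A minimal sketch of the XOR, bit count and smoothing steps is given below. It is not the WebRTC implementation: the floating-point exponential smoothing with factor alpha, the helper names, and the synthetic far-end history are all assumptions chosen to keep the example short, and the reliability check and Markov step are left out.

```python
def popcount32(word):
    """Count the 1 bits in a 32-bit word."""
    return bin(word & 0xFFFFFFFF).count("1")


def update_mean_bit_counts(near_word, far_history, mean_bit_counts, alpha=0.25):
    """XOR the near-end word with every far-end word and smooth the bit counts.

    alpha is an assumed smoothing factor, not a value taken from WebRTC.
    """
    for i, far_word in enumerate(far_history):
        bit_count = popcount32(near_word ^ far_word)
        mean_bit_counts[i] += alpha * (bit_count - mean_bit_counts[i])
    return mean_bit_counts


def best_candidate(mean_bit_counts):
    """Return (delay index, smoothed bit count) of the best-matching far frame."""
    delay = min(range(len(mean_bit_counts)), key=mean_bit_counts.__getitem__)
    return delay, mean_bit_counts[delay]


# Synthetic history of 75 distinct 32-bit words (multiplying by an odd
# constant modulo 2**32 never maps two different indices to the same word).
far_history = [(i * 2654435761) & 0xFFFFFFFF for i in range(75)]
near_word = far_history[20]      # pretend the echo arrives 20 frames late
mean_bit_counts = [32.0] * 75    # start every candidate at the worst value

for _ in range(5):
    update_mean_bit_counts(near_word, far_history, mean_bit_counts)

delay, value_best_candidate = best_candidate(mean_bit_counts)
print(delay, value_best_candidate)
```

The cost per frame is 75 XORs and bit counts, which is far cheaper than a sample-level cross-correlation over the same search range.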
Bastiaan's patent itself is more complex than the existing C code implementation. For example, at the XOR step a cost function can be attached to each of the four combinations (0,0), (0,1), (1,0), (1,1), whereas the C code in effect applies one fixed default weight to (0,0) and (1,1) and an additional weight of 0 to (0,1) and (1,0).
In addition, the C code XORs the far-end and near-end arrays frame by frame in order. In a practical application the XOR can also be done every 1 or 2 frames, which extends the search range.
In general, WebRTC's delay estimation algorithm greatly reduces complexity, which matters especially for mobile terminals, since they are more sensitive to computational load. For practical applications, the algorithm still has room for improvement.
2) NLMS (normalized least mean squares adaptive algorithm)
LMS/NLMS/AP/RLS are classic adaptive filtering algorithms. Only the NLMS algorithm used in WebRTC is briefly introduced here.
Let the far-end signal be x(n), the near-end signal be d(n), and the filter coefficient vector be w(n). The error signal is then

e(n) = d(n) - w'(n) x(n)

where ' denotes the transpose. NLMS updates the filter coefficients with a variable step size,

u = u0 / (gamma + x'(n) x(n))

where u0 is the update step factor and gamma is the stabilizing factor. The filter coefficient update equation is then

w(n+1) = w(n) + u e(n) x(n)

NLMS is slightly more complex than the traditional LMS algorithm, but it converges noticeably faster. The performance of LMS/NLMS is inferior to that of the AP and RLS algorithms.
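The three equations above translate almost directly into code. The following is a time-domain sketch for illustration only (WebRTC itself works block-wise in the frequency domain); the 3-tap echo path, the default values of u0 and gamma, and the function names are assumptions.

```python
import random


def nlms_step(w, x, d, u0=0.5, gamma=1e-6):
    """One NLMS update: returns (w(n+1), e(n))."""
    y = sum(wi * xi for wi, xi in zip(w, x))        # w'(n) x(n)
    e = d - y                                       # e(n) = d(n) - w'(n) x(n)
    u = u0 / (gamma + sum(xi * xi for xi in x))     # normalized step size
    w_next = [wi + u * e * xi for wi, xi in zip(w, x)]
    return w_next, e


def identify(echo_path, num_samples=500, seed=1):
    """Estimate a (hypothetical) echo path from noise-free synthetic data."""
    rng = random.Random(seed)
    taps = len(echo_path)
    w = [0.0] * taps
    x = [0.0] * taps                                # x(n), x(n-1), ...
    for _ in range(num_samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]       # shift in a new far-end sample
        d = sum(hi * xi for hi, xi in zip(echo_path, x))  # near-end = pure echo
        w, _ = nlms_step(w, x, d)
    return w


echo_path = [0.5, -0.3, 0.1]                        # made-up 3-tap echo path
estimate = identify(echo_path)
print([round(c, 3) for c in estimate])
```

Dividing by the input energy x'(n) x(n) is what makes the step size independent of the far-end signal level; gamma only guards against division by a near-zero energy.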
It is also worth mentioning that WebRTC uses the partitioned block frequency domain adaptive filter (PBFDAF) algorithm, which is also a common algorithm for adaptive filters.
More information on adaptive filtering can be found in Simon Haykin's Adaptive Filter Theory.
3) NLP (nonlinear filtering)
WebRTC uses a Wiener filter. Only the expression for the transfer function is given here: if the estimated power spectrum of the speech signal is Ps(w) and the power spectrum of the noise signal is Pn(w), the transfer function of the filter is H(w) = Ps(w) / (Ps(w) + Pn(w)).
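As a small worked example of the transfer function above, the sketch below evaluates H(w) per frequency bin. The function name, the zero-denominator guard and the example spectra are assumptions for illustration and are not taken from the WebRTC source.

```python
def wiener_gain(ps, pn):
    """Per-bin Wiener gain H(w) = Ps(w) / (Ps(w) + Pn(w))."""
    gains = []
    for s, n in zip(ps, pn):
        total = s + n
        # Guard (an assumption of this sketch): an empty bin gets zero gain.
        gains.append(s / total if total > 0.0 else 0.0)
    return gains


# Made-up spectra: strong speech, speech equal to noise, and noise only.
ps = [9.0, 1.0, 0.0]
pn = [1.0, 1.0, 1.0]
gains = wiener_gain(ps, pn)
print(gains)
```

The gain approaches 1 where speech dominates and 0 where noise dominates, so bins that carry mostly residual noise are attenuated.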
4) CNG (Comfort noise generation)
The comfort noise generator used by WebRTC is relatively simple: it first generates a random noise matrix uniformly distributed on [0, 1], and then uses the power spectrum of the noise to modulate the amplitude of that noise.
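One way to read this description is sketched below: a uniform random value per frequency bin supplies the randomness, and the estimated noise power sets the magnitude. Using the uniform value as a phase is an interpretation made for this sketch, and the function name and example power values are assumptions; the actual WebRTC code may differ in detail.

```python
import cmath
import math
import random


def comfort_noise_spectrum(noise_power, rng):
    """Build one frame of comfort noise in the frequency domain.

    Each bin gets magnitude sqrt(noise_power[k]) and a random phase
    derived from a value drawn uniformly from [0, 1).
    """
    bins = []
    for power in noise_power:
        u = rng.random()                       # uniform on [0, 1)
        phase = 2.0 * math.pi * u
        bins.append(cmath.rect(math.sqrt(power), phase))
    return bins


rng = random.Random(0)
noise_power = [4.0, 1.0, 0.25, 0.0]            # made-up noise power spectrum
spectrum = comfort_noise_spectrum(noise_power, rng)
print([round(abs(b) ** 2, 6) for b in spectrum])
```

Because only the phase is random, the generated noise has the same power spectrum as the estimated background noise, which is what makes it sound like a natural continuation of it.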

In general, WebRTC's AEC algorithm is simple, practical, and easy to commercialize; on the other hand, the C code appears to hold some things back.

I have recently been studying the AEC algorithm in WebRTC for work. Judging from the fullaec.m file in the source code, I think the AEC algorithm is, overall, a partitioned block frequency domain adaptive filter (PBFDAF). Refer to Paez Borrallo J. M. and Otero M. G. for details.

There are two points to note when using the AEC algorithm:

1) The delay must be small. The algorithm's default filter length is 12 blocks of 64 points each; at a sampling rate of 8000 Hz that is 12 * 8 ms = 96 ms of data, and anything beyond this length cannot be handled.

2) The delay jitter must be small. By default the algorithm recomputes the position of the reference data (that is, the block with the largest filter energy) once every 10 blocks, so if the jitter is large the reference data position is inaccurate and the echo cannot be removed.
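The arithmetic behind point 1) can be checked with a few lines. The function name is hypothetical, and the 16000 Hz case is added only to show how the covered delay shrinks if the block layout stays the same.

```python
def filter_span_ms(num_blocks, block_size, sample_rate_hz):
    """Echo delay, in milliseconds, covered by a partitioned filter."""
    return 1000.0 * num_blocks * block_size / sample_rate_hz


span_8k = filter_span_ms(12, 64, 8000)     # 12 blocks of 64 samples at 8 kHz
span_16k = filter_span_ms(12, 64, 16000)   # same layout at 16 kHz
print(span_8k, span_16k)
```

Each 64-sample block lasts 8 ms at 8000 Hz, so 12 blocks cover 96 ms; any echo path delay beyond what the delay estimator has already compensated must fit inside that window.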
