Brief analysis of the WebRTC echo cancellation module

Last Update:2018-02-27 Source: Internet

Author: User

The WebRTC echo cancellation algorithms (AEC, AECM) mainly comprise the following modules: echo delay estimation, NLMS (normalized least mean squares adaptive algorithm), NLP (nonlinear processing), and CNG (comfort noise generation). A classic general-purpose AEC algorithm would also include double-talk detection (DT).

Since the NLMS, NLP and CNG used by WebRTC are all classical algorithms, this article mainly introduces the WebRTC echo delay estimation algorithm, which is the feature that distinguishes WebRTC echo cancellation from general-purpose algorithms (such as those used in video conferencing).

1) Echo Delay estimation

The echo delay has a large impact on the performance of the echo canceller (thread synchronization issues on the PC are not considered here), and a very long filter is not practical, so delay estimation becomes all the more important. The commonly used and most obvious approach is delay estimation based on correlation, which should be familiar to anyone who has studied communication theory. Correlation is also widely used in speech coding, for example in the AMR series, the G.729 series, G.718 and other codecs, where autocorrelation of the speech signal is used to find the pitch period. Because codecs generally process frame by frame, with a frame length of typically 10 or 20 ms, the lag range for the pitch search is small. For echo cancellation, however, the delay search range is much larger, which leads to high computational complexity. On handheld terminals one must also consider how a changing mobile environment affects the algorithm, for example whether the delay varies randomly, whether the reflection path is linear or nonlinear, and whether the computing capacity (and battery) is sufficient, which makes the problem more complex.
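To make the cost of the correlation approach concrete, here is a minimal sketch of a brute-force cross-correlation delay search. It is not WebRTC code; the function name xcorr_delay and the toy signals are assumptions made for illustration. The work grows with the signal length times the number of candidate lags, which is why a wide delay search range becomes expensive.

```python
def xcorr_delay(far, near, max_delay):
    """Return the lag (in samples) that maximizes the far/near cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        # Correlate near[n] with far[n - lag] over the overlapping region.
        score = sum(near[n] * far[n - lag] for n in range(lag, len(near)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy signals (hypothetical): the near end is the far end delayed by 5 samples.
far = [0.0, 1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 1.0, -1.0,
       0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0] * 5 + far[:-5]
print(xcorr_delay(far, near, max_delay=8))  # prints 5
```

Each candidate lag costs one full inner product, so doubling the search range roughly doubles the work.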

Returning to WebRTC's echo delay estimation: it uses an algorithm by Bastiaan, chief scientist at GIPS. The main ideas of the algorithm are as follows:

Let 1 denote speech and 0 denote silence (mute or very weak). The reference (far-end) signal x(t) and the received (near-end) signal y(t) can then be combined in the following ways: (0,0), (0,1), (1,0), (1,1). (0,0) means both the far end and the near end are relatively weak, and (1,1) means both the far end and the near end are relatively strong. By default the WebRTC C code treats the other two cases as impossible.

For time frame p (p = 1, 2, ..., P) and frequency band q (q = 1, 2, ..., Q), let Xw(p,q) denote the power spectrum of the input signal x after windowing (for example with a Hanning window), and set a threshold Xw(p,q)_threshold for the power spectrum in each band.

If Xw(p,q) >= Xw(p,q)_threshold, then Xw(p,q) = 1;

If Xw(p,q) < Xw(p,q)_threshold, then Xw(p,q) = 0;

Similarly, for the signal y(t), define the windowed power spectrum Yw(p,q) and the threshold Yw(p,q)_threshold.

If Yw(p,q) >= Yw(p,q)_threshold, then Yw(p,q) = 1;

If Yw(p,q) < Yw(p,q)_threshold, then Yw(p,q) = 0;

For convenience of processing, the WebRTC C code divides the frequency-domain power spectrum after the FFT into 32 sub-bands, so that the value Xw(p,q) of each sub-band can be represented by 1 bit. That is 32 bits in total, so a single 32-bit data type is enough to hold one frame.
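The thresholding and bit packing described above can be sketched as follows. This is an illustrative sketch rather than the WebRTC implementation: the function name binarize_spectrum is hypothetical, and the per-band thresholds are simply passed in, whereas a real implementation has to estimate them.

```python
def binarize_spectrum(band_power, band_threshold):
    """Pack one frame of per-band power into an integer, one bit per band."""
    word = 0
    for q, (power, threshold) in enumerate(zip(band_power, band_threshold)):
        if power >= threshold:
            word |= 1 << q  # band q is active, so bit q becomes 1
    return word


# Four bands for brevity; with 32 bands the result fits a 32-bit word.
power = [0.9, 0.1, 0.5, 0.0]
threshold = [0.5, 0.5, 0.5, 0.5]
print(bin(binarize_spectrum(power, threshold)))  # prints 0b101
```

Storing a frame as one machine word is what makes the later frame-to-frame comparison cheap.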

WebRTC defines a binary_far_history array of 75 32-bit values to hold the history of the far-end reference signal, and a binary_near_history array of 16 32-bit values to hold the history of the near-end signal; the most recent value is stored at index 0. The 32-bit value binary_near_history[15] is XORed bitwise with each of the 75 32-bit values in binary_far_history, giving 75 32-bit results. The physical meaning of each result is an approximate measure, based on the power spectrum, of the correlation between two frames. The number of 1 bits in each 32-bit result is counted and stored in bit_counts, and bit_counts is then smoothed to prevent abrupt changes in the delay, giving mean_bit_count. The smaller mean_bit_count is, the better the near-end frame matches that far-end frame, and the closer the offset between the two is to the required delay; the best value is kept in value_best_candidate. The remaining work is guarding the boundary cases: if value_best_candidate is close to the (preset) worst-case value, the estimate is unreliable and the delay is not updated this time. If the estimate is reliable, a first-order Markov model is further applied, comparing against the previous delay, to decide whether to finally update the delay last_delay.
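The XOR, bit counting and smoothing steps can be sketched roughly as below. This is a simplified floating-point illustration, not the WebRTC source: the function name update_delay_estimate and the smoothing factor alpha are assumptions, and the reliability checks and the Markov-model update described above are left out.

```python
def update_delay_estimate(binary_near, binary_far_history, mean_bit_counts,
                          alpha=0.25):
    """Compare one near-end word with every far-end history entry.

    mean_bit_counts is updated in place; the index of its smallest value
    (the candidate delay, in frames) is returned.
    """
    for i, binary_far in enumerate(binary_far_history):
        # Number of bands in which the two binary spectra disagree.
        bit_count = bin(binary_near ^ binary_far).count("1")
        # First-order recursive smoothing to avoid abrupt delay changes.
        mean_bit_counts[i] += alpha * (bit_count - mean_bit_counts[i])
    return min(range(len(mean_bit_counts)), key=mean_bit_counts.__getitem__)


# Hypothetical history: the near-end word matches the far-end entry at index 2.
far_history = [0xFFFF0000, 0x0F0F0F0F, 0x12345678, 0x00000000]
means = [16.0] * len(far_history)  # start at half of the 32 bits
print(update_delay_estimate(0x12345678, far_history, means))  # prints 2
```

Because each comparison is one XOR plus a bit count, scanning the whole history costs far less than a sample-domain correlation over the same delay range.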

Bastiaan's patent itself is more complex than the existing C implementation. For example, at the XOR step each of the four combinations (0,0), (0,1), (1,0), (1,1) can be given its own weight in the cost function, whereas the C code is equivalent to fixing the weight of (0,0) and (1,1) at 1 and the weight of (0,1) and (1,0) at 0. In addition, the C code XORs the far-end and near-end arrays frame by frame in order; in a real application the XOR can also be done at intervals of 1 or 2 frames, which widens the search range.

In general, WebRTC's delay estimation algorithm greatly reduces computational complexity, which matters especially on mobile terminals, where the computation budget is tight. For practical applications the algorithm still has room for improvement.

2) NLMS (normalized least mean squares adaptive algorithm)

LMS, NLMS, AP and RLS are all classic adaptive filtering algorithms; only the NLMS algorithm used in WebRTC is briefly introduced here. Let the far-end signal be x(n), the near-end signal be d(n), and the filter coefficients be w(n). The error signal is then e(n) = d(n) - w'(n) x(n) (here ' denotes the transpose). NLMS updates the filter coefficients with a variable step size, u = u0 / (gamma + x'(n) x(n)), where u0 is the step-size factor and gamma is a stabilizing factor. The coefficient update equation is w(n+1) = w(n) + u e(n) x(n). NLMS is slightly more complex than the traditional LMS algorithm, but converges noticeably faster. The performance of LMS/NLMS is inferior to that of the AP and RLS algorithms.
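The NLMS equations above translate almost line for line into code. The following is a minimal time-domain sketch in plain Python (WebRTC itself works in the frequency domain); the function name nlms, the default values of u0 and gamma, and the three-tap echo path h are assumptions chosen for illustration.

```python
import random


def nlms(x, d, taps, u0=0.5, gamma=1e-6):
    """Adapt an FIR filter w so that w'x(n) tracks d(n); return (w, errors)."""
    w = [0.0] * taps
    errors = []
    for n in range(len(x)):
        # x(n): the most recent `taps` far-end samples, newest first.
        xn = [x[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        # e(n) = d(n) - w'(n) x(n)
        e = d[n] - sum(wk * xk for wk, xk in zip(w, xn))
        # u = u0 / (gamma + x'(n) x(n))
        u = u0 / (gamma + sum(xk * xk for xk in xn))
        # w(n+1) = w(n) + u e(n) x(n)
        w = [wk + u * e * xk for wk, xk in zip(w, xn)]
        errors.append(e)
    return w, errors


# Hypothetical echo path: the near end is a filtered copy of the far end.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
h = [0.0, 0.5, 0.25]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = nlms(x, d, taps=len(h))
print([round(wk, 3) for wk in w])  # close to h once the filter has converged
```

Dividing by the input energy makes the effective step independent of the signal level, which is where the faster convergence compared with plain LMS comes from.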

It is also worth mentioning that WebRTC uses the partitioned block frequency domain adaptive filter (PBFDAF) algorithm, which is also a common algorithm for adaptive filters. More information on adaptive filtering can be found in Simon Haykin's Adaptive Filter Theory.

3) NLP (nonlinear processing)

WebRTC uses a Wiener filter. Only the expression of its transfer function is given here: if the estimated power spectrum of the speech signal is Ps(w) and the power spectrum of the noise signal is Pn(w), then the transfer function of the filter is H(w) = Ps(w) / (Ps(w) + Pn(w)).
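As a small worked example of the transfer function, the gain can be evaluated per frequency bin. This is only a sketch of the formula, not WebRTC's NLP code; the function name wiener_gain and the small constant eps (added to avoid division by zero in empty bins) are assumptions.

```python
def wiener_gain(ps, pn, eps=1e-12):
    """Per-bin gain H(w) = Ps(w) / (Ps(w) + Pn(w))."""
    return [s / (s + n + eps) for s, n in zip(ps, pn)]


# Hypothetical power spectra for three bins.
ps = [1.0, 3.0, 0.0]  # estimated speech power
pn = [1.0, 1.0, 2.0]  # noise power
print([round(g, 6) for g in wiener_gain(ps, pn)])  # prints [0.5, 0.75, 0.0]
```

Bins dominated by speech get a gain near 1 and bins dominated by noise get a gain near 0.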

4) CNG (Comfort noise generation)

The comfort noise generator used by WebRTC is relatively simple: it first generates a random noise array uniformly distributed on [0, 1], and then uses the power spectrum of the noise to modulate its amplitude.
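One way to read this description is sketched below: each frequency bin gets a uniform random number, which is used as a random phase, and the bin's amplitude is set from the noise power spectrum. Mapping the uniform value to a phase is an assumption of this sketch, as are the function name comfort_noise_spectrum and the example power values.

```python
import cmath
import math
import random


def comfort_noise_spectrum(noise_power, rng):
    """Return one complex spectrum whose per-bin power follows noise_power."""
    spectrum = []
    for power in noise_power:
        u = rng.random()            # uniform on [0, 1)
        phase = 2.0 * math.pi * u   # assumption: the uniform value sets the phase
        spectrum.append(math.sqrt(power) * cmath.exp(1j * phase))
    return spectrum


# Hypothetical noise power estimate for four bins.
noise_power = [0.04, 0.01, 0.09, 0.0]
spectrum = comfort_noise_spectrum(noise_power, random.Random(1))
print([round(abs(c) ** 2, 6) for c in spectrum])  # matches noise_power
```

The generated noise then has the same spectral shape as the estimated background noise, so the gaps left by echo suppression sound natural.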

In general, WebRTC's AEC algorithm is simple, practical, and easy to commercialize; on the other hand, the C code appears to hold some things back.

I have recently been studying the AEC algorithm in WebRTC for work. Judging from the fullaec.m file in the source code, I think the AEC algorithm is essentially a partitioned block frequency domain adaptive filter (PBFDAF). For details, refer to Paez Borrallo J. M. and Otero M. G. When using this AEC algorithm, note two points:

1) The delay must be small. By default the filter is divided into 12 blocks of 64 samples each; at a sampling rate of 8000 Hz each block is 8 ms, so the filter covers 12 * 8 ms = 96 ms of data, and echo beyond this length cannot be handled.

2) The delay jitter must be small. By default the algorithm recomputes the position of the reference data (that is, the filter block with the largest energy) only every 10 blocks, so if the jitter is large the reference position is inaccurate and the echo cannot be removed.
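Both points can be checked with a few lines of arithmetic. The sketch below computes the time span covered by the partitioned filter and finds the partition with the largest coefficient energy. The function names filter_span_ms and dominant_partition and the example coefficients are hypothetical, and real PBFDAF partitions hold frequency-domain coefficients rather than the short real-valued lists used here.

```python
def filter_span_ms(num_blocks, block_size, sample_rate_hz):
    """Echo-path length the partitioned filter can cover, in milliseconds."""
    return 1000.0 * num_blocks * block_size / sample_rate_hz


def dominant_partition(partitions):
    """Index of the filter partition with the largest coefficient energy."""
    energies = [sum(c * c for c in block) for block in partitions]
    return max(range(len(energies)), key=energies.__getitem__)


print(filter_span_ms(12, 64, 8000))  # prints 96.0

# Hypothetical coefficients: most of the echo energy sits in partition 1.
partitions = [[0.0, 0.1], [0.5, -0.4], [0.1, 0.0]]
print(dominant_partition(partitions))  # prints 1
```

If the true delay drifts by more than the filter span, or moves faster than the dominant partition is re-estimated, the filter no longer lines up with the echo.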

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
