In-depth analysis of iLBC voice enhancement (enhancer)

Source: Internet
Author: User

Continuing our study of the iLBC codec...

 

I. iLBC enhancer overview

The iLBC decoder contains a speech enhancement unit located between the reconstructed residual signal and the synthesis filter. For details, see "In-depth analysis of iLBC decoder principle". The enhancement unit operates on the residual signal and improves perceived speech quality by reducing the noise in periodic (voiced) segments. Compared with traditional post-filter enhancement algorithms, this algorithm modifies the residual signal only slightly, so it avoids the loss of sound quality caused by over-enhancement.

The enhancer processes sub-blocks of 80 samples. A total of 8 sub-blocks (640 samples) are kept as enhancer memory for the calculation. The following table lists the block layout, input/output size, and delay for each frame size:

 


Frame size   Past blocks (memory)   Current blocks (new)   Input/output (samples)   Delay (samples)
20 ms        6 (6*80 = 480)         2 (2*80 = 160)         160/160                  40
30 ms        5 (5*80 = 400)         3 (3*80 = 240)         240/240                  80
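The memory layout in the table can be sketched as a sliding buffer. This is only an illustration: the names `ENH_BLOCKL`, `ENH_NBLOCKS_TOT`, `ENH_BUFL`, and `update_enhancer_memory` are chosen here for the example, and the shift-and-append update is an assumption about how the 640-sample memory is maintained.

```python
ENH_BLOCKL = 80                            # sub-block length in samples (from the text)
ENH_NBLOCKS_TOT = 8                        # sub-blocks kept as enhancer memory (from the text)
ENH_BUFL = ENH_BLOCKL * ENH_NBLOCKS_TOT    # 640 samples


def update_enhancer_memory(memory, new_residual):
    """Drop the oldest samples and append one frame of new residual samples."""
    if len(memory) != ENH_BUFL:
        raise ValueError("memory must hold exactly 640 samples")
    if len(new_residual) not in (160, 240):
        raise ValueError("a frame is 160 (20 ms) or 240 (30 ms) samples")
    return memory[len(new_residual):] + list(new_residual)


memory = [0.0] * ENH_BUFL
memory = update_enhancer_memory(memory, [1.0] * 160)   # one 20 ms frame
past_blocks = (ENH_BUFL - 160) // ENH_BLOCKL           # past sub-blocks left in memory
```

After one 20 ms frame, the first 480 samples are the past blocks and the last 160 are the two current blocks, which matches the first row of the table.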

 

 

II. iLBC enhancer principle and process

 

1. Estimate the pitch of each 80-sample sub-block.

Compute the cross-correlation over lags in the range [20, 120] samples. The lag that gives the maximum correlation coefficient is taken as the pitch value.
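A minimal sketch of this lag search, assuming a plain whole-sample search and a correlation normalized by the energy of the delayed block. The function name `estimate_pitch` and its parameters are hypothetical and are not taken from the reference code.

```python
import math


def estimate_pitch(signal, start, block_len=80, min_lag=20, max_lag=120):
    """Return the lag in [min_lag, max_lag] whose delayed block correlates
    best with the block that begins at index `start`."""
    if start - max_lag < 0 or start + block_len > len(signal):
        raise ValueError("not enough samples around the block")
    cur = signal[start:start + block_len]
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        past = signal[start - lag:start - lag + block_len]
        cross = sum(a * b for a, b in zip(cur, past))
        energy = sum(b * b for b in past)
        # The energy of `cur` is the same for every lag, so it is left out.
        score = cross / math.sqrt(energy) if energy > 0 else 0.0
        if score > best_score:            # strict '>' keeps the smallest best lag
            best_lag, best_score = lag, score
    return best_lag


# Pulse train with a period of 50 samples.
signal = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
pitch = estimate_pitch(signal, start=200)   # 50 for this pulse train
```

Because multiples of the true period score equally well on a perfectly periodic signal, the strict comparison keeps the smallest of the tied lags.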

 

2. Search for six pitch-synchronous sequences around the current sub-block

Each pitch-synchronous sequence is an 80-sample vector. Taking a pitch period of 40 as an example, we look 40, 80, and 120 samples before the current sub-block and, at each position, find the sequence with the largest correlation with the current sub-block. To improve accuracy, a fine search of up to 2 samples on either side of each position can be performed, that is, over the lags [38, 39, 40, 41, 42], [78, 79, 80, 81, 82], and [118, 119, 120, 121, 122]. This yields three pitch-synchronous sequences on one side of the current sub-block; the other three are obtained on the opposite side in the same way. If a pitch-synchronous sequence falls outside the enhancer memory (640 samples), it is set to zero. The positions of the enhancer memory and the pitch-synchronous sequences for the 20 ms and 30 ms modes are illustrated in reference 1.
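The search described above can be sketched as follows. This is a simplified illustration, not the reference implementation: it searches whole-sample positions only, uses the plain inner product as the correlation measure, and treats a sequence as "outside the memory" when none of its candidate positions fits. The name `find_pitch_sync_sequences` and the parameters `hl` and `radius` are hypothetical.

```python
def find_pitch_sync_sequences(memory, start, pitch, block_len=80, hl=3, radius=2):
    """Return a dict {i: sequence} for i = -hl..-1 and 1..hl.

    Negative i looks before the current sub-block, positive i after it.
    Each nominal position start + i*pitch is refined within +/- radius
    samples by maximizing the inner product with the current sub-block.
    """
    cur = memory[start:start + block_len]
    result = {}
    for i in list(range(-hl, 0)) + list(range(1, hl + 1)):
        centre = start + i * pitch
        best_seq, best_corr = None, None
        for pos in range(centre - radius, centre + radius + 1):
            if pos < 0 or pos + block_len > len(memory):
                continue                      # candidate does not fit in memory
            cand = memory[pos:pos + block_len]
            corr = sum(a * b for a, b in zip(cur, cand))
            if best_corr is None or corr > best_corr:
                best_seq, best_corr = cand, corr
        # No candidate fits: the sequence is set to zero, as in the text.
        result[i] = best_seq if best_seq is not None else [0.0] * block_len
    return result


memory = [1.0 if n % 40 == 0 else 0.0 for n in range(640)]
seqs = find_pitch_sync_sequences(memory, start=560, pitch=40)
```

With the current sub-block placed at the end of the memory, all three following sequences fall outside the 640 samples and come back as zeros, while the preceding ones are found normally.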

 

 

 

 

3. Compute the smoothed residual signal from the six pitch-synchronous sequences

A linear combination of the six pitch-synchronous sequences gives an approximation of the current sub-block. Scaling this sequence gives the smoothed (enhanced) residual signal.
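A minimal sketch of the combination and scaling. Two things here are assumptions and should be checked against reference 1: the raised-cosine weights, which give nearer sequences more influence, and the scale factor, which matches the energy of the result to that of the current sub-block. The function name `smooth_residual` is hypothetical.

```python
import math


def smooth_residual(pssq, hl=3):
    """pssq maps i in -hl..hl to equal-length sequences; pssq[0] is the
    current (unenhanced) sub-block and is not part of the combination."""
    n = len(pssq[0])
    y = [0.0] * n
    for i in range(-hl, hl + 1):
        if i == 0:
            continue
        # Assumed raised-cosine weight: largest for i = -1 and i = 1.
        w = 0.5 * (1.0 - math.cos(2.0 * math.pi * (i + hl + 1) / (2 * hl + 2)))
        for k in range(n):
            y[k] += w * pssq[i][k]
    w00 = sum(v * v for v in pssq[0])     # energy of the current sub-block
    w11 = sum(v * v for v in y)           # energy of the combination
    scale = math.sqrt(w00 / w11) if w11 > 0 else 0.0
    return [scale * v for v in y]
```

If all six sequences are identical to the current sub-block, which is the perfectly periodic case, the scaled result reproduces the current sub-block up to rounding.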

 

4. Determine whether the smoothed residual signal meets the criterion

Determine whether the difference between the smoothed residual signal and the unenhanced residual signal is acceptable. If it is, the smoothed signal can be output directly. If the difference is too large, steps 5 and 6 are also required to perform the enhancement under constraints.
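One way to express this check is to compare the squared error with a fraction of the energy of the unenhanced sub-block. The threshold `alpha0 = 0.05` and the function name are assumptions made for this sketch; reference 1 defines the actual criterion.

```python
def needs_constrained_enhancement(unenhanced, smoothed, alpha0=0.05):
    """Return True when the smoothed signal differs too much from the
    unenhanced one, so that the constrained steps are required."""
    err = sum((u - s) ** 2 for u, s in zip(unenhanced, smoothed))
    energy = sum(u * u for u in unenhanced)
    return err > alpha0 * energy


x = [1.0, -2.0, 3.0, -1.0]
same = needs_constrained_enhancement(x, list(x))            # False: no difference
far = needs_constrained_enhancement(x, [-v for v in x])     # True: error is 4x the energy
```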

 

5. Use the constraints to calculate the mixing factors

For voiced segments with strong periodicity, the difference between the enhanced sequence computed as a linear combination of the six pitch-synchronous sequences and the unenhanced residual signal should be small. In speech transitions and noise-like parts, however, there is little correlation, so the difference between the enhanced sequence and the unenhanced residual signal will be large. If this sequence were output directly, the excessive periodicity it introduces would produce audible noise and reduce sound quality, so further processing is required.

 

The final output enhanced residual signal is actually a linear combination of the enhanced residual signal obtained in step 3 and the unenhanced residual signal. The formula is as follows:

z = A * y + B * pssq(0)

Here z is the final output residual signal, y is the enhanced residual signal obtained in step 3, pssq(0) is the unenhanced residual signal, and A and B are the mixing factors.

The mixing factors are the optimal solution obtained with the Lagrange multiplier method under two constraints, which involves many mathematical formulas. I will not go into details here. If you are interested, see reference 1.
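As a sketch of where A and B come from, assume the two constraints are that the output keeps the energy of the unenhanced sub-block and that its squared distance from that sub-block equals a fraction of that energy. The symbol \alpha_0 for that fraction and this reading of the constraints are assumptions to be checked against reference 1. Substituting z = A * y + B * pssq(0) into both constraints then gives:

```latex
% x = pssq(0), the unenhanced sub-block; y = enhanced sequence from step 3
\begin{aligned}
& w_{00} = x^{T}x, \qquad w_{11} = y^{T}y, \qquad w_{10} = x^{T}y \\
& \text{constraint 1:} \quad \lVert z \rVert^{2} = w_{00} \\
& \text{constraint 2:} \quad \lVert x - z \rVert^{2} = \alpha_0 \, w_{00} \\
& \text{from constraints 1 and 2:} \quad A\,w_{10} + B\,w_{00} = \left(1 - \tfrac{\alpha_0}{2}\right) w_{00}
  \;\Rightarrow\; B = 1 - \frac{\alpha_0}{2} - A\,\frac{w_{10}}{w_{00}} \\
& \text{back into constraint 1:} \quad A^{2}\left(w_{11} - \frac{w_{10}^{2}}{w_{00}}\right) = \left(\alpha_0 - \frac{\alpha_0^{2}}{4}\right) w_{00}
  \;\Rightarrow\; A = \sqrt{\frac{\left(\alpha_0 - \alpha_0^{2}/4\right) w_{00}^{2}}{w_{11}\,w_{00} - w_{10}^{2}}}
\end{aligned}
```

The positive root is taken for A so that the output moves toward y rather than away from it. When y is almost parallel to pssq(0), the denominator approaches zero, and a practical implementation would need to fall back to A = 0, B = 1.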

 

6. Use the mixing factors to mix the smoothed residual signal with the original unenhanced residual signal, and output the enhanced residual signal.
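A minimal sketch of the mixing step, assuming the mixing factors follow from two constraints: the output keeps the energy of the unenhanced sub-block, and its squared distance from that sub-block equals `alpha0` times that energy. The value `alpha0 = 0.05`, the small-denominator guard, and the name `mix_residual` are assumptions made for illustration. See reference 1 for the actual derivation.

```python
import math


def mix_residual(x, y, alpha0=0.05):
    """Return z = A*y + B*x, where x is the unenhanced sub-block pssq(0)
    and y is the smoothed sequence from step 3."""
    w00 = sum(v * v for v in x)
    w11 = sum(v * v for v in y)
    w10 = sum(a * b for a, b in zip(x, y))
    if w00 <= 0.0:
        return list(x)                     # silent block: nothing to enhance
    denom = (w11 * w00 - w10 * w10) / (w00 * w00)
    if denom > 1e-4:
        a = math.sqrt((alpha0 - alpha0 * alpha0 / 4.0) / denom)
        b = 1.0 - alpha0 / 2.0 - a * w10 / w00
    else:
        # y is (almost) a scaled copy of x, so mixing changes nothing useful.
        a, b = 0.0, 1.0
    return [a * yv + b * xv for xv, yv in zip(x, y)]


x = [1.0, 0.0]
y = [0.0, 1.0]
z = mix_residual(x, y)
energy_z = sum(v * v for v in z)
distance = sum((xv - zv) ** 2 for xv, zv in zip(x, z))
```

For this orthogonal pair, `energy_z` comes out equal to the energy of `x` and `distance` equal to `alpha0` times that energy, up to rounding. Rescaling `y` does not change the output, because A shrinks by the same factor.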

 

III. iLBC enhancer summary

In essence, the iLBC enhancer finds three 80-sample pitch-synchronous sequences before and three after the current unenhanced residual signal (sub-block), and then uses these six sequences to improve the sound quality of the current sub-block. If the current sub-block is voiced, its periodicity is increased appropriately to make the sound fuller. If the current sub-block is non-periodic, the enhancer limits the added periodicity and the noise it would introduce.

 

 

References:

1. IETF RFC 3951, "Internet Low Bit Rate Codec (iLBC)"

2. Pan Bosheng, research on advanced processing of the iLBC decoding program

 

Welcome to the discussion!
