Detailed analysis of G.723 source code (4): perceptual weighting and pitch search

Last Update:2018-12-05 Source: Internet

Author: User


5. Perceptual weighting and pitch search

Mem_Shift

This function combines the 120 input samples saved from the previous frame with the current 240 samples into a 360-sample buffer Buf. It saves the last 120 input samples to PrevDat for the next frame, and it takes samples 60 through 299 of Buf (one 240-sample frame) for analysis.
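The buffer shuffle can be sketched in C as follows. This is a minimal illustration reconstructed from the description above, not the ITU source. The function name `mem_shift_sketch` and the macro names are hypothetical, and samples are assumed to be 16-bit values stored in `short` arrays.

```c
#include <string.h>

/* Hypothetical sizes taken from the description: 120 saved + 240 new samples. */
#define HIST_LEN  120   /* samples carried over between frames */
#define FRAME_LEN 240   /* samples per frame */
#define OFFSET     60   /* start of the analysis frame inside Buf */

/* Sketch of the Mem_Shift buffer shuffle (not the ITU code).
 * prev_dat: HIST_LEN samples saved by the previous call, updated in place.
 * data:     FRAME_LEN new input samples, overwritten with Buf[60..299]. */
void mem_shift_sketch(short prev_dat[HIST_LEN], short data[FRAME_LEN])
{
    short buf[HIST_LEN + FRAME_LEN];   /* the 360-sample buffer Buf */

    /* Buf = previous 120 samples followed by the current 240 samples */
    memcpy(buf, prev_dat, sizeof(short) * HIST_LEN);
    memcpy(buf + HIST_LEN, data, sizeof(short) * FRAME_LEN);

    /* Keep the last 120 input samples (Buf[240..359]) for the next frame */
    memcpy(prev_dat, buf + FRAME_LEN, sizeof(short) * HIST_LEN);

    /* The frame handed to the rest of the coder is Buf[60..299] */
    memcpy(data, buf + OFFSET, sizeof(short) * FRAME_LEN);
}
```

The analysis frame therefore lags the newest input by 60 samples. Those 60 samples act as look-ahead.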

Wght_Lpc

Constructs the perceptual weighting filter.

The filter is built from the LPC coefficients a(i) as follows:

$$W(z) = \frac{1 - \sum_{i=1}^{10} a(i)\,0.9^{i}\,z^{-i}}{1 - \sum_{i=1}^{10} a(i)\,0.5^{i}\,z^{-i}}$$
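The sketch below illustrates the formula. It scales the LPC coefficients a(1)..a(10) by 0.9^i and 0.5^i to get the numerator and denominator coefficients of W(z). It uses floating point and a hypothetical name (`weight_lpc_sketch`). The reference code does the same job in fixed point.

```c
#define LPC_ORDER 10

/* Sketch: a[k] holds a(k+1). On return, num[k] = a(k+1) * 0.9^(k+1)
 * (numerator of W(z)) and den[k] = a(k+1) * 0.5^(k+1) (denominator). */
void weight_lpc_sketch(const double a[LPC_ORDER],
                       double num[LPC_ORDER], double den[LPC_ORDER])
{
    double g1 = 1.0, g2 = 1.0;   /* running powers of gamma1 = 0.9, gamma2 = 0.5 */
    for (int k = 0; k < LPC_ORDER; k++) {
        g1 *= 0.9;               /* 0.9^(k+1) */
        g2 *= 0.5;               /* 0.5^(k+1) */
        num[k] = a[k] * g1;
        den[k] = a[k] * g2;
    }
}
```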

Error_Wght

Passes the 240 frame samples through the perceptual weighting filter to obtain the perceptually weighted speech signal.

The computation is not covered in detail here. It is split into two parts, an FIR part (numerator) and an IIR part (denominator), and the code follows the weighting filter formula directly.
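Here is a floating-point sketch of applying W(z) sample by sample, with the FIR (numerator) and IIR (denominator) parts written out explicitly. The name `error_wght_sketch` is hypothetical. For brevity, the sketch assumes zero filter history, whereas a real coder carries the filter memory from one block to the next.

```c
#define LPC_ORDER 10

/* Sketch: y[n] = x[n] - sum_k c[k] x[n-k-1] + sum_k d[k] y[n-k-1],
 * i.e. W(z) = (1 - sum c_i z^-i) / (1 - sum d_i z^-i) with c[k], d[k]
 * the coefficients of z^-(k+1). Assumes zero history before x[0].
 * x and y must not overlap. */
void error_wght_sketch(const double *x, double *y, int len,
                       const double c[LPC_ORDER], const double d[LPC_ORDER])
{
    for (int n = 0; n < len; n++) {
        double acc = x[n];
        for (int k = 0; k < LPC_ORDER && k < n; k++) {
            acc -= c[k] * x[n - k - 1];   /* FIR part: past inputs */
            acc += d[k] * y[n - k - 1];   /* IIR part: past outputs */
        }
        y[n] = acc;
    }
}
```

The input and output arrays are kept separate because the FIR part keeps reading past input samples after the corresponding outputs have been written.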

Then the buffers are spliced together: the previous PitchMax perceptually weighted samples (CodStat.PrevWgt) are combined with the current 240 weighted samples. The length of this history is tied to the longest pitch period searched (see below).

/* Construct the buffer */
for ( i = 0 ; i < PitchMax ; i ++ )
    Dpnt[i] = CodStat.PrevWgt[i] ;
for ( i = 0 ; i < Frame ; i ++ )
    Dpnt[PitchMax+i] = DataBuff[i] ;

Vec_Norm

Normalizes the combined buffer Dpnt before the pitch search.

Estim_Pitch

Next comes the pitch search.

The pitch period is found with the autocorrelation method.

The reasoning rests on the inequality a^2 + b^2 >= 2ab. Summed over the frame, it gives Σ s[n]·s[n-j] <= (Σ s[n]^2 + Σ s[n-j]^2)/2. Equality holds only when s[n] = s[n-j] for every n, that is, when the signal repeats itself after j samples.

This is why the autocorrelation of a voiced speech signal peaks at its pitch period.

(Another pitch estimator, the short-time average magnitude difference function (AMDF), works the other way around: at the pitch period it shows a valley instead of a peak.)
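The following small C sketch makes the peak-versus-valley difference concrete by evaluating both measures at a single lag. The function names are hypothetical. The AMDF is left as a plain sum, because the usual 1/N averaging does not change where the valley is.

```c
#include <stdlib.h>

/* Sketch: both measures over the window s[start..start+len-1] at one lag.
 * The caller must ensure start >= lag so that s[n - lag] stays in range. */
long autocorr_at(const int *s, int start, int len, int lag)
{
    long acc = 0;
    for (int n = start; n < start + len; n++)
        acc += (long)s[n] * s[n - lag];          /* peaks at the pitch period */
    return acc;
}

long amdf_at(const int *s, int start, int len, int lag)
{
    long acc = 0;
    for (int n = start; n < start + len; n++)
        acc += labs((long)s[n] - s[n - lag]);    /* valley at the pitch period */
    return acc;
}
```

On a signal with period 4 (1, 2, 3, 4, 1, 2, 3, 4, ...), `autocorr_at` is largest at lag 4 among lags 1 to 4, and `amdf_at` drops to 0 there.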

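Pitch also depends on the speaker. Women's and children's voices are higher. Adult men's vocal cords are long and thick, so their voices are lower, starting at around 60 Hz.

Taking pitch as 60 to 300 Hz at a sampling rate of 8000 Hz, the pitch period is 26 to 133 samples (8000/60 ≈ 133, 8000/300 ≈ 26).

The ITU reference code searches a wider range of pitch periods, from 18 to 142 samples.

This range is derived from the constants PitchMin and PitchMax.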
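The lag/frequency conversion used above is simply fs/f0. The helpers below (hypothetical names) make it explicit, truncating as the 8000/300 = 26 example does.

```c
/* Pitch frequency (Hz) -> pitch period in samples at sampling rate fs, truncated. */
int pitch_lag_from_hz(double fs, double f0)
{
    return (int)(fs / f0);
}

/* Pitch period in samples -> pitch frequency in Hz. */
double pitch_hz_from_lag(double fs, int lag)
{
    return fs / lag;
}
```

Applied to the ITU range at 8000 Hz, lags 18 to 142 correspond to roughly 444 Hz down to 56 Hz. That range comfortably covers the 60 to 300 Hz span above.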

Here are some reference figures on sound frequencies:

The fundamental frequency of the voice ranges from about … Hz to 300 Hz.

The lowest note of a deep bass voice is about 65.4 Hz.

The highest note of a high voice is about 1177.2 Hz.

Sound-production and hearing frequency ranges of humans and some animals:

Name         Sound production range Δf/Hz    Hearing range Δf/Hz
Human        65 ~ 1 100                      20 ~ 20 000
Dog          450 ~ 1 800                     15 ~ 50 000
Cat          760 ~ 1 500                     60 ~ 6 500
Bat          10 000 ~ 150 000                1 000 ~ 200 000
Dolphin      … ~ 120                         150 ~ 150 000
(unnamed)    10 ~ 13 000                     250 ~ 20 000

Fish: 40 ~ 2 000. Can fish even speak? I have never heard one.

Back to the code. G.723.1 splits the 240-sample frame into two halves of 120 samples each and computes one pitch period for each half.

j = PitchMax ;
for ( i = 0 ; i < SubFrames/2 ; i ++ ) {
    Line.Olp[i] = Estim_Pitch( Dpnt, (Word16) j ) ;
    VadStat.Polp[i+2] = Line.Olp[i] ;
    j += 2*SubFrLen ;
}

Estim_Pitch uses the autocorrelation method: the index of the first prominent correlation peak is taken as the pitch period.

For each candidate lag j, it computes:

$$\frac{\left(\sum_{n=0}^{119} s[n]\,s[n-j]\right)^{2}}{\sum_{n=0}^{119} s[n-j]\,s[n-j]}, \qquad 18 \le j \le 142$$
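A direct floating-point rendering of this criterion helps when reading the fixed-point code. This is a sketch, not the reference implementation. `open_loop_pitch_float` is a hypothetical name, and the function recomputes the energy for every lag. It skips lags whose cross-correlation is not positive, since squaring would otherwise reward negative correlation. It also has none of the short-lag bias that the code applies later.

```c
/* Sketch: returns the lag j in [min_lag, max_lag] maximizing
 * (sum s[n]s[n-j])^2 / sum s[n-j]^2 over n = 0..len-1.
 * s points at the first sample of the window; s[-max_lag] must be valid.
 * On ties, the first (shortest) lag is kept. */
int open_loop_pitch_float(const double *s, int len, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    double best = -1.0;
    for (int j = min_lag; j <= max_lag; j++) {
        double cross = 0.0, energy = 0.0;
        for (int n = 0; n < len; n++) {
            cross  += s[n] * s[n - j];       /* numerator before squaring */
            energy += s[n - j] * s[n - j];   /* denominator */
        }
        if (cross <= 0.0 || energy <= 0.0)
            continue;                        /* only positive correlation counts */
        double score = cross * cross / energy;
        if (score > best) {
            best = score;
            best_lag = j;
        }
    }
    return best_lag;
}
```

Consider an impulse train of period 40, the 120-sample window and lags 18 to 142. The function returns 40. Lags 80 and 120 score the same, and the first maximum is kept.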

The denominator is the energy of the delayed segment. The numerator is the square of the cross-correlation between the current segment and the delayed segment.

To avoid expensive division operations, the code reshapes the comparison.

Suppose the candidate's numerator is Da and its denominator is Db. Suppose the current maximum's numerator is Ma and its denominator is Mb.

The code checks whether Da·Mb - Db·Ma > 0. Both denominators are positive energies, so this is equivalent to Da/Db > Ma/Mb.

The derivation follows directly from the properties of inequalities.
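Here is the cross-multiplication trick in isolation, as a tiny C sketch. The name `ratio_greater` is hypothetical. The trick relies on both denominators being positive, which holds for the energies here.

```c
#include <stdbool.h>

/* Sketch: true when da/db > ma/mb, computed without division.
 * Valid only for db > 0 and mb > 0. */
bool ratio_greater(long long da, long long db, long long ma, long long mb)
{
    return da * mb - db * ma > 0;
}
```

For example, 3/4 against 2/3 gives 9 - 8 > 0, so the first ratio is larger. Equal ratios give 0 and are not counted as greater.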

The following describes the implementation of the Estim_Pitch function.

The first step computes the initial energy. The denominator does not have to be recomputed for every lag. Inside the loop, the energy is only updated: the sample entering at the head is added and the sample leaving at the tail is removed.

/* Init the Energy Estimate */
Pr = Start - (Word16) PitchMin + (Word16) 1 ;
Acc1 = (Word32) 0 ;
for ( j = 0 ; j < 2*SubFrLen ; j ++ )
    Acc1 = L_mac( Acc1, Dpnt[Pr+j], Dpnt[Pr+j] ) ;

In the search loop, Pr moves back one sample for each new lag. The energy (the denominator) is updated by adding the new head sample and removing the old tail sample:

/* Energy update */
Acc1 = L_msu( Acc1, Dpnt[Pr+2*SubFrLen], Dpnt[Pr+2*SubFrLen] ) ;
Acc1 = L_mac( Acc1, Dpnt[Pr], Dpnt[Pr] ) ;
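The incremental update can be checked against a direct sum. The sketch below uses floating point and a hypothetical name (`sliding_energy`). It fills in the energy for every lag using one update per step instead of a full 120-term sum. For lag j the window is s[-j] .. s[len-1-j], so moving to lag j+1 adds s[-(j+1)] and drops s[len-1-j].

```c
/* Sketch: energy[j - min_lag] = sum_{n=0}^{len-1} s[n-j]^2 for every lag
 * min_lag..max_lag. s[-max_lag] must be valid. */
void sliding_energy(const double *s, int len, int min_lag, int max_lag,
                    double *energy)
{
    double e = 0.0;
    for (int n = 0; n < len; n++)
        e += s[n - min_lag] * s[n - min_lag];   /* full sum for the first lag */
    energy[0] = e;

    for (int j = min_lag + 1; j <= max_lag; j++) {
        double enter = s[-j];        /* new head sample of the window */
        double leave = s[len - j];   /* old tail sample, last one of lag j-1 */
        e += enter * enter - leave * leave;
        energy[j - min_lag] = e;
    }
}
```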

Next, the cross-correlation between the current segment and the delayed segment is computed. Its square gives the numerator:

/* Compute the cross */
Acc0 = (Word32) 0 ;
for ( j = 0 ; j < 2*SubFrLen ; j ++ )
    Acc0 = L_mac( Acc0, Dpnt[Start+j], Dpnt[Pr+j] ) ;

The code that follows is harder to understand. It squares the cross-correlation to obtain the numerator and keeps the result normalized, as a mantissa Cr with an exponent Exp.

/* Compute exp and mant of the Cross */
Exp = norm_l( Acc0 ) ;            // LSC: number of left shifts needed to normalize the numerator
Acc0 = L_shl( Acc0, Exp ) ;       // LSC: normalize the cross-correlation
Exp = shl( Exp, (Word16) 1 ) ;    // LSC: the value is squared next, so the exponent doubles
Cr = round( Acc0 ) ;
Acc0 = L_mult( Cr, Cr ) ;         // LSC: square the mantissa
Cr = norm_l( Acc0 ) ;             // LSC: renormalize the squared value (Cr briefly holds a shift count)
Acc0 = L_shl( Acc0, Cr ) ;
Exp = add( Exp, Cr ) ;
Cr = extract_h( Acc0 ) ;
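For reference, `norm_l` returns the number of left shifts that normalize a 32-bit value. The sketch below re-implements that behavior, as I understand the basic operator's definition, under a hypothetical name (`norm_l_sketch`). It is not the library source. It handles the special cases 0 and -1 separately.

```c
#include <stdint.h>

/* Sketch of norm_l: number of left shifts needed so that bit 30 carries
 * the first significant bit. Negative inputs use the one's complement,
 * 0 returns 0 and -1 returns 31. */
int norm_l_sketch(int32_t x)
{
    if (x == 0)
        return 0;
    if (x == -1)
        return 31;
    if (x < 0)
        x = ~x;                 /* count on the magnitude bits */
    int shift = 0;
    while (x < 0x40000000) {    /* x < 2^30, so x << 1 cannot overflow */
        x <<= 1;
        shift++;
    }
    return shift;
}
```

Shifting a value left by this count gives its mantissa, and the count acts as the exponent, so a larger value gets a smaller exponent.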

The denominator (the energy) is normalized the same way:

/* Do the same with energy */
// LSC: the normalization shift count runs opposite to the magnitude. The larger the value,
// LSC: the smaller the shift, so the maximum value has the minimum exponent.
Acc0 = Acc1 ;
Enr = norm_l( Acc0 ) ;
Acc0 = L_shl( Acc0, Enr ) ;

Dividing by the energy corresponds to subtracting the denominator's exponent:

Exp = sub( Exp, Enr ) ;
Enr = round( Acc0 ) ;

If the mantissa Cr is not smaller than Enr, their quotient would be 1 or more. In that case Cr is shifted right by one so the quotient falls below 1, and the exponent is reduced by one.

The exponent is reduced, not increased, because it counts left shifts. Keep this in mind, or you will think it should go up by 1.

if ( Cr >= Enr ) {
    Exp = sub( Exp, (Word16) 1 ) ;
    Cr = shr( Cr, (Word16) 1 ) ;
}

Next comes a tedious piece of comparison code. My analysis is in the comments. The general idea is to compare the new candidate's value with the saved maximum.

// LSC: (inside the main search loop over candidate lags i)
// LSC: a smaller exponent means a larger value, because the exponent counts left shifts.
if ( Exp <= Mxp ) {
    // LSC: the new value is at least on the same scale as the saved maximum. If the new lag is far
    // LSC: from the saved one, the value must exceed the saved maximum by 1.25 dB (about 1.33 times).
    if ( (Exp+1) < Mxp ) {   // LSC: exponent at least 2 smaller: a factor of 4 in scale; since the mantissas
                             // LSC: differ by less than 2x, the value is more than twice as large, so keep the index directly
        Indx = (Word16) i ;
        Mxp = Exp ;
        Mcr = Cr ;
        Mnr = Enr ;
        continue ;
    }
    if ( (Exp+1) == Mxp )    // LSC: exponent smaller by exactly 1 (twice the scale): shift the saved
        Tmp = shr( Mcr, (Word16) 1 ) ;   // LSC: mantissa right so both share the same exponent
    else
        Tmp = Mcr ;          // LSC: same exponent, no shift needed; cross-multiply and subtract to compare
    /* Compare with equal exponents */
    Acc0 = L_mult( Cr, Mnr ) ;
    Acc0 = L_msu( Acc0, Enr, Tmp ) ;
    if ( Acc0 > (Word32) 0 ) {
        if ( ((Word16) i - Indx) < (Word16) PitchMin ) {   // LSC: the lags differ by less than PitchMin (18)
            Indx = (Word16) i ;
            Mxp = Exp ;
            Mcr = Cr ;
            Mnr = Enr ;
        }
        else {   // LSC: the lags differ by 18 or more: the new value must be 4/3 (about 1.33) times larger.
                 // LSC: It looks like 1.5 because of Acc0 - Acc0/4, but L_mult doubles its product:
                 // LSC: the test is 2*Cr*Mnr - Cr*Mnr/2 > 2*Enr*Tmp, i.e. 0.75*Cr*Mnr > Enr*Tmp.
            Acc0 = L_mult( Cr, Mnr ) ;
            Acc0 = L_negate( L_shr( Acc0, (Word16) 2 ) ) ;
            Acc0 = L_mac( Acc0, Cr, Mnr ) ;
            Acc0 = L_msu( Acc0, Enr, Tmp ) ;
            if ( Acc0 > (Word32) 0 ) {
                Indx = (Word16) i ;
                Mxp = Exp ;
                Mcr = Cr ;
                Mnr = Enr ;
            }
        }
    }
}

The index Indx returned at the end is the open-loop pitch period.
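Stripped of the mantissa/exponent bookkeeping, the selection rule can be modelled in floating point as below. This is my simplified reading of the code above, not ITU code. `select_pitch` and `PITCH_MIN` are hypothetical names. The scores are assumed to be precomputed C^2/E values, set to 0 where the cross-correlation is not positive. The sketch starts from a zero threshold, which is an assumption.

```c
#define PITCH_MIN 18

/* Sketch: score[k] is C^2/E for lag min_lag + k (0 if not usable).
 * A later lag replaces the current best if it scores higher and lies
 * within PITCH_MIN samples of it. Otherwise it must beat the best by a
 * factor of 4/3 (about 1.25 dB), which favors shorter pitch periods. */
int select_pitch(const double *score, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    double best = 0.0;
    for (int i = min_lag; i <= max_lag; i++) {
        double v = score[i - min_lag];
        if (v <= best)
            continue;                              /* not larger at all */
        if (i - best_lag < PITCH_MIN || 3.0 * v > 4.0 * best) {
            best = v;                              /* close lag, or 4/3 larger */
            best_lag = i;
        }
    }
    return best_lag;
}
```

For example, suppose lag 40 scores 1.0. A score of 1.2 at lag 80 is rejected, because that lag is 40 samples away and the score is less than 4/3 larger. A score of 1.5 at lag 80 would be accepted. A score of 1.1 at lag 50 is accepted because it lies within 18 samples. The early accept in the code for an exponent at least 2 smaller gives the same decision, since such a value is more than twice as large.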

Why does this matter? The pitch period reflects the long-term correlation of the speech signal, and the subsequent adaptive-codebook search is built around it.

The adaptive excitation is obtained by combining five excitation signals around the pitch period. I will analyze this in the next part. To be continued.

Lin shaochuan

In Hangzhou

This article is an English version of an article which is originally in the Chinese language on aliyun.com.
