5. Perceptual weighting and pitch estimation
Mem_Shift
This function combines the 120 input samples saved from the previous frame with the 240 samples of the current frame into a 360-sample buffer Buf,
saves the last 120 input samples to PrevDat for the next call,
and takes samples 60 to 299 of Buf (that is, one frame of 240 samples) for analysis.
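The buffer handling described above can be sketched as follows. This is a minimal illustration and not the reference routine: the name mem_shift_sketch and the constants FRAME and HISTORY are assumptions made for this example, and plain memcpy is used instead of the reference code's loops.

```c
#include <string.h>

enum { FRAME = 240, HISTORY = 120 };  /* sizes taken from the description above */

/* Hypothetical sketch of the shift described in the text.
 * prev  : HISTORY samples kept from the previous call (updated in place)
 * frame : FRAME new input samples; overwritten with the analysis window */
void mem_shift_sketch(short prev[HISTORY], short frame[FRAME])
{
    short buf[HISTORY + FRAME];                               /* 360-sample work buffer */

    memcpy(buf, prev, HISTORY * sizeof(short));               /* old samples first */
    memcpy(buf + HISTORY, frame, FRAME * sizeof(short));      /* then the new frame */

    memcpy(prev, buf + FRAME, HISTORY * sizeof(short));       /* keep the last 120 inputs */
    memcpy(frame, buf + HISTORY / 2, FRAME * sizeof(short));  /* samples 60..299 */
}
```

After the call, the analysis frame starts 60 samples before the new input, so it is centered differently from the raw input frame.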
Wght_Lpc
Constructs the perceptual weighting filter.
From the LPC coefficients a_i, the filter is built as follows:
$$ W(z) = \frac{1 - \sum_{i=1}^{10} a_i \, 0.9^{i} \, z^{-i}}{1 - \sum_{i=1}^{10} a_i \, 0.5^{i} \, z^{-i}} $$
Error_Wght
Passes the 240 samples of the frame through the perceptual weighting filter to obtain the perceptually weighted speech signal.
The computation is not covered in detail here. It is split into two parts, an IIR part and an FIR part,
and the code follows the formula of the weighting filter directly.
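To make the FIR and IIR parts concrete, here is a floating-point sketch of such a weighting filter. It is not the fixed-point reference code: the type weight_filter and the functions weight_filter_init and weight_filter_step are hypothetical names, and the sign convention assumes the LPC synthesis filter is 1/(1 - Σ a_i z^-i), as in the weighting formula.

```c
#define LPC_ORDER 10

/* Hypothetical floating-point weighting filter state. */
typedef struct {
    double fir_coef[LPC_ORDER];  /* a_i * 0.9^i (numerator, FIR part)   */
    double iir_coef[LPC_ORDER];  /* a_i * 0.5^i (denominator, IIR part) */
    double x_hist[LPC_ORDER];    /* x_hist[0] holds x[n-1]              */
    double y_hist[LPC_ORDER];    /* y_hist[0] holds y[n-1]              */
} weight_filter;

/* Scale the LPC coefficients by powers of 0.9 and 0.5 and clear the memory. */
void weight_filter_init(weight_filter *f, const double a[LPC_ORDER])
{
    double g1 = 1.0, g2 = 1.0;
    for (int i = 0; i < LPC_ORDER; i++) {
        g1 *= 0.9;
        g2 *= 0.5;
        f->fir_coef[i] = a[i] * g1;
        f->iir_coef[i] = a[i] * g2;
        f->x_hist[i] = 0.0;
        f->y_hist[i] = 0.0;
    }
}

/* y[n] = x[n] - sum fir_coef[i] * x[n-1-i] + sum iir_coef[i] * y[n-1-i] */
double weight_filter_step(weight_filter *f, double x)
{
    double y = x;
    for (int i = 0; i < LPC_ORDER; i++) {
        y -= f->fir_coef[i] * f->x_hist[i];   /* FIR part */
        y += f->iir_coef[i] * f->y_hist[i];   /* IIR part */
    }
    for (int i = LPC_ORDER - 1; i > 0; i--) { /* shift the filter memories */
        f->x_hist[i] = f->x_hist[i - 1];
        f->y_hist[i] = f->y_hist[i - 1];
    }
    f->x_hist[0] = x;
    f->y_hist[0] = y;
    return y;
}
```

Calling weight_filter_step once per sample for the 240 samples of a frame yields the weighted frame. With all-zero coefficients the filter passes the input through unchanged.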
Next, a buffer is spliced together from the previous 142 history samples and the current 240 samples (the 142 is related to the pitch period, see below):
/* Construct the buffer */
for (i = 0; i < PitchMax; i++)
    Dpnt[i] = CodStat.PrevWgt[i];
for (i = 0; i < Frame; i++)
    Dpnt[PitchMax + i] = DataBuff[i];
Vec_Norm
Normalizes the signal in the buffer.
Estim_Pitch
Next comes the pitch search.
The autocorrelation method is used to find the pitch period.
From the inequality a^2 + b^2 >= 2ab, the product ab is largest when a equals b, that is, when the shifted signal lines up with the original.
So the autocorrelation of a speech signal reaches a maximum at its pitch period.
(There is another pitch estimation method, the short-time average magnitude difference function. Unlike the autocorrelation method, it shows a valley rather than a peak at the pitch period.)
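For comparison, the average magnitude difference idea mentioned in the parenthesis can be sketched as below. G.723.1 does not use this method. The function name amdf_pitch_sketch and its parameters are assumptions for this example. It returns the lag with the smallest summed absolute difference, that is, the deepest valley.

```c
#include <stdlib.h>

/* Hypothetical AMDF pitch search.
 * s points at the start of the current block; s[-max_lag .. len-1] must be valid. */
int amdf_pitch_sketch(const short *s, int len, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    long best_sum = -1;                       /* -1 marks "nothing stored yet" */

    for (int lag = min_lag; lag <= max_lag; lag++) {
        long sum = 0;
        for (int n = 0; n < len; n++)
            sum += labs((long)s[n] - (long)s[n - lag]);
        if (best_sum < 0 || sum < best_sum) { /* strictly smaller: first valley wins ties */
            best_sum = sum;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

Because only subtractions and absolute values are needed, this variant avoids multiplications, at the cost of searching for a minimum instead of a maximum.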
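Women and children have higher-pitched voices; adult men's vocal cords are longer and thicker, so their voices are lower (from about 60 Hz). At a sampling rate of 8000 Hz,
a fundamental between 60 Hz and 300 Hz corresponds to a pitch period of 26 to 133 samples (8000/60 = 133, 8000/300 = 26).
The ITU code searches pitch periods from 18 to 142 samples;
these bounds come from the constants PitchMin and PitchMax.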
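The conversion between fundamental frequency and pitch period used above is just a division by the sampling rate. A small sketch, with hypothetical helper names:

```c
/* Pitch period in samples for a fundamental of f0_hz at sampling rate fs_hz.
 * Integer division truncates, matching the rounded figures in the text. */
int pitch_period_samples(int fs_hz, int f0_hz)
{
    return fs_hz / f0_hz;
}

/* Fundamental frequency in Hz that corresponds to a lag given in samples. */
double lag_to_hz(int fs_hz, int lag)
{
    return (double)fs_hz / lag;
}
```

At 8000 Hz, the search bounds of 18 and 142 samples correspond to roughly 444 Hz and 56 Hz, so the searched range is somewhat wider than the 60 Hz to 300 Hz range used in the estimate.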
Some reference figures on sound frequencies:
The fundamental frequency of speech extends up to about 300 Hz.
The lowest note of a bass voice is about 65.4 Hz.
The highest note of a soprano voice is about 1177.2 Hz.
Vocal and hearing frequency ranges of humans and some animals (in Hz):
Name      Vocal range         Hearing range
Human     65 ~ 1 100          20 ~ 20 000
Dog       450 ~ 1 800         15 ~ 50 000
Cat       760 ~ 1 500         60 ~ 6 500
Bat       10 000 ~ 150 000    1 000 ~ 200 000
Dolphin   ~ 120               150 ~ 150 000
          10 ~ 13 000         250 ~ 20 000
Fish      40 ~ 2 000          ---
(Can fish speak? I have never heard of it.)
Back to the code: G.723.1 splits the speech frame into two halves of 120 samples each and estimates a pitch period for each half.
j = PitchMax;
for (i = 0; i < SubFrames/2; i++) {
    Line.Olp[i] = Estim_Pitch(Dpnt, (Word16) j);
    VadStat.Polp[i + 2] = Line.Olp[i];
    j += 2 * SubFrLen;
}
Estim_Pitch uses the autocorrelation approach: the index of the first peak is taken as the pitch period.
For each candidate lag j it computes
$$ \frac{\left( \sum_{n=0}^{119} s[n] \, s[n-j] \right)^{2}}{\sum_{n=0}^{119} s[n-j] \, s[n-j]}, \qquad 18 \le j \le 142 $$
The denominator is the energy of the delayed block, and the numerator is the square of the correlation between the current block and the delayed block.
To avoid expensive division, the actual code rearranges the comparison.
Let the candidate have numerator Da and denominator Db, and let the largest value found so far have numerator Ma and denominator Mb.
To compare the two, the code checks whether Da * Mb - Db * Ma is greater than zero. The derivation is simple
and follows directly from the properties of inequalities: both denominators are positive, so Da/Db > Ma/Mb is equivalent to Da * Mb > Db * Ma.
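A floating-point sketch of the search criterion and the division-free comparison is shown below. It is not the fixed-point reference routine: the function name pitch_search_sketch and its parameters are assumptions, and the extra preference for short lags found in the reference code is left out here.

```c
/* Hypothetical open-loop pitch search maximizing cross^2 / energy.
 * s points at the start of the current block; s[-max_lag .. len-1] must be valid. */
int pitch_search_sketch(const double *s, int len, int min_lag, int max_lag)
{
    int best_lag = min_lag;
    double best_num = 0.0;   /* Ma: numerator of the best ratio so far   */
    double best_den = 1.0;   /* Mb: denominator of the best ratio so far */

    for (int lag = min_lag; lag <= max_lag; lag++) {
        double cross = 0.0, energy = 0.0;
        for (int n = 0; n < len; n++) {
            cross  += s[n] * s[n - lag];
            energy += s[n - lag] * s[n - lag];
        }
        if (cross <= 0.0 || energy <= 0.0)
            continue;                        /* only positive correlation counts */

        double num = cross * cross;          /* Da; energy plays the role of Db */
        /* Da/Db > Ma/Mb  <=>  Da*Mb > Ma*Db, since Db and Mb are positive */
        if (num * best_den > best_num * energy) {
            best_lag = lag;
            best_num = num;
            best_den = energy;
        }
    }
    return best_lag;
}
```

For a block of 120 samples with 142 samples of history, a call such as pitch_search_sketch(buf + 142, 120, 18, 142) covers the lag range discussed in the text.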
The following walks through the implementation of Estim_Pitch.
The first step computes the initial energy. The denominator does not have to be recomputed from scratch for every lag; it only needs to be updated inside the loop (add the new leading sample and remove the trailing one).
/* Init the energy estimate */
Pr = Start - (Word16) PitchMin + (Word16) 1;
Acc1 = (Word32) 0;
for (j = 0; j < 2 * SubFrLen; j++)
    Acc1 = L_mac(Acc1, Dpnt[Pr + j], Dpnt[Pr + j]);
Inside the search loop, one sample is added at the front and one is removed at the end to update the energy, that is, the denominator.
/* Energy update */
Acc1 = L_msu(Acc1, Dpnt[Pr + 2 * SubFrLen], Dpnt[Pr + 2 * SubFrLen]);
Acc1 = L_mac(Acc1, Dpnt[Pr], Dpnt[Pr]);
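The running update can be checked against a direct recomputation with a small sketch. It uses plain integer arithmetic rather than the saturating L_mac and L_msu operators, and the names window_energy and slide_energy_back are hypothetical.

```c
/* Energy of x[start .. start+len-1], computed directly. */
long long window_energy(const short *x, int start, int len)
{
    long long e = 0;
    for (int n = 0; n < len; n++)
        e += (long long)x[start + n] * x[start + n];
    return e;
}

/* Move the window one sample toward the past.
 * energy    : energy of x[new_start+1 .. new_start+len]
 * returns   : energy of x[new_start   .. new_start+len-1] */
long long slide_energy_back(long long energy, const short *x, int new_start, int len)
{
    energy -= (long long)x[new_start + len] * x[new_start + len]; /* sample leaving the window  */
    energy += (long long)x[new_start] * x[new_start];             /* sample entering the window */
    return energy;
}
```

Each lag then costs two multiplications for the energy instead of a full pass over the window.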
Next, the correlation is computed; its square forms the numerator.
/* Compute the cross */
Acc0 = (Word32) 0;
for (j = 0; j < 2 * SubFrLen; j++)
    Acc0 = L_mac(Acc0, Dpnt[Start + j], Dpnt[Pr + j]);
The following code is harder to follow.
It squares the correlation to obtain the numerator and normalizes it into a mantissa and an exponent.
/* Compute Exp and mant of the cross */
Exp = norm_l(Acc0);          // LSC: number of left shifts needed to normalize the correlation
Acc0 = L_shl(Acc0, Exp);     // LSC: normalize the correlation
Exp = shl(Exp, (Word16) 1);  // LSC: squaring doubles the exponent
Ccr = round(Acc0);
Acc0 = L_mult(Ccr, Ccr);     // LSC: the square is computed here
Ccr = norm_l(Acc0);          // LSC: normalize again after squaring
Acc0 = L_shl(Acc0, Ccr);
Exp = add(Exp, Ccr);
Ccr = extract_h(Acc0);
Then the denominator, that is, the energy, is normalized in the same way.
/* Do the same with energy */ // Note: the exponent counts left shifts, so it runs opposite to the magnitude of the original value; the largest value has the smallest exponent.
Acc0 = Acc1;
Enr = norm_l(Acc0);
Acc0 = L_shl(Acc0, Enr);
Division corresponds to subtracting the exponent of the denominator:
Exp = sub(Exp, Enr);
Enr = round(Acc0);
If the ratio of the mantissas would reach or exceed 1 (Ccr >= Enr), the numerator mantissa is shifted right by one bit to keep the ratio normalized, and the exponent is decreased by one accordingly (the exponent counts left shifts; keep this in mind, otherwise you would expect it to be increased by 1).
if (Ccr >= Enr) {
    Exp = sub(Exp, (Word16) 1);
    Ccr = shr(Ccr, (Word16) 1);
}
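The mantissa and exponent representation used here can be illustrated with a small sketch. It only mimics, for positive inputs, the left-shift counting that the text attributes to norm_l; the names norm32_sketch, scaled16 and to_scaled are assumptions, and the mantissa is truncated rather than rounded.

```c
#include <stdint.h>

/* Number of left shifts that bring a positive 32-bit value into
 * the range 0x40000000 .. 0x7fffffff. Returns 0 for non-positive input. */
int norm32_sketch(int32_t v)
{
    int shifts = 0;
    if (v <= 0)
        return 0;                 /* sketch: only positive inputs are handled */
    while (v < 0x40000000) {
        v <<= 1;
        shifts++;
    }
    return shifts;
}

/* 16-bit mantissa plus the left-shift count that produced it. */
typedef struct {
    int16_t mant;
    int     exp;
} scaled16;

/* Keep the top 16 bits of the normalized value (truncation, not rounding). */
scaled16 to_scaled(int32_t v)
{
    scaled16 r = { 0, 0 };
    if (v <= 0)
        return r;
    r.exp  = norm32_sketch(v);
    r.mant = (int16_t)((v << r.exp) >> 16);
    return r;
}
```

A small value needs many left shifts and gets a large exponent, while a large value gets a small one, which is why the comparison code treats a smaller exponent as the larger value.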
Next comes a tedious block of comparison code. My analysis is given in the comments; the general idea is simply to compare magnitudes.
// LSC: a smaller exponent means a larger value, because the exponent counts left shifts.
if (Exp <= Mxp) {
    // LSC: the candidate is not clearly smaller. If the stored maximum lies at a much smaller lag, the candidate must exceed it by 1.25 dB, that is, be 1.33 times larger.
    if ((Exp + 1) < Mxp) { // LSC: exponent smaller by 2 or more (roughly 4 times larger): keep this lag directly
        Indx = (Word16) i;
        Mxp = Exp;
        Mcr = Ccr;
        Mnr = Enr;
        continue;
    }
    if ((Exp + 1) == Mxp) // LSC: exponent smaller by 1 (a factor of 2): shift the stored mantissa right to bring both to the same scale
        Tmp = shr(Mcr, (Word16) 1);
    else
        Tmp = Mcr; // LSC: same exponent, no shift needed; cross-multiply and subtract to compare
    /* Compare with equal exponents */
    Acc0 = L_mult(Ccr, Mnr);
    Acc0 = L_msu(Acc0, Enr, Tmp);
    if (Acc0 > (Word32) 0) {
        if (((Word16) i - Indx) < (Word16) PitchMin) { // LSC: the lag difference is less than 18
            Indx = (Word16) i;
            Mxp = Exp;
            Mcr = Ccr;
            Mnr = Enr;
        }
        else { // LSC: the lag difference is 18 or more: 0.75 * Ccr * Mnr > Enr * Tmp, so the candidate must be 4/3 (about 1.33) times larger, not 1.5 times
            Acc0 = L_mult(Ccr, Mnr);
            Acc0 = L_negate(L_shr(Acc0, (Word16) 2));
            Acc0 = L_mac(Acc0, Ccr, Mnr);
            Acc0 = L_msu(Acc0, Enr, Tmp);
            if (Acc0 > (Word32) 0) {
                Indx = (Word16) i;
                Mxp = Exp;
                Mcr = Ccr;
                Mnr = Enr;
            }
        }
    }
}
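Stripped of the exponent handling, the decision made by the comparison code can be sketched in floating point as below. The function name prefer_candidate and its parameters are hypothetical. The reading that the 4/3 margin guards against jumping to a multiple of the true pitch period is my interpretation, not something the code states.

```c
/* Hypothetical decision rule: should the candidate lag replace the stored best?
 * Ratios are given as numerator/denominator pairs with positive denominators. */
int prefer_candidate(double cand_num, double cand_den,
                     double best_num, double best_den,
                     int cand_lag, int best_lag, int min_lag)
{
    /* Not larger than the stored best: never replace. */
    if (cand_num * best_den <= best_num * cand_den)
        return 0;

    /* Close to the stored lag: any improvement wins. */
    if (cand_lag - best_lag < min_lag)
        return 1;

    /* Far from the stored lag: the candidate must be more than 4/3 times larger,
     * written as 0.75 * candidate > best to match the shift-by-2 trick. */
    return 0.75 * cand_num * best_den > best_num * cand_den;
}
```

For example, a candidate ratio of 1.25 against a stored ratio of 1.0 is accepted when the lags are 10 samples apart but rejected when they are 60 apart, while a ratio of 1.5 is accepted in both cases.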
The index finally returned, Indx, is the pitch period.
Why does it matter? The pitch reflects the correlation in the speech data. The adaptive codebook search that follows is carried out around this pitch estimate:
the adaptive excitation is obtained by summing five excitation contributions at lags around the pitch period. I will analyze this in the next chapter. To be continued.
Lin Shaochuan
In Hangzhou