We are currently studying a variety of speech codec algorithms and the related code. Profiling shows that the most time-consuming part of the computation is still a handful of basic data operations.
Take the speech synthesis filtering operation in the amr_nb codec, which is quite time-consuming. Below is a short piece of the program, followed by an analysis of its computational workload:
static Word32 Syn_filt(Word32 a[], Word32 x[], Word32 y[], Word32 lg, Word32 mem[],
                       Word32 update)
{
    /* Synthesis filter that reconstructs the speech */
    Word32 tmp[50];   /* malloc is slow */
    Word32 s, a0, overflow = 0;
    Word32 *yy, *yy_limit;

    /* Copy mem[] to yy[] */
    memcpy(tmp, mem, 40);
    yy = tmp + M;
    yy_limit = yy + lg;
    a0 = a[0];

    /* Do the filtering. */
    while (yy < yy_limit)
    {
        s = *x++ * a0;
        s -= yy[-1] * a[1];
        s -= yy[-2] * a[2];
        s -= yy[-3] * a[3];
        s -= yy[-4] * a[4];
        s -= yy[-5] * a[5];
        s -= yy[-6] * a[6];
        s -= yy[-7] * a[7];
        s -= yy[-8] * a[8];
        s -= yy[-9] * a[9];
        s -= yy[-10] * a[10];
        if (labs(s) < 0x7ffffff)
            *yy = (s + 0x800L) >> 12;
        else if (s > 0)
        {
            *yy = 32767;
            overflow = 1;
        }
        else
        {
            *yy = -32768;
            overflow = 1;
        }
        yy++;
    }
    memcpy(y, &tmp[M], lg << 2);

    /* Update of memory if update == 1 */
    if (update)
    {
        memcpy(mem, &y[lg - M], 40);
    }
    return overflow;
}
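To make the per-sample work easier to see, the ten unrolled taps can also be written as an inner loop. The following is only a minimal sketch under stated assumptions: the name syn_filt_loop, the MAX_LG constant and the use of int32_t in place of Word32 are chosen here for illustration and are not taken from the codec source. Like the listing above, it relies on an arithmetic right shift for negative values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define M 10        /* filter order, matching the ten taps in the listing */
#define MAX_LG 40   /* assumed maximum subframe length (tmp holds 50 entries) */

/* Hypothetical loop-form equivalent of the unrolled synthesis filter.
 * Returns 1 if any output sample had to be clamped, 0 otherwise.
 * The caller must pass lg <= MAX_LG. */
int32_t syn_filt_loop(const int32_t a[], const int32_t x[], int32_t y[],
                      int32_t lg, int32_t mem[], int32_t update)
{
    int32_t tmp[M + MAX_LG];
    int32_t overflow = 0;
    int32_t n, k;

    /* The first M entries hold the filter history from the previous call. */
    memcpy(tmp, mem, M * sizeof(int32_t));

    for (n = 0; n < lg; n++) {
        int32_t *yy = tmp + M + n;
        int32_t s = x[n] * a[0];            /* 1 multiplication */

        for (k = 1; k <= M; k++)            /* 10 more multiplications */
            s -= yy[-k] * a[k];

        if (labs(s) < 0x7ffffff) {
            *yy = (int32_t)((s + 0x800L) >> 12);   /* round and scale by 2^-12 */
        } else {
            *yy = (s > 0) ? 32767 : -32768;        /* clamp to 16-bit range */
            overflow = 1;
        }
    }

    memcpy(y, tmp + M, (size_t)lg * sizeof(int32_t));
    if (update)
        memcpy(mem, &y[lg - M], M * sizeof(int32_t));
    return overflow;
}
```

With a[0] = 4096 and a[1] = -2048, an impulse of 4096 at x[0] should come out as 4096, 2048, 1024 and so on. The loop form performs the same 11 multiplications per sample as the unrolled listing; it only makes the count explicit.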
An AMR frame is divided into four subframes of 40 samples each, that is, lg = 40. The routine above has already been simplified (strictly speaking, every operation should be a saturating one to prevent overflow), yet even after the simplification it still involves a lot of computation.
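For comparison, a fully saturated multiply-subtract step might look roughly like the sketch below. The helper names sat32 and sat_msu32 are hypothetical, and clamping a 64-bit intermediate is just one possible way to do it; this is not the codec's own set of basic operations. It shows the extra comparisons that every tap would pay.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a 64-bit value to the 32-bit range and
 * set *overflow when clamping was needed. */
int32_t sat32(int64_t v, int *overflow)
{
    if (v > INT32_MAX) { *overflow = 1; return INT32_MAX; }
    if (v < INT32_MIN) { *overflow = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* Hypothetical saturating multiply-subtract: computes acc - y * a in
 * 64 bits, then clamps the result to 32 bits. */
int32_t sat_msu32(int32_t acc, int32_t y, int32_t a, int *overflow)
{
    int64_t prod = (int64_t)y * (int64_t)a;
    return sat32((int64_t)acc - prod, overflow);
}
```

Replacing each `s -= yy[-k] * a[k]` with a call of this kind would add two range checks per tap on top of the multiplication itself, which is why the simplified listing checks the range only once per output sample.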
Multiplications per frame: 4 * 40 * 11 = 1760. If every one of them also went through a saturation-protected multiply operation, the workload would grow even further.
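The count can be checked by walking the same loop structure and tallying one multiplication per tap. This is a small sketch: the function names are hypothetical, it assumes the filter runs exactly once per subframe, and the per-second figure assumes 20 ms AMR-NB frames (50 frames per second).

```c
#define SUBFRAMES_PER_FRAME  4
#define SAMPLES_PER_SUBFRAME 40
#define TAPS_PER_SAMPLE      11   /* a[0] plus a[1]..a[10] */
#define FRAMES_PER_SECOND    50   /* assumption: one frame every 20 ms */

/* Hypothetical helper: count the multiplications the filter performs
 * for one frame by iterating over subframes, samples and taps. */
long mults_per_frame(void)
{
    long count = 0;
    int sf, n, k;

    for (sf = 0; sf < SUBFRAMES_PER_FRAME; sf++)
        for (n = 0; n < SAMPLES_PER_SUBFRAME; n++)
            for (k = 0; k < TAPS_PER_SAMPLE; k++)
                count++;   /* one multiplication per tap */
    return count;
}

/* Hypothetical helper: scale the per-frame count to one second of speech. */
long mults_per_second(void)
{
    return mults_per_frame() * FRAMES_PER_SECOND;
}
```

Under these assumptions the filter alone accounts for 1760 multiplications per frame, or 88000 per second of speech, before any saturation checks are added.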