Floating-Point to Fixed-Point Operations

Source: Internet
Author: User
Tags: integer numbers

Part One: DSP Fixed-Point Arithmetic Operations
1. Number Scaling (Calibration)
A fixed-point DSP chip performs numerical calculation on fixed-point numbers, and its operands are generally represented as integers. The largest integer that can be represented is determined by the word length of the DSP chip, which is generally 16 or 24 bits. Obviously, the longer the word length, the larger the range of numbers that can be represented and the higher the precision. Unless otherwise stated, this book uses a 16-bit word length as its example.
DSP chips represent numbers in two's complement form. Each 16-bit number uses one sign bit to indicate whether it is positive or negative: 0 indicates a positive value and 1 indicates a negative value. The remaining 15 bits give the magnitude. Therefore,
Binary number 0010000000000011b = 8195
Binary number 1111111111111100b = -4
For the DSP chip, the numbers involved in numerical calculation are 16-bit integers. In many cases, however, the numbers in a mathematical operation are not integers. So how does the DSP chip handle fractions? The chip itself does nothing special. Does that mean the DSP chip cannot handle fractions? Of course not. The key is that the programmer decides where within the 16 bits the binary point lies. This is number scaling (calibration).
By placing the binary point at different positions within the 16-bit word, numbers of different magnitude and different precision can be represented. There are two notations for the scaling: Q notation and S notation. Table 1.1 lists the 16 Q and S formats of a 16-bit number and the range of decimal values each can represent.
Table 1.1 shows that the same 16-bit number represents different values when the binary point is placed at different positions. For example,
Hexadecimal number 2000h = 8192 in Q0
Hexadecimal number 2000h = 0.25 in Q15
For the DSP chip, however, both are processed in exactly the same way.
Table 1.1 also shows that different Q formats have different ranges and precision. The larger the Q, the smaller the range but the higher the precision; the smaller the Q, the larger the range but the lower the precision. For example, Q0 has a range of -32768 to +32767 and a precision of 1, while Q15 has a range of -1 to 0.9999695 and a precision of 1/32768 = 0.00003051. For fixed-point numbers, range and precision are therefore in conflict: to represent a larger range, a variable must give up precision, and to gain precision, its range shrinks accordingly. A practical fixed-point algorithm must take this trade-off fully into account to achieve the best performance.
The conversion between floating-point numbers and fixed-point numbers can be expressed as:
Convert a floating-point number x to a fixed-point number x_q: $x_q = (\mathrm{int})(x \cdot 2^{Q})$
Convert a fixed-point number x_q to a floating-point number x: $x = (\mathrm{float})\,x_q \cdot 2^{-Q}$
For example, if the floating-point number is x = 0.5 and the scaling is Q = 15, the fixed-point number is $x_q = \lfloor 0.5 \times 32768 \rfloor = 16384$, where $\lfloor\ \rfloor$ denotes rounding down to an integer. Conversely, the fixed-point number 16384 in Q15 corresponds to the floating-point number $16384 \times 2^{-15} = 16384/32768 = 0.5$. When converting a floating-point number to fixed point, 0.5 can be added before taking the integer part to reduce the truncation error.
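The two conversion formulas can be sketched in C as follows. This is a minimal illustration rather than code from the source: the function names `float_to_q` and `q_to_float` are hypothetical, and the symmetric rounding of negative values and the saturation at the 16-bit limits are assumptions added here (the text only mentions adding 0.5 before truncation).

```c
#include <assert.h>
#include <stdint.h>

/* Convert x to a 16-bit fixed-point number with q fractional bits.
 * 0.5 is added before truncation to reduce the truncation error;
 * for negative inputs 0.5 is subtracted instead (assumption). */
int16_t float_to_q(double x, int q)
{
    assert(q >= 0 && q <= 15);
    double scaled = x * (double)(1L << q);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    /* Saturate instead of wrapping around (assumption, not in the text). */
    if (scaled > 32767.0) return INT16_MAX;
    if (scaled < -32768.0) return INT16_MIN;
    return (int16_t)scaled;            /* the cast truncates toward zero */
}

/* Convert a fixed-point number with q fractional bits back to floating point. */
double q_to_float(int16_t xq, int q)
{
    assert(q >= 0 && q <= 15);
    return (double)xq / (double)(1L << q);
}
```

With these definitions, `float_to_q(0.5, 15)` gives 16384 and `q_to_float(16384, 15)` gives 0.5, matching the worked example.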

Table 1.1 Q notation, S notation, and the range of decimal values

Q notation   S notation   Range of decimal values
Q15          S0.15        -1 ≤ x ≤ 0.9999695
Q14          S1.14        -2 ≤ x ≤ 1.9999390
Q13          S2.13        -4 ≤ x ≤ 3.9998779
Q12          S3.12        -8 ≤ x ≤ 7.9997559
Q11          S4.11        -16 ≤ x ≤ 15.9995117
Q10          S5.10        -32 ≤ x ≤ 31.9990234
Q9           S6.9         -64 ≤ x ≤ 63.9980469
Q8           S7.8         -128 ≤ x ≤ 127.9960938
Q7           S8.7         -256 ≤ x ≤ 255.9921875
Q6           S9.6         -512 ≤ x ≤ 511.984375
Q5           S10.5        -1024 ≤ x ≤ 1023.96875
Q4           S11.4        -2048 ≤ x ≤ 2047.9375
Q3           S12.3        -4096 ≤ x ≤ 4095.875
Q2           S13.2        -8192 ≤ x ≤ 8191.75
Q1           S14.1        -16384 ≤ x ≤ 16383.5
Q0           S15.0        -32768 ≤ x ≤ 32767
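
The ranges in Table 1.1 follow directly from dividing the 16-bit integer limits by 2^Q. A small sketch, with hypothetical helper names `q_min`, `q_max`, and `q_step`, shows how each row can be recomputed:

```c
#include <assert.h>

/* Smallest value representable in 16-bit Q format: -32768 / 2^q. */
double q_min(int q)
{
    assert(q >= 0 && q <= 15);
    return -32768.0 / (double)(1L << q);
}

/* Largest value representable in 16-bit Q format: 32767 / 2^q. */
double q_max(int q)
{
    assert(q >= 0 && q <= 15);
    return 32767.0 / (double)(1L << q);
}

/* Precision (value of one least significant bit): 1 / 2^q. */
double q_step(int q)
{
    assert(q >= 0 && q <= 15);
    return 1.0 / (double)(1L << q);
}
```

For instance, `q_min(8)` is -128 and `q_max(6)` is 511.984375.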

2. High-Level Language: From Floating Point to Fixed Point
When writing DSP simulation algorithms, we generally use a high-level language (such as C) for convenience. The variables in such a program are usually a mix of integers and floating-point numbers. In Example 1.1, for instance, the variable i is an integer, pi is a floating-point number, and hamwindow is a floating-point array.
Example 1.1 Computing a Hamming window
#include <math.h>
int i;
float pi = 3.14159;
float hamwindow[256];
for (i = 0; i < 256; i++) hamwindow[i] = 0.54 - 0.46 * cos(2.0 * pi * i / 255);
To implement the above program on a DSP chip, it must be rewritten in the assembly language of that chip. To make the DSP program easier to debug, and to simulate how the algorithm performs in a fixed-point DSP implementation, the floating-point algorithm in the high-level language is generally rewritten as a fixed-point algorithm in the same high-level language before the DSP assembly program is written. The following sections discuss the fixed-point implementation of the basic arithmetic operations.
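2.1 Addition and subtraction
Let the floating-point addition be:
float x, y, z;
z = x + y;
When converting floating-point addition or subtraction to fixed point, the most important point is that the two operands must have the same Q value before they are combined. Let the Q value of x be Qx, of y be Qy, and of the result z be Qz, with Qx >= Qy. As in Example 1.4, y is first shifted left to align it with x, and the sum is then shifted to the Q value of z:
int x, y, z;
long temp;
temp = (long)y << (Qx - Qy);
temp = x + temp;
z = (int)(temp >> (Qx - Qz)), if Qx >= Qz
z = (int)(temp << (Qz - Qx)), if Qx <= Qz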
Example 1.4 Fixed-point addition whose result exceeds 16 bits.
Let x = 15000 and y = 20000, so the floating-point result is z = x + y = 35000. Clearly z > 32767.
With Qx = 1, Qy = 0, Qz = 0, the fixed-point addition is:
x = 30000; y = 20000;
temp = 20000 << 1 = 40000;
temp = temp + x = 40000 + 30000 = 70000;
z = 70000L >> 1 = 35000;
Because the Q value of z is 0, the fixed-point value z = 35000 is also the floating-point value. Note that z here is a long integer. When the result of an addition or subtraction exceeds the 16-bit range, and the programmer knows this in advance and needs to preserve the accuracy of the operation, the 32-bit result must be kept. If the program computes in 16 bits, exceeding 16 bits is an overflow, and unless suitable measures are taken, overflow seriously degrades the accuracy of the computation. Fixed-point DSP chips generally provide an overflow protection function. When it is enabled, the accumulator (ACC) is set to the saturation value on overflow (7FFFh for positive overflow, 8001h for negative overflow), which prevents the severe loss of accuracy that overflow would otherwise cause.
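The addition procedure and the idea of overflow protection can be simulated in C roughly as follows. This is a sketch under stated assumptions: the names `q_add32` and `saturate16` are hypothetical, Qx >= Qy is assumed as in Example 1.4, and the clamp uses the ordinary `int16_t` limits, which is an illustration of saturation rather than an exact model of any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/* Add x (Q value qx) and y (Q value qy) and return the sum with Q value qz.
 * The sum is kept in 32 bits so that results beyond 16 bits survive.
 * Assumes qx >= qy; the caller must choose qz so the result fits. */
int32_t q_add32(int16_t x, int qx, int16_t y, int qy, int qz)
{
    assert(qy >= 0 && qx >= qy && qx <= 15 && qz >= 0 && qz <= 15);
    /* Align y to the Q value of x (multiplication avoids shifting a negative value left). */
    int32_t temp = (int32_t)y * ((int32_t)1 << (qx - qy));
    temp += x;
    if (qx >= qz)
        return temp >> (qx - qz);   /* arithmetic right shift assumed for negatives */
    return temp * ((int32_t)1 << (qz - qx));
}

/* Clamp a 32-bit result to 16 bits, imitating overflow protection. */
int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

Applied to Example 1.4, `q_add32(30000, 1, 20000, 0, 0)` returns 35000, and `saturate16(35000)` clamps it to 32767 when only 16 bits are available.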
2.2 Fixed-point simulation of multiplication in C
Let the floating-point multiplication be:
float x, y, z;
z = x * y;
Assume the Q value of x is Qx, the Q value of y is Qy, and the Q value of z is Qz. Then
$z = x y$
$z_q \cdot 2^{-Q_z} = x_q \cdot y_q \cdot 2^{-(Q_x + Q_y)}$
$z_q = (x_q y_q) \cdot 2^{Q_z - (Q_x + Q_y)}$
Therefore, the fixed-point multiplication is:
int x, y, z;
long temp;
temp = (long)x;
z = (temp * y) >> (Qx + Qy - Qz);
Example 1.5 Fixed-point multiplication.
Let x = 18.4 and y = 36.8, so the floating-point result is z = 18.4 * 36.8 = 677.12.
According to the previous section, take Qx = 10, Qy = 9, Qz = 5, so
x = 18841; y = 18841;
temp = 18841L;
z = (18841L * 18841) >> (10 + 9 - 5) = 354983281L >> 14 = 21666;
Because the Q value of z is 5, the fixed-point value z = 21666 corresponds to the floating-point value z = 21666/32 = 677.06.
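The multiplication rule can be wrapped in a small C function. The name `q_mul` is hypothetical, and the sketch assumes that the caller has chosen Qz so that the shifted product fits in 16 bits and that Qx + Qy >= Qz.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply x (Q value qx) by y (Q value qy) and return the product with Q value qz. */
int16_t q_mul(int16_t x, int qx, int16_t y, int qy, int qz)
{
    int shift = qx + qy - qz;
    assert(shift >= 0 && shift < 31);
    int32_t temp = (int32_t)x * y;     /* full 32-bit product, Q value qx + qy */
    return (int16_t)(temp >> shift);   /* drop the surplus fractional bits */
}
```

With the numbers of Example 1.5, `q_mul(18841, 10, 18841, 9, 5)` returns 21666.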
2.3 Division
Let the floating-point division be:
float x, y, z;
z = x / y;
Assume that, after statistical analysis, the Q value of the dividend x is Qx, the Q value of the divisor y is Qy, and the Q value of the quotient z is Qz. Then
$z = x / y$
$z_q \cdot 2^{-Q_z} = (x_q \cdot 2^{-Q_x}) / (y_q \cdot 2^{-Q_y})$
$z_q = (x_q \cdot 2^{Q_z - Q_x + Q_y}) / y_q$
Therefore, the fixed-point division is:
int x, y, z;
long temp;
temp = (long)x;
z = (temp << (Qz - Qx + Qy)) / y;
Example 1.6 Fixed-point division.
Let x = 18.4 and y = 36.8, so the floating-point result is z = x / y = 18.4 / 36.8 = 0.5.
According to the previous section, take Qx = 10, Qy = 9, Qz = 15, so
x = 18841; y = 18841;
temp = (long)18841;
z = (18841L << (15 - 10 + 9)) / 18841 = 308690944L / 18841 = 16384;
Because the Q value of the quotient z is 15, the fixed-point value z = 16384 corresponds to the floating-point value $z = 16384 / 2^{15} = 0.5$.
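A matching sketch for division is shown below. The name `q_div` is hypothetical; the function assumes a non-zero divisor, a shift count Qz - Qx + Qy between 0 and 16 so the scaled dividend fits in 32 bits, and a quotient that fits in 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Divide x (Q value qx) by y (Q value qy) and return the quotient with Q value qz. */
int16_t q_div(int16_t x, int qx, int16_t y, int qy, int qz)
{
    int shift = qz - qx + qy;
    assert(y != 0 && shift >= 0 && shift <= 16);
    /* Scale the dividend first so the integer division keeps qz fractional bits. */
    int32_t temp = (int32_t)x * ((int32_t)1 << shift);
    return (int16_t)(temp / y);
}
```

With the numbers of Example 1.6, `q_div(18841, 10, 18841, 9, 15)` returns 16384, that is, 0.5 in Q15.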
2.4 Determining the Q value of program variables
In the examples of the previous sections, the values of x, y, and z are all known, so the Q values are easy to determine when converting from floating point to fixed point. In a real DSP application, however, the quantities taking part in the computation are variables. How can the Q value of a variable in a floating-point program be determined? The preceding analysis shows that determining the Q value of a variable amounts to determining its dynamic range: once the dynamic range is known, the Q value follows.
Let the maximum absolute value of the variable be |max|. Note that |max| must be less than or equal to 32767. Take an integer n that satisfies
$2^{n-1} < |\max| < 2^{n}$
Then
$2^{-Q} = 2^{-15} \cdot 2^{n} = 2^{-(15-n)}$
$Q = 15 - n$
For example, if a variable ranges from -1 to +1, that is, |max| < 1, then n = 0 and Q = 15 - n = 15.
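The rule Q = 15 - n is easy to automate. In the sketch below, the function name `q_for_max` is hypothetical, and the handling of boundary cases is an assumption: n is taken as the smallest non-negative integer with |max| < 2^n, so a maximum of exactly 1.0 yields Q14 rather than Q15.

```c
#include <assert.h>

/* Return Q = 15 - n, where n is the smallest non-negative integer
 * such that max_abs < 2^n. */
int q_for_max(double max_abs)
{
    assert(max_abs >= 0.0 && max_abs <= 32767.0);
    int n = 0;
    double limit = 1.0;            /* holds 2^n */
    while (max_abs >= limit) {
        limit *= 2.0;
        n++;
    }
    return 15 - n;
}
```

This reproduces the Q values used in the earlier examples: `q_for_max(18.4)` is 10, `q_for_max(36.8)` is 9, and `q_for_max(677.12)` is 5.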
Since |max| determines the Q value of a variable, how is |max| itself determined? In general there are two methods: theoretical analysis and statistical analysis.
1. Theoretical analysis
The dynamic range of some variables can be determined by theoretical analysis. For example:
(1) Trigonometric functions. For y = sin(x) or y = cos(x), trigonometry gives |y| <= 1.
(2) Hamming window. y(n) = 0.54 - 0.46cos[2πn/(N-1)], 0 <= n <= N-1. Because -1 <= cos[2πn/(N-1)] <= 1, it follows that 0.08 <= y(n) <= 1.0.
(3) FIR convolution. y(n) = Σ h(k)x(n-k). If Σ|h(k)| = 1.0 and x(n) is a 12-bit quantized value of the analog signal, so that |x(n)| <= 2^11, then |y(n)| <= 2^11.
(4) Theory has shown that in linear predictive coding (LPC) by the autocorrelation method, the reflection coefficients k_i satisfy |k_i| < 1.0 for i = 1, ..., p, where p is the order of the LPC.
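The FIR bound in item (3) generalizes to any coefficient set: the output magnitude can never exceed the sum of the absolute coefficient values times the largest input magnitude. A small sketch, with the hypothetical name `fir_output_bound`:

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case output magnitude of an FIR filter with coefficients h[0..n-1]
 * when the input magnitude never exceeds x_max. */
double fir_output_bound(const double h[], size_t n, double x_max)
{
    assert(x_max >= 0.0);
    double gain = 0.0;
    for (size_t k = 0; k < n; k++)
        gain += (h[k] < 0.0) ? -h[k] : h[k];   /* accumulate |h(k)| */
    return gain * x_max;
}
```

For coefficients {0.25, -0.5, 0.25} the absolute values sum to 1.0, so an input bounded by 2048 gives an output bounded by 2048.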
2. Statistical analysis
For variables whose range cannot be determined theoretically, statistical analysis is generally used. Statistical analysis means running the program on enough input signal samples and recording the dynamic range of each variable. The input must be large enough, and it must cover as many different situations as possible. In speech analysis, for example, a sufficient number of speech samples must be collected, and the samples should cover as many conditions as possible, such as different loudness levels and voice types (male and female). Only then are the statistical results representative.
Of course, statistical analysis can never cover every possible case. Some protective measures can therefore be taken in program design, such as sacrificing a little precision by choosing a Q value that leaves more headroom than the statistics alone suggest, and using the overflow protection function provided by the DSP chip.
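Statistical analysis with a safety margin might look like the following sketch. Everything here is illustrative: the names `peak_abs` and `q_from_statistics` and the `headroom` factor are hypothetical, and the recorded values are assumed to be available as an array of doubles from a floating-point simulation run.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute value among the recorded samples of a variable. */
double peak_abs(const double samples[], size_t count)
{
    double peak = 0.0;
    for (size_t i = 0; i < count; i++) {
        double a = (samples[i] < 0.0) ? -samples[i] : samples[i];
        if (a > peak) peak = a;
    }
    return peak;
}

/* Choose Q from the measured peak multiplied by a safety factor.
 * headroom = 1.0 trusts the statistics; larger values trade precision
 * for protection against inputs the statistics did not cover. */
int q_from_statistics(const double samples[], size_t count, double headroom)
{
    assert(headroom >= 1.0);
    double max_abs = peak_abs(samples, count) * headroom;
    assert(max_abs <= 32767.0);
    int n = 0;
    for (double limit = 1.0; max_abs >= limit; limit *= 2.0)
        n++;
    return 15 - n;
}
```

With recorded values {0.1, -0.7, 0.4}, a headroom of 1.0 selects Q15, while a headroom of 2.0 selects Q14.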
2.5 Example of converting a C program from floating point to fixed point
This section uses an example to show how a C program is converted from floating point to fixed point. The program low-pass filters a speech signal (0.3 to 3.4 kHz). The cutoff frequency of the filter is 800 Hz, and the filter is a 19-point finite impulse response (FIR) filter. The speech signal is sampled at 8 kHz, and each speech sample is stored in the file insp.dat as a 16-bit integer.
Example 1.7 Floating-point C program for an 800 Hz, 19-point FIR low-pass filter for speech signals.
#include <stdio.h>
enum { length = 180 };   /* frame length; an enum so it can size arrays in C */
void filter(int xin[], int xout[], int n, float h[]);

static float h[19] =
{0.01218354, -0.009012882, -0.02881839, -0.04743239, -0.04584568,
-0.008692503, 0.06446265, 0.1544655, 0.2289794, 0.257883,
0.2289794, 0.1544655, 0.06446265, -0.008692503, -0.04584568,
-0.04743239, -0.02881839, -0.009012882, 0.01218354};
static int x1[length + 20];   /* current frame preceded by n-1 history samples */

/* Low-pass filter one frame (floating-point version) */
void filter(int xin[], int xout[], int n, float h[])
{
    int i, j;
    float sum;
    /* Copy the new frame behind the n-1 samples kept from the previous frame */
    for (i = 0; i < length; i++) x1[n + i - 1] = xin[i];
    for (i = 0; i < length; i++)
    {
        sum = 0.0;
        for (j = 0; j < n; j++) sum += h[j] * x1[i - j + n - 1];
        xout[i] = (int)sum;
    }
    /* Keep the last n-1 input samples as history for the next frame */
    for (i = 0; i < (n - 1); i++) x1[n - i - 2] = xin[length - 1 - i];
}

int main(void)
{
    FILE *fp1, *fp2;
    int i, frame, indata[length], outdata[length];
    fp1 = fopen("insp.dat", "rb");
    fp2 = fopen("outsp.dat", "wb");
    frame = 0;
    while (feof(fp1) == 0)
    {
        frame++;
        printf("frame = %d\n", frame);
        /* getw/putw transfer one int; the original assumes a 16-bit int */
        for (i = 0; i < length; i++) indata[i] = getw(fp1);
        filter(indata, outdata, 19, h);
        for (i = 0; i < length; i++) putw(outdata[i], fp2);
    }
    fclose(fp1);
    fclose(fp2);
    return 0;
}
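Example 1.8 Fixed-point C program for an 800 Hz, 19-point FIR low-pass filter for speech signals.
#include <stdio.h>
enum { length = 180 };   /* frame length; an enum so it can size arrays in C */
void filter(int xin[], int xout[], int n, int h[]);
/* Filter coefficients in Q15 */
static int h[19] = {399, -296, -945, -1555, -1503, -285, 2112, 5061, 7503, 8450,
7503, 5061, 2112, -285, -1503, -1555, -945, -296, 399};
static int x1[length + 20];   /* current frame preceded by n-1 history samples */

/* Low-pass filter one frame (fixed-point version) */
void filter(int xin[], int xout[], int n, int h[])
{
    int i, j;
    long sum;
    /* Copy the new frame behind the n-1 samples kept from the previous frame */
    for (i = 0; i < length; i++) x1[n + i - 1] = xin[i];
    for (i = 0; i < length; i++)
    {
        sum = 0;
        for (j = 0; j < n; j++) sum += (long)h[j] * x1[i - j + n - 1];
        xout[i] = (int)(sum >> 15);   /* Q15 coefficient * Q0 sample, back to Q0 */
    }
    /* Keep the last n-1 input samples as history for the next frame */
    for (i = 0; i < (n - 1); i++) x1[n - i - 2] = xin[length - i - 1];
}
The main program is exactly the same as in the floating-point program.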
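The integer coefficients of Example 1.8 appear to be the floating-point coefficients of Example 1.7 scaled to Q15 and rounded down. The sketch below shows that quantization step; the function name `quantize_q15` is hypothetical, and the use of rounding down (rather than rounding to nearest) is an inference from comparing the two coefficient tables, not something the text states.

```c
#include <assert.h>
#include <stddef.h>

/* Quantize floating-point coefficients to Q15 integers by rounding down. */
void quantize_q15(const double h[], int hq[], size_t n)
{
    for (size_t k = 0; k < n; k++) {
        double scaled = h[k] * 32768.0;
        assert(scaled >= -32768.0 && scaled < 32768.0);
        long t = (long)scaled;             /* truncates toward zero */
        if ((double)t > scaled) t--;       /* step down so negatives are floored */
        hq[k] = (int)t;
    }
}
```

For example, 0.257883 becomes 8450 and -0.009012882 becomes -296, matching the entries in the fixed-point table.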
3. DSP Fixed-Point Arithmetic Operations
Numbers in a fixed-point DSP chip are represented in two's complement. Each 16-bit number consists of 1 sign bit, i integer bits, and 15 - i fractional bits. Therefore, the number
00000010.10100000
has the value
$2^{1} + 2^{-1} + 2^{-3} = 2.625$
This number can be expressed in Q8 format (8 fractional bits). The range of Q8 is -128 to +127.996, and its precision is 1/256 = 0.004.
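The split into integer bits and fractional bits can be made explicit in C. The helper names `q_int_part` and `q_frac_bits` are hypothetical, and the sketch assumes an arithmetic right shift for negative values, which is what common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

/* Integer part (rounded down) of a 16-bit number with q fractional bits. */
int q_int_part(int16_t word, int q)
{
    assert(q >= 0 && q <= 15);
    return word >> q;              /* arithmetic shift assumed for negative words */
}

/* The q fractional bits as an unsigned integer; divide by 2^q for their value. */
unsigned q_frac_bits(int16_t word, int q)
{
    assert(q >= 0 && q <= 15);
    return (unsigned)word & ((1u << q) - 1u);
}
```

For the bit pattern above, 0x02A0 in Q8, the integer part is 2 and the fractional bits are 0xA0, that is, 160/256 = 0.625, giving 2.625 in total.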
Mixed formats are needed in special cases (for example, to meet dynamic range and precision requirements), but more commonly numbers are either all fractions in Q15 format or all integers in Q0 format. This is especially true of signal processing algorithms dominated by multiply-accumulate operations: a fraction multiplied by a fraction gives a fraction, and an integer multiplied by an integer gives an integer. Overflow can of course still occur when products are accumulated, so the programmer should understand the physical process behind the mathematics and watch for possible overflow. The following sections discuss fixed-point multiplication, addition, and division on the DSP. The assembly programs use the TMS320C25 as an example.
3.1 Fixed-point multiplication
When two fixed-point numbers are multiplied, three cases can be distinguished:
1. Fraction times fraction
Example 1.9 Q15 * Q15 = Q30
0.5 * 0.5 = 0.25
    0.100000000000000                   ; Q15
  * 0.100000000000000                   ; Q15
  --------------------------------------------
   00.010000000000000000000000000000    = 0.25 ; Q30
Multiplying two Q15 fractions gives a Q30 fraction, which has two sign bits. The full-precision product generally does not need to be kept; only 16-bit single precision is retained. Because the high 16 bits of the product carry fewer than 15 bits of fractional precision, the product is shifted left by one bit to reach 15-bit precision. The TMS320C25 program for this multiplication is:
LT   OP1       ; OP1 = 4000h (0.5 in Q15)
MPY  OP2       ; OP2 = 4000h (0.5 in Q15)
PAC
SACH ANS,1     ; ANS = 2000h (0.25 in Q15)
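
The effect of this instruction sequence can be imitated in C: form the 32-bit Q30 product, shift it left by one bit, and keep the high 16 bits. The function name `q15_mul` is hypothetical, and the special case for -1.0 * -1.0 is an assumption added here, since +1.0 cannot be represented in Q15.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two Q15 fractions and return a Q15 result. */
int16_t q15_mul(int16_t a, int16_t b)
{
    /* -1.0 * -1.0 = +1.0 does not fit in Q15; saturate (assumption). */
    if (a == INT16_MIN && b == INT16_MIN) return INT16_MAX;
    int32_t product = (int32_t)a * b;    /* Q30 product with two sign bits */
    /* Shifting right by 15 keeps the same bits as shifting left by 1
     * and then taking the high 16-bit word. */
    return (int16_t)(product >> 15);
}
```

With both operands equal to 0x4000 (0.5 in Q15), the function returns 0x2000 (0.25 in Q15), the same value the assembly program stores in ANS.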
