A floating-point number is a number whose radix point is not fixed but "floats". It is typically expressed as N = M × R^E (e.g. 0.10111 × 2^110, with the exponent written in binary), where N is the floating-point number, M is the mantissa, E is the exponent (also called the order code), and R is the radix (base) of the exponent. R is a constant, generally 2, 8, or 16. Within one computer every number uses the same R, so R does not need to be stored in each number.
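For R = 2 the decomposition N = M × R^E can be tried out with Python's standard `math.frexp`, which returns a mantissa m with 0.5 ≤ |m| < 1 and an integer exponent e such that x = m × 2^e. This is only an illustration of the formula on ordinary binary floats; it does not model how any particular machine stores the fields.

```python
import math

# math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= abs(m) < 1 (for x != 0),
# i.e. a normalized binary mantissa and its exponent (R = 2).
m, e = math.frexp(46.0)
print(m, e)              # 0.71875 6   (0.10111b x 2^110b)
print(math.ldexp(m, e))  # 46.0        (ldexp rebuilds m * 2**e)
```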
The representation of a floating-point number consists of three fields, Ms, E, and M:
Ms is the sign bit of the mantissa: 0 indicates positive and 1 indicates negative. E is the exponent, an integer of n+1 bits including 1 sign bit. M is the mantissa, m bits long. The mantissa is usually normalized, that is, it is written as a pure fraction whose absolute value is at least 1/R, so the first digit after the radix point is not 0. Example: X = +0.0010111 = 0.10111 × 2^(-2) = 0.10111 × 2^(-0010) = 0.10111 × 2^(1110), where 1110 is -2 written as a 4-bit two's-complement exponent.
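The example above can be checked with exact arithmetic. This sketch uses Python's standard `fractions.Fraction`; the helpers `frac_from_bits` and `twos_complement` are hypothetical names written only for this illustration. It confirms that 0.0010111 equals 0.10111 × 2^(-2) and that -2 in a 4-bit two's-complement exponent field is 1110.

```python
from fractions import Fraction

def frac_from_bits(bits: str) -> Fraction:
    """Value of the pure binary fraction 0.b1b2...bk, given the digits 'b1b2...bk'."""
    return sum((Fraction(1, 2 ** i) for i, b in enumerate(bits, 1) if b == "1"), Fraction(0))

def twos_complement(value: int, bits: int) -> str:
    """Encode an integer as a `bits`-wide two's-complement bit string."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

x = frac_from_bits("0010111")        # +0.0010111
m = frac_from_bits("10111")          # normalized mantissa 0.10111
print(x == m * Fraction(2) ** -2)    # True: X = 0.10111 x 2^(-2)
print(twos_complement(-2, 4))        # 1110
```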
Normalization: to keep as many significant bits as possible in the mantissa and to give each floating-point number a unique representation, the mantissa should be normalized, i.e. written as a pure fraction whose absolute value is at least 1/R, so that the first digit after the radix point is not 0.
For a mantissa in two's complement with R = 2, a normalized value satisfies -1 ≤ M < -1/2 or 1/2 ≤ M < 1. A number that does not meet this condition can be brought into this form by shifting the mantissa and adjusting the exponent at the same time: each 1-bit left shift of the mantissa decreases the exponent by 1, and each 1-bit right shift increases it by 1.
After normalization, a positive mantissa has the form
0.1XXX...X
and a negative mantissa (two's complement) has the form
1.0XXX...X
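The left-shift procedure for R = 2 can be sketched as follows, assuming the mantissa is held as an exact `Fraction` in [-1, 1). The names `is_normalized`, `normalize`, and `complement_bits` are hypothetical helpers for this illustration, and only left shifts are handled because a pure-fraction mantissa never needs a right shift here.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def is_normalized(m: Fraction) -> bool:
    """R = 2, two's-complement mantissa: 1/2 <= M < 1 or -1 <= M < -1/2."""
    return HALF <= m < 1 or -1 <= m < -HALF

def normalize(m: Fraction, e: int) -> tuple:
    """Shift the mantissa left (double it) and decrement E until M is normalized."""
    if not -1 <= m < 1:
        raise ValueError("mantissa must be a pure fraction in [-1, 1)")
    if m == 0:
        return m, e                      # a zero mantissa cannot be normalized
    while not is_normalized(m):
        m, e = m * 2, e - 1              # 1-bit left shift, exponent - 1
    return m, e

def complement_bits(m: Fraction, frac_bits: int) -> str:
    """Write a mantissa in [-1, 1) as a two's-complement string 's.b1b2...bk'."""
    scaled = (m % 2) * 2 ** frac_bits
    if scaled.denominator != 1:
        raise ValueError("needs more fraction bits")
    digits = format(int(scaled), f"0{frac_bits + 1}b")
    return digits[0] + "." + digits[1:]

for value in (Fraction(3, 16), Fraction(-3, 16)):   # +0.0011 and -0.0011
    m, e = normalize(value, 0)
    print(f"{complement_bits(value, 4)} -> {complement_bits(m, 4)} x 2^{e}")
# 0.0011 -> 0.1100 x 2^-2
# 1.1101 -> 1.0100 x 2^-2
```

Doubling an exact fraction is the arithmetic equivalent of a 1-bit left shift, which keeps the sketch independent of any register width.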
Note that M = -1/2 is a normalized number in sign-magnitude code (1.100...0) but not in two's complement, where it is also written 1.100...0 and its first fraction bit equals the sign bit.
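For R = 2 the two normalization tests can be written directly on bit strings of the form "s.b1b2...": sign-magnitude requires b1 = 1, while two's complement requires the sign bit to differ from b1 (a common textbook rule). The function names below are hypothetical and used only for this sketch.

```python
def normalized_sign_magnitude(bits: str) -> bool:
    """'s.b1b2...': normalized when the first fraction bit b1 is 1 (R = 2)."""
    _, frac = bits.split(".")
    return frac[0] == "1"

def normalized_twos_complement(bits: str) -> bool:
    """'s.b1b2...': normalized when the sign bit differs from b1 (R = 2)."""
    sign, frac = bits.split(".")
    return sign != frac[0]

# M = -1/2 with four fraction bits is written 1.1000 in both codes.
print(normalized_sign_magnitude("1.1000"))   # True
print(normalized_twos_complement("1.1000"))  # False
```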
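Example: normalize 0.0011 and -0.0011.
Solution: 0.0011 = 0.0011 × 2^0 = 0.1100 × 2^(0-2) = 0.1100 × 2^(-2).
[-0.0011]complement = 1.1101; shifting the mantissa left by 2 bits and subtracting 2 from the exponent gives the normalized floating-point representation of -0.0011: [M]complement = 1.0100, E = -2.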
Example: in a floating-point representation, E = 4 and [M]complement = 0.1000b. Calculate the value represented when R = 2 and when R = 16.
Solution: When R = 2: N = M × R^E = 0.1000b × 2^4 = 0.5 × 2^4 = 8. When R = 16: N = M × R^E = 0.1000b × 16^4 = 0.5 × 16^4 = 32768.
When the mantissa of a floating-point number is 0 (whatever the exponent) or the exponent is smaller than the minimum value the machine can represent, the floating-point number is treated as 0, called machine zero; in the second case the number has fallen below the representable range (underflow).
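A minimal sketch of N = M × R^E together with the machine-zero rule, using exact `Fraction` arithmetic. The function `float_value` and its `e_min` parameter (the smallest representable exponent, set to -8 here as if the exponent were 4-bit two's complement) are assumptions for illustration only.

```python
from fractions import Fraction

def float_value(m: Fraction, e: int, r: int, e_min: int) -> Fraction:
    """N = M * R**E, returning 0 (machine zero) when M == 0 or E < e_min.

    e_min, the smallest exponent the machine can represent, is a hypothetical
    parameter chosen by the caller.
    """
    if m == 0 or e < e_min:
        return Fraction(0)          # machine zero
    return m * Fraction(r) ** e

M = Fraction(1, 2)                  # [M]complement = 0.1000b
print(float_value(M, 4, 2, -8))     # 8
print(float_value(M, 4, 16, -8))    # 32768
print(float_value(M, -9, 2, -8))    # 0  (exponent below e_min)
```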
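The IEEE 754 international standard defines two commonly used floating-point formats: single precision, 32 bits with an 8-bit exponent and a 24-bit significand (23 stored bits plus an implicit leading 1); and double precision, 64 bits with an 11-bit exponent and a 53-bit significand (52 stored bits plus an implicit leading 1). In the textbook machine model used here, the mantissa is usually in two's complement and the exponent in two's complement or excess (shift) code; IEEE 754 itself stores the significand in sign-magnitude form with a separate sign bit and the exponent as a biased (excess) value.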
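The single-precision layout can be inspected with Python's standard `struct` module by packing a value as a big-endian 32-bit float and reading the same bytes back as an unsigned integer. The helper name `float32_fields` is hypothetical; the bit positions follow the IEEE 754 single format (1 sign bit, 8 exponent bits with bias 127, 23 stored fraction bits).

```python
import struct

def float32_fields(x: float) -> tuple:
    """Return (sign, biased exponent, stored fraction) of x as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, bias 127
    fraction = word & 0x7FFFFF        # 23 stored bits; the leading 1 is implicit
    return sign, exponent, fraction

s, e, f = float32_fields(-6.5)        # -6.5 = -1.101b x 2^2
print(s, e - 127, format(f, "023b"))  # 1 2 10100000000000000000000
```

The implicit leading 1 is why 23 stored bits give a 24-bit significand.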
Example 3.30:
x = +1011: [x]complement = 01011, [x]shift = 11011
x = -1011: [x]complement = 10101, [x]shift = 00101
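A sketch that reproduces these encodings for n = 4 (5-bit codes), assuming excess code means storing x + 2^n as an unsigned number. The helper names are hypothetical. The last lines also check that excess codes sort in the same order as the exponents they encode, which two's-complement codes do not.

```python
def complement_code(x: int, n: int) -> str:
    """(n+1)-bit two's-complement code of an exponent, -2**n <= x < 2**n."""
    if not -(1 << n) <= x < (1 << n):
        raise ValueError("exponent out of range")
    return format(x % (1 << (n + 1)), f"0{n + 1}b")

def excess_code(x: int, n: int) -> str:
    """(n+1)-bit shift (excess-2**n) code: store x + 2**n as an unsigned number."""
    if not -(1 << n) <= x < (1 << n):
        raise ValueError("exponent out of range")
    return format(x + (1 << n), f"0{n + 1}b")

for x in (0b1011, -0b1011):
    print(x, complement_code(x, 4), excess_code(x, 4))
# 11 01011 11011
# -11 10101 00101

# Excess codes sort in the same order as the exponents they encode;
# two's-complement codes do not (negative exponents sort above positive ones).
exps = list(range(-16, 16))
print(sorted(exps, key=lambda x: excess_code(x, 4)) == exps)      # True
print(sorted(exps, key=lambda x: complement_code(x, 4)) == exps)  # False
```

The two codes differ only in the top bit, so either can be obtained from the other by inverting the sign bit.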
Features of excess (shift) code:
1. The highest bit is the sign bit: 1 means positive and 0 means negative.
2. Exponents only take part in addition and subtraction; after adding or subtracting two excess-coded exponents, the result is corrected by adding 2^n (modulo 2^(n+1)), which amounts to inverting the sign bit.
3. Zero has a unique encoding: [+0]shift = [-0]shift = 100...0.
When the exponent is ≤ -2^n, the number is treated as machine zero: the exponent field is set to 000...0 regardless of the mantissa, and the case is handled as floating-point underflow.
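The correction in feature 2 can be sketched on integer codes, assuming (n+1)-bit excess-2^n exponents and ignoring overflow and underflow detection. The function names are hypothetical. The sum of two codes carries the bias twice and the difference carries it zero times; in both cases adding 2^n modulo 2^(n+1), i.e. inverting the top bit, restores a single bias.

```python
def excess_add(ex: int, ey: int, n: int) -> int:
    """Add two (n+1)-bit excess-2**n exponent codes; return the code of the sum.

    ex + ey contains the bias 2**n twice, so one 2**n has to be removed.
    Modulo 2**(n+1), subtracting 2**n equals adding it, which just inverts
    the top (sign) bit. Overflow/underflow checks are omitted.
    """
    mask = (1 << (n + 1)) - 1
    return ((ex + ey) & mask) ^ (1 << n)

def excess_sub(ex: int, ey: int, n: int) -> int:
    """Subtract excess codes: ex - ey has lost the bias, so 2**n is added back."""
    mask = (1 << (n + 1)) - 1
    return ((ex - ey) & mask) ^ (1 << n)

# n = 4: exponent 3 -> 10011, exponent -5 -> 01011
print(format(excess_add(0b10011, 0b01011, 4), "05b"))  # 01110  (3 + -5 = -2)
print(format(excess_sub(0b10011, 0b01011, 4), "05b"))  # 11000  (3 - -5 = 8)
```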
Reasons for using excess code for the exponent of a floating-point number:
• It makes comparing floating-point numbers easy: a larger exponent code corresponds to a larger true exponent and a smaller code to a smaller one, so exponent codes can be compared directly as unsigned numbers.
• It simplifies the zero-detection circuit: machine zero is represented by an exponent of all 0s together with a mantissa of all 0s.
The range of 32-bit fixed-point integers (two's complement) is -2^31 to +2^31 - 1, with a precision of 31 bits. The range of a 32-bit floating-point number is -2^127 to +(1 - 2^-23) × 2^127, with 24-bit precision.
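The two ranges can be computed exactly. For the floating-point bound this sketch assumes a textbook 32-bit layout (not IEEE 754): an 8-bit exponent whose largest value is 127 and a 24-bit two's-complement mantissa made of 1 sign bit and 23 fraction bits, which gives the bounds stated above. The variable names are illustrative only.

```python
from fractions import Fraction

# 32-bit two's-complement integer.
int_min, int_max = -2**31, 2**31 - 1

# Assumed 32-bit textbook float layout (not IEEE 754): exponent with maximum
# value 127, two's-complement mantissa of 1 sign bit + 23 fraction bits.
E_MAX = 127
m_max = 1 - Fraction(1, 2**23)       # 0.111...1 (23 ones)
m_min = Fraction(-1)                 # 1.000...0
float_max = m_max * 2**E_MAX
float_min = m_min * 2**E_MAX

print(int_min, int_max)              # -2147483648 2147483647
print(float(float_min), float(float_max))
```

The float range is roughly 10^38 in magnitude against about 2 × 10^9 for the integer, at the cost of fewer significant bits.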