Some scripting languages, such as awk and Lua, use floating point numbers to store integers; that is, the integers we use in the language are actually represented as floating point numbers underneath. We know that floating point arithmetic usually carries some error, so can integers be represented exactly by floating point numbers? The answer is yes, but not every integer in the type's range can be represented exactly. Because the precision of a floating point number is limited, floating point values are unevenly distributed: they are dense near zero and grow sparser as the magnitude increases.
First, let us briefly review how a typical floating point number is laid out in the computer. The most common representation is the IEEE 754 standard; for details, refer to [1].
An IEEE 754 floating point number consists of a sign, an exponent, and a significand. The sign occupies 1 bit; for single precision, the exponent occupies 8 bits and the significand occupies 23 bits. Because IEEE 754 uses normalized floating point numbers, the leading bit of the significand is always 1, so it does not need to be stored. This is the so-called hidden bit, and it means the effective width of the significand is actually 24 bits. For more information, see [1].
When floating point numbers are used to represent integers, we want a contiguous range of integers, so naturally the more significand bits the better. Since the number of significand bits is limited, it limits the largest integer that can be expressed. Let p be the number of significant bits of the significand (including the hidden bit). The largest significand value is then 1.11...1 (p ones in binary) = 2 - 2^(-p+1), so the largest value with exponent p-1 is (2 - 2^(-p+1)) * 2^(p-1) = 2^p - 1. Since floating point numbers are signed, it is natural to conclude that the integer range is [-(2^p - 1), 2^p - 1]. However, this omits two more numbers, namely 2^p and -2^p. Consider the floating point representation of 2^p: after normalization, 2^p = (1.000...0) * 2^p, with p zeros after the binary point. The significand can store only p-1 bits after the point (the leading 1 is the hidden bit), but since the dropped trailing bit is 0, the value is unaffected, so 2^p can be represented exactly. The same holds for -2^p.
Note that some readers may object: since 1.000...0 * 2^p can be represented exactly, and for single precision p = 24 while the 8-bit exponent can reach +127 (well beyond 24), why do we say 2^p is the largest integer that can be expressed? The reason is that we only care about the contiguous integer range. 2^(p+1) can also be represented exactly, but many integers in [2^p, 2^(p+1)] cannot, because the significand does not have enough bits for them.
Therefore, for IEEE 754 single-precision and double-precision floating point numbers, the ranges of integers that can be represented exactly and contiguously are:
Floating Point   | Range
Single precision | [-2^24, 2^24]
Double precision | [-2^53, 2^53]
The code below tests the boundary conditions for both floating point types (the runtime environment is VC9):
#include <cassert>

int main()
{
    const int max_decimal = 16777216; // 2^24
    assert(static_cast<int>(static_cast<float>(max_decimal)) == max_decimal);
    assert(static_cast<int>(static_cast<float>(max_decimal + 1)) != max_decimal + 1);
    assert(static_cast<int>(static_cast<float>(-max_decimal)) == -max_decimal);
    assert(static_cast<int>(static_cast<float>(-max_decimal - 1)) != -max_decimal - 1);

    const __int64 max_decimald = 9007199254740992; // 2^53
    assert(static_cast<__int64>(static_cast<double>(max_decimald)) == max_decimald);
    assert(static_cast<__int64>(static_cast<double>(max_decimald + 1)) != max_decimald + 1);
    assert(static_cast<__int64>(static_cast<double>(-max_decimald)) == -max_decimald);
    assert(static_cast<__int64>(static_cast<double>(-max_decimald - 1)) != -max_decimald - 1);
}
Edit: the single-precision range should be 2^24.
References:
[1] What Every Computer Scientist Should Know About Floating-Point Arithmetic