The range of integers that floating point numbers can represent exactly

Last Update:2018-12-04 Source: Internet

Author: User


Some scripting languages, such as awk and Lua, store integers as floating point numbers: every integer used in a program is actually represented by a floating point value. Floating point arithmetic is known to introduce rounding errors, so can integers be represented exactly? The answer is yes, but only within a limited range. Floating point numbers have finite precision and are unevenly spaced: the gap between adjacent representable values grows with magnitude, so beyond a certain point some integers are skipped.
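The uneven spacing can be observed directly. The following is a minimal sketch, assuming a platform where float is IEEE 754 single precision; gap_above is a hypothetical helper name introduced here for illustration.

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable float above it.
// Assumes float is IEEE 754 single precision.
float gap_above(float x) {
    const float inf = std::numeric_limits<float>::infinity();
    return std::nextafter(x, inf) - x;
}
```

Under that assumption, gap_above(1.0f) is 2^-23, gap_above(8388608.0f) (that is, 2^23) is exactly 1, and gap_above(16777216.0f) (2^24) is 2, which is where integers start to be skipped.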

First, a brief look at how a typical floating point number is laid out in the computer. The most common representation is the IEEE 754 standard; see [1] for details.

A floating point number consists of a sign, an exponent, and a significand (also called the mantissa). The sign occupies 1 bit; in single precision the exponent occupies 8 bits and the significand occupies 23 bits. Because IEEE 754 stores normalized numbers, the leading bit of the significand is always 1 and does not need to be stored. It becomes a hidden bit, so the significand effectively has 24 bits of precision. For more information, see [1].
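To make the layout concrete, the three fields of a single-precision value can be extracted with shifts and masks. This is only a sketch: FloatFields and decode are hypothetical names, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// The three stored fields of a single-precision number.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 stored bits; the hidden leading 1 is not included
};

FloatFields decode(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    FloatFields fields;
    fields.sign = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.fraction = bits & 0x7FFFFFu;
    return fields;
}
```

For example, decode(16777216.0f) yields sign 0, exponent 151 (127 + 24) and fraction 0: the value 2^24 needs nothing beyond the hidden bit.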

To represent integers with floating point numbers, we need a contiguous range of integers in which every value is exactly representable. The more significand bits, the better, but their number is finite, and this limits the largest integer that can be expressed. Let p be the precision of the significand in bits (p includes the hidden bit). The largest significand is then 1 + (1 - 2^(-p+1)) = 2 - 2^(-p+1), so with exponent p-1 the largest integer is (2 - 2^(-p+1)) * 2^(p-1) = 2^p - 1. Since floating point numbers carry a sign, it is natural to conclude that the integer range is [-(2^p - 1), 2^p - 1]. However, this leaves out two numbers, 2^p and -2^p. Consider their floating point representation. After normalization, 2^p = (1.000...0) * 2^p, with p zeros after the binary point. The significand can store only p-1 bits after the point (the remaining bit is the hidden bit), but fortunately the bit that is dropped is 0, so the value is unaffected and 2^p can be represented exactly. The same holds for -2^p.
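The arithmetic in this derivation can be replayed numerically. The sketch below uses std::ldexp to scale by powers of two; the function name is hypothetical, and the computation is done in double, which is exact for p up to 53.

```cpp
#include <cmath>

// Largest integer reachable with a p-bit significand and exponent p - 1,
// computed exactly as in the derivation: (2 - 2^(-p+1)) * 2^(p-1).
double max_with_full_significand(int p) {
    double max_significand = 2.0 - std::ldexp(1.0, -(p - 1));  // 1.11...1 in binary
    return std::ldexp(max_significand, p - 1);                 // multiply by 2^(p-1)
}
```

For p = 24 the result is 16777215, which is 2^24 - 1; for p = 53 it is 9007199254740991, which is 2^53 - 1.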

Some readers may object: if 1.00...0 * 2^p can be represented exactly, and the exponent can go far beyond p (for single precision p = 24, while the 8-bit exponent reaches +127), why is 2^p the largest integer? The reason is that we care only about the contiguous integer range. 2^(p+1) can be represented exactly, but many integers between 2^p and 2^(p+1) cannot, starting with 2^p + 1, because the significand does not have enough bits.
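The gaps above 2^p can be counted with a round-trip test. This is a sketch under the assumption that float is IEEE 754 single precision and that integer-to-float conversion rounds to a neighbouring representable value; survives_float and count_exact are hypothetical helper names.

```cpp
#include <cstdint>

// True if n comes back unchanged after a round trip through float.
bool survives_float(std::int64_t n) {
    return static_cast<std::int64_t>(static_cast<float>(n)) == n;
}

// Number of integers in [lo, hi] that float represents exactly.
int count_exact(std::int64_t lo, std::int64_t hi) {
    int count = 0;
    for (std::int64_t n = lo; n <= hi; ++n) {
        if (survives_float(n)) {
            ++count;
        }
    }
    return count;
}
```

Of the 17 integers from 2^24 - 16 to 2^24, all 17 survive. Of the 17 integers from 2^24 to 2^24 + 16, only the 9 even ones do, even though 2^25 itself is still exact.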

Therefore, for IEEE 754 single-precision and double-precision floating point numbers, the ranges of integers that can be represented exactly without gaps are:

Floating point	Range
Single precision	[-2^24, 2^24]
Double precision	[-2^53, 2^53]
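The same bounds can be read off the standard library instead of being hard-coded. The sketch below assumes a binary floating point type whose precision fits in a 64-bit integer; max_contiguous_integer is a hypothetical name, and std::numeric_limits<T>::digits is the precision p including the hidden bit.

```cpp
#include <cstdint>
#include <limits>

// Upper end 2^p of the contiguous integer range of T.
template <typename T>
constexpr std::int64_t max_contiguous_integer() {
    static_assert(std::numeric_limits<T>::radix == 2, "binary floating point assumed");
    static_assert(std::numeric_limits<T>::digits < 63, "result must fit in int64_t");
    return std::int64_t{1} << std::numeric_limits<T>::digits;
}
```

On an IEEE 754 platform, max_contiguous_integer<float>() is 16777216 and max_contiguous_integer<double>() is 9007199254740992, matching the table.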

The code below tests the boundary conditions of the two floating point types (the runtime environment is VC9):

#include <cassert>

int main()
{
    // Single precision: 2^24 is the last integer of the contiguous range.
    const int max_decimal = 16777216; // 2^24
    assert(static_cast<int>(static_cast<float>(max_decimal)) == max_decimal);
    assert(static_cast<int>(static_cast<float>(max_decimal + 1)) != max_decimal + 1);
    assert(static_cast<int>(static_cast<float>(-max_decimal)) == -max_decimal);
    assert(static_cast<int>(static_cast<float>(-max_decimal - 1)) != -max_decimal - 1);

    // Double precision: __int64 is the 64-bit integer type of the VC9 compiler.
    const __int64 max_decimald = 9007199254740992; // 2^53
    assert(static_cast<__int64>(static_cast<double>(max_decimald)) == max_decimald);
    assert(static_cast<__int64>(static_cast<double>(max_decimald + 1)) != max_decimald + 1);
    assert(static_cast<__int64>(static_cast<double>(-max_decimald)) == -max_decimald);
    assert(static_cast<__int64>(static_cast<double>(-max_decimald - 1)) != -max_decimald - 1);
}
 
 
Edit: the single-precision bound should be 2^24.
 
Reference:
[1] David Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
