Storage of float and double data on computers

Source: Internet
Author: User
Tags float range

1. Range

The ranges of float and double are determined by the number of exponent bits.
The float exponent has 8 bits, while the double exponent has 11 bits. The bits are distributed as follows:
Float:
1 bit (sign) 8 bits (exponent) 23 bits (mantissa)
Double:
1 bit (sign) 11 bits (exponent) 52 bits (mantissa)


Therefore, the float exponent spans -127 ~ +128, while the double exponent spans -1023 ~ +1024. The exponent is stored in biased (offset) form, not in two's complement, and the extreme value at each end is reserved: the lowest for zero and denormalized numbers, the highest for infinity and NaN. Normalized numbers therefore use exponents -126 ~ +127 (float) and -1022 ~ +1023 (double).
The negative exponents determine the smallest non-zero absolute value that a floating-point number can express. The positive exponents determine the largest absolute value, that is, the value range of the floating-point type.
Float range: just under ±2^128, that is, about -3.40e+38 ~ +3.40e+38; the double range is just under ±2^1024, that is, about -1.79e+308 ~ +1.79e+308.
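
As a rough illustration of where these limits come from, the following Python sketch builds the largest finite float and double from the field widths and cross-checks them. The names FLOAT_MAX and DOUBLE_MAX are chosen here for illustration; the sketch assumes the standard IEEE 754 layout described in this post.

```python
import struct
import sys

# Largest finite float: every mantissa bit set, largest non-reserved exponent (127).
FLOAT_MAX = (2 - 2**-23) * 2.0**127
# Largest finite double, built the same way with 52 mantissa bits and exponent 1023.
DOUBLE_MAX = (2 - 2**-52) * 2.0**1023

print(FLOAT_MAX)   # about 3.4028e+38
print(DOUBLE_MAX)  # about 1.7977e+308

# Cross-check: the bit pattern 0x7F7FFFFF is the largest finite 32-bit float.
(from_bits,) = struct.unpack(">f", bytes.fromhex("7f7fffff"))
print(from_bits == FLOAT_MAX)
```

Both values sit just below 2^128 and 2^1024 respectively, which is why the limits are usually quoted as 3.40e+38 and 1.79e+308.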

2. Precision
The precision of float and double is determined by the number of mantissa bits. Floating-point numbers are stored in memory in binary scientific notation, and the integer part is always an implicit "1" (discussed later); because it never changes, it has no effect on precision.
Float: 2^23 = 8388608, a seven-digit number, so there can be at most 7 significant decimal digits, of which only 6 are guaranteed. In other words, float precision is 6 ~ 7 significant digits. To see why, ask how many bits are needed to cover every six-digit decimal number. 2^19 = 524288 has six digits, but it clearly does not cover all six-digit numbers, which run up to 999999. 2^20 = 1048576 has seven digits, so 20 bits cover every six-digit number. Six digits are therefore always exact. The float mantissa has 23 bits, and 2^23 = 8388608 still does not cover every seven-digit number, so the seventh digit is exact only some of the time.

Double: 2^52 = 4503599627370496, a 16-digit number. By the same reasoning, the precision of double is 15 ~ 16 significant digits.
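
The digit limit can be observed directly. The sketch below is a minimal illustration that rounds a Python float (which is a double) to 32-bit precision using the standard struct module; the helper name to_float32 is made up for this example.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Every six-digit integer fits in the 24 significant bits of a float.
six_digits_ok = all(to_float32(float(n)) == n for n in range(100000, 1000000, 997))
print(six_digits_ok)

# 2^24 + 1 = 16777217 has eight digits and needs 25 bits, so it cannot be stored exactly.
print(to_float32(16777217.0) == 16777217.0)

# 0.1 has no finite binary expansion, so float and double round it differently.
print(to_float32(0.1) == 0.1)
```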

The following uses float as an example to describe how a decimal value is converted to and from its binary floating-point form.

A float variable occupies 4 bytes, that is, 32 bits, in memory and follows the IEEE 754 format.
A floating-point number consists of two parts: the mantissa m and the exponent e.
In the formula ± mantissa × 2^exponent, both the mantissa and the exponent are represented in binary.

The mantissa holds the significant digits of the floating-point number as a binary number.
The exponent occupies 8 bits, so the stored value ranges from 0 to 255.

However, the exponent must be able to be both positive and negative, so IEEE stipulates that 127 is subtracted from the stored value to obtain the real exponent. So the float exponent can range from -127 to 128.
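
To make the offset of 127 concrete, here is a small sketch that reads the 8 stored exponent bits of a few values and subtracts 127. The function name stored_exponent is an illustrative choice, not part of any library.

```python
import struct

def stored_exponent(x):
    """Return the raw 8-bit exponent field of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF

for value in (0.5, 1.0, 2.0, 8.0):
    raw = stored_exponent(value)
    # The real exponent is the stored value minus 127.
    print(value, raw, raw - 127)
```

For 1.0 the stored field is 127, so the real exponent is 0; for 8.0 it is 130, giving a real exponent of 3.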

The mantissa is actually 24 bits wide. Because its highest bit is always 1, that bit is not stored, and only 23 bits are kept. Why is the highest bit always 1? See the explanation here:

{

 The float 9.125 is written as 9.125 × 10^0 in decimal scientific notation. A computer, however, only knows 0 and 1, so it uses binary scientific notation:

The binary value of 9 is 1001.

The binary value of 0.125 is 0.001.

Therefore, 9.125 is 1001.001 in binary. In binary scientific notation this is 1.001001 × 2^3 (the binary point is shifted left three places, as described later).

In a computer, any non-zero number can be expressed as 1.xxxxxx × 2^n,

where xxxxxx is the mantissa and n is the exponent.

Because the leading bit is always 1 for any number written in this form, it is not saved during storage. This lets the 23-bit mantissa of float carry 24 bits of precision, and the 52-bit mantissa of double carry 53 bits.

}
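
The normalized form of 9.125 can be checked with Python's standard library. This is only a sketch: float.hex() shows the fraction in hexadecimal rather than binary, and math.frexp uses a 0.5 <= m < 1 convention, so the result is rescaled to the 1.xxx form used above.

```python
import math

value = 9.125

# float.hex() prints the normalized form 1.xxx * 2^n with a hexadecimal fraction.
# Binary .001001 is hexadecimal .24, so this shows 1.001001 (binary) * 2^3.
print(value.hex())  # 0x1.2400000000000p+3

# math.frexp returns (m, e) with value == m * 2**e and 0.5 <= m < 1,
# so the IEEE-style 1.xxx form is (2 * m, e - 1).
m, e = math.frexp(value)
significand, exponent = 2 * m, e - 1
print(significand, exponent)  # 1.140625 3
```

Here 1.140625 is 1 + 1/8 + 1/64, which is exactly 1.001001 in binary.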

So far, the 23 mantissa bits plus the 8 exponent bits account for 31 bits. As mentioned above, a float occupies 4 bytes, or 32 bits. What is the remaining bit for? It is the highest bit of the four bytes and indicates the sign of the floating-point number: 1 means negative, and 0 means positive.


Floating-point data is stored in four bytes in the following format:

         Address+0  Address+1  Address+2  Address+3
Float:   seeeeeee   emmmmmmm   mmmmmmmm   mmmmmmmm

s: sign bit; 1 is negative, 0 is positive
e: exponent bits; the stored value is the real exponent plus 127, in binary
m: mantissa; 24 bits of precision, of which only 23 are stored
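
The three fields can be pulled apart programmatically. The following sketch, with the hypothetical helper float_fields, packs a value as a big-endian 32-bit float and slices the resulting bit string into sign, exponent, and mantissa.

```python
import struct

def float_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = format(bits, "032b")
    # 1 sign bit, 8 exponent bits, 23 mantissa bits.
    return text[0], text[1:9], text[9:]

print(float_fields(1.0))
# ('0', '01111111', '00000000000000000000000')
```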

Note: there is a special case. When the floating-point number is 0, the exponent bits and the mantissa bits are all 0, but the formula above does not hold,

because 2 to the power 0 is 1 and the implicit leading 1 would make the result non-zero. Zero is therefore defined as a special case. It needs no special care in practice: the hardware and compiler handle it automatically.
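
A brief sketch of the zero case: the all-zero bit pattern is defined to mean 0.0, even though applying the normalized formula to it blindly would produce a tiny non-zero number. The variable name naive is illustrative only.

```python
import struct

# All 32 bits clear is defined as +0.0; only the sign bit set is -0.0.
print(struct.pack(">f", 0.0).hex())   # 00000000
print(struct.pack(">f", -0.0).hex())  # 80000000

# Applying 1.m * 2^(e - 127) blindly to an all-zero pattern would give
# 1.0 * 2^(0 - 127), a tiny non-zero value, which is why zero needs its own rule.
naive = 1.0 * 2.0 ** (0 - 127)
print(naive > 0)
```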

Using the format above, let's look at how -12.5 is actually stored in the computer:
Address+0  Address+1  Address+2  Address+3
0xC1       0x48       0x00       0x00

Next, let's verify that the data above really represents -12.5 by walking through the conversion.
Because a floating-point number is not stored as a plain value but is composed of several fields, converting it first requires separating out each field.

             Address+0  Address+1  Address+2  Address+3
Format       seeeeeee   emmmmmmm   mmmmmmmm   mmmmmmmm
Binary       11000001   01001000   00000000   00000000
Hexadecimal  C1         48         00         00

From this:
s: 1, so the number is negative.
e: 10000010 in binary is 130 in decimal; 130 - 127 = 3, so the real exponent is 3.
m: 10010000000000000000000. The 1 to the left of the binary point was omitted, so the actual mantissa is 1.10010000000000000000000.
The three fields are now separated. Next, we adjust the mantissa m by the exponent e.

The adjustment works as follows:

If the exponent e is negative, the binary point of the mantissa shifts to the left. If e is positive, it shifts to the right. The number of places it moves is the absolute value of e.
Here e is 3, so shifting the binary point right by 3 places gives: 1100.10000000000000000000
This is the binary form of 12.5.

To convert it to decimal:

1100 to the left of the binary point is (1 × 2^3) + (1 × 2^2) + (0 × 2^1) + (0 × 2^0), which is 12.
.100... to the right of the binary point is (1 × 2^-1) + (0 × 2^-2) + (0 × 2^-3) + ..., which is 0.5.
The sum of the two is 12.5. Because s is 1, the value is negative, that is, -12.5.
Therefore, hexadecimal 0xC1480000 is the float -12.5.
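
The decoding steps can be sketched in Python as follows. The function decode_float32 is a hypothetical helper written for this example; it handles only normalized numbers (not zero, denormalized values, infinity, or NaN) and is compared against struct.unpack as a check.

```python
import struct

def decode_float32(raw):
    """Decode 4 big-endian bytes as a normalized IEEE 754 single, by hand."""
    bits = int.from_bytes(raw, "big")
    sign = bits >> 31
    stored_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Restore the implicit leading 1, then scale by the real exponent.
    significand = 1 + fraction / 2**23
    value = significand * 2.0 ** (stored_exp - 127)
    return -value if sign else value

raw = bytes([0xC1, 0x48, 0x00, 0x00])
print(decode_float32(raw))          # -12.5
print(struct.unpack(">f", raw)[0])  # -12.5
```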

The above shows how to convert the binary data in storage into the actual floating-point number. The following shows how to convert a floating-point number into the binary storage format.
For example, convert 17.625 to float.
First, convert 17.625 to binary: 10001.101 (17 = 16 + 1; 0.625 = 0.5 + 0.125, where 0.5 is 1/2 and 0.125 is 1/8)
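
The decimal-to-binary step can be sketched with the usual repeated-multiplication method for the fractional part. The function to_binary and its max_frac_bits parameter are illustrative names, and the sketch assumes a non-negative input.

```python
def to_binary(x, max_frac_bits=23):
    """Binary expansion of a non-negative number as a string."""
    whole = int(x)
    frac = x - whole
    digits = []
    # Multiply the fraction by 2 repeatedly; each integer part is the next bit.
    while frac and len(digits) < max_frac_bits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return bin(whole)[2:] + ("." + "".join(digits) if digits else "")

print(to_binary(17.625))  # 10001.101
print(to_binary(9.125))   # 1001.001
```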

Then shift the binary point of 10001.101 to the left until only one digit remains in front of it: 1.0001101 × 2^4 (because the point moved four places to the left). The mantissa m and the exponent e can now be read off:
Mantissa m: because the digit before the binary point is always 1, IEEE requires that only the digits after the point be recorded. The mantissa is therefore 0001101.
Exponent e: the real exponent is 4, but 127 must be added, giving 131, which is 10000011 in binary.
Sign s: the number is positive, so s is 0.
To sum up, 17.625 in float storage format is:
0 10000011 00011010000000000000000
Converted to hexadecimal: 0x41 0x8D 0x00 0x00
It occupies exactly 4 bytes.
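
The encoding steps can be sketched the same way. The helper encode_float32 below is hypothetical and deliberately simplified: it assumes a normal, non-zero value that fits exactly in 24 significant bits, and it truncates rather than rounds, so it is not a general replacement for struct.pack.

```python
import math
import struct

def encode_float32(x):
    """Hand-encode x as 4 big-endian bytes (normal, exactly representable values only)."""
    sign = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))            # abs(x) == m * 2**e with 0.5 <= m < 1
    exponent = e - 1                     # exponent of the 1.xxx form
    fraction = int((2 * m - 1) * 2**23)  # the 23 bits after the binary point
    bits = (sign << 31) | ((exponent + 127) << 23) | fraction
    return bits.to_bytes(4, "big")

print(encode_float32(17.625).hex())       # 418d0000
print(struct.pack(">f", 17.625).hex())    # 418d0000
```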

In practice, x86 computers use little-endian storage: the low-order byte is stored at the low address and the high-order byte at the high address. In memory, 17.625 therefore appears as 0x00 0x00 0x8D 0x41.
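
The byte order can be observed with the struct module, where ">" requests big-endian and "<" requests little-endian packing. This is a small sketch; sys.byteorder reports the order used by the machine running it.

```python
import struct
import sys

big = struct.pack(">f", 17.625)     # most significant byte first
little = struct.pack("<f", 17.625)  # least significant byte first, as on x86

print(big.hex())      # 418d0000
print(little.hex())   # 00008d41
print(sys.byteorder)  # byte order of the current machine
```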


This post draws on the following three posts, and the errors found in them have been corrected here. If you find a mistake in this post, please point it out.

Http://blog.sina.com.cn/s/blog_8a18c33d01013bke.html

Http://blog.csdn.net/guqsir/article/details/7015267

Http://www.cnblogs.com/BradMiller/archive/2010/11/25/1887945.html