The storage format of floating point numbers in memory

Source: Internet
Author: User
    #include <iostream>
    using namespace std;

    int main(int argc, char* argv[])
    {
        float a = 1.0f;            // a floating-point number is stored in memory as sign + exponent + mantissa
        cout << (int&)a << endl;   // 1.0f is stored as 0x3f800000; the sizeof(int) bytes starting at a's address are read as an int, printing 1065353216
        int b = 0x3f800000;
        cout << b << endl;         // 1065353216
        cout << (int)a << endl;    // 1
        return 0;
    }

In the example above, a holds the floating-point value 1.0, which is stored in memory as:

0011 1111 1000 0000 0000 0000 0000 0000

Sign: 0 (the first bit)

Exponent: 127 + 0 = 127, stored as 0111 1111 (the next 8 bits)

Mantissa: 0 (the remaining 23 bits)

Read as an integer, this bit pattern is 1065353216 in decimal.

--------------------------------------------------------------------------------------------------------------------------------------

There are two floating-point types, float and double: float occupies 32 bits and double occupies 64 bits. Their binary storage format follows the IEEE 754 standard. Take float as an example:

Sign bit: Positive number is 0, negative number is 1

Take the float value 123.456 as an example to analyze the binary storage format:

First, convert the decimal number 123.456 to binary: 1111011.01110100101111001 (rounded to 24 significant bits)

(How is 0.456 converted to binary? Repeatedly multiply the fractional part by 2 and write down the integer part of each product, in order.

For example, 0.734375 in binary is 0.101111:

0.734375 x 2 = 1.46875
0.46875 x 2 = 0.9375
0.9375 x 2 = 1.875
0.875 x 2 = 1.75
0.75 x 2 = 1.5
0.5 x 2 = 1.0)

In normalized form, 1111011.01110100101111001 is 1.11101101110100101111001 × 2^6.

First, this is a positive number, so the sign bit is 0.

The exponent is 6, but it must be stored in biased (excess) form.

(How is the biased code of 6 obtained? I will not dig into it here; everyone simply computes 6 + 127 = 133, which is 10000101 in binary.)

(The relationship between biased code and two's complement: with a bias of 2^(n-1), [x] in biased code and [x] in two's complement differ only in the sign bit. IEEE 754 uses a bias of 127 rather than 128, so here the bias is simply added.)

The mantissa is the fractional part of 1.11101101110100101111001, i.e.

11101101110100101111001

In summary, the binary storage format of 123.456 is: 01000010111101101110100101111001


------------------------- The following describes the storage of floating-point numbers -------------------------

Floating-point numbers:
A float variable occupies 4 bytes, that is, 32 bits, in the computer's memory and follows the IEEE 754 format. A floating-point number consists of two parts, the mantissa M and the exponent E:

±mantissa × 2^exponent
(Note that the mantissa and exponent in the formula are in binary notation.)
The mantissa is a binary number that holds the significant digits of the floating-point value.
The exponent occupies 8 bits, so its stored value ranges from 0 to 255.
The exponent must be able to be negative as well as positive, so IEEE specifies that the stored value minus 127 is the true exponent. The true exponent of a float therefore spans -127 to 128. The two extreme stored values (0 and 255) are reserved for zero, subnormals, infinity and NaN, which leaves -126 to 127 for ordinary numbers.
The mantissa is actually a 24-bit value, but because its highest bit is always 1, that bit is omitted and only 23 bits are stored.
So far, the 23 mantissa bits plus the 8 exponent bits use 31 bits. As said before, a float is 4 bytes, or 32 bits, so what is the remaining bit for? It is the highest bit of the 4 bytes and indicates the sign of the number: 1 means negative, 0 means positive.

A float is stored in 4 bytes in the format shown in the following table:

Address    +0         +1         +2         +3
Contents   SEEE EEEE  EMMM MMMM  MMMM MMMM  MMMM MMMM

S: sign of the floating-point number, 1 is negative, 0 is positive
E: the exponent plus 127, as a binary number
M: the 24-bit mantissa (only 23 bits are stored)

Note: there is a special case. When the floating-point number is 0, the exponent and mantissa fields are both 0, but the formula above does not apply, because 2 to the power 0 is 1. Zero is therefore treated as a special case. You do not need to handle it yourself; it is recognized automatically.

Example 1: Converting the binary number stored in the computer to the actual floating-point value
Using the format above, let's look at how the value -12.5 is stored in the computer:

Address    +0     +1     +2     +3
Contents   0xC1   0x48   0x00   0x00

Next we verify that the data above really represents -12.5 by walking through the conversion.

Because a floating-point number is not stored as a plain value but as several fields, we first need to separate the fields.

Address    +0         +1         +2         +3
Format     SEEEEEEE   EMMMMMMM   MMMMMMMM   MMMMMMMM
Binary     11000001   01001000   00000000   00000000
Hex        C1         48         00         00

From this we see:

S: 1, so the number is negative.

E: 10000010 is 130 in decimal, and 130 - 127 = 3, so the actual exponent is 3.

M: 10010000000000000000000. The leading 1 to the left of the binary point was omitted from storage, so the actual mantissa is 1.10010000000000000000000.

Now that we have the three fields, we adjust the mantissa M by the exponent E. If E is negative, the binary point of the mantissa moves left; if E is positive, it moves right. The number of places moved is the absolute value of E.

Here E is +3, so moving the point 3 places to the right gives 1100.10000000000000000000. This is the binary form of 12.5. To convert it to decimal:

The 1100 to the left of the point is (1 × 2^3) + (1 × 2^2) + (0 × 2^1) + (0 × 2^0), which is 12.

The .100... to the right of the point is (1 × 2^-1) + (0 × 2^-2) + (0 × 2^-3) + ..., which is 0.5.

The sum of the two is 12.5. Because S is 1, the number is negative, that is, -12.5.

So the hexadecimal value 0xC1480000 is the floating-point number -12.5.

Example 2: Converting a floating-point number to its binary storage format.
For example, convert 17.625 to float.
First, convert 17.625 to binary: 10001.101 (0.625 = 0.5 + 0.125, where 0.5 is 1/2 and 0.125 is 1/8; if you do not know how to convert the fractional part to binary, refer to other books).
Then move the binary point of 10001.101 to the left until only one digit remains before it: 1.0001101 × 2^4 (the point moved 4 places). This gives the mantissa M and exponent E:

Mantissa M: the digit before the point is always 1, so IEEE specifies that only the digits after the point are stored. The mantissa is therefore 0001101.
Exponent E: the actual value is 4, but 127 must be added, giving 131, which is 10000011 in binary.
Sign S: the number is positive, so S is 0.
To sum up, the float storage format of 17.625 is:

0 10000011 00011010000000000000000

In hexadecimal: 0x41 8D 00 00

As you can see, the float indeed occupies 4 bytes.

****************************************************************

A double occupies 8 bytes, or 64 bits, in memory. The highest bit, bit 63, is the sign bit: 1 means negative, 0 means positive. Bits 62-52, 11 bits in total, are the exponent. Bits 51-0, 52 bits in total, are the mantissa.

Example 3: Following IEEE floating-point representation, convert the double value 38414.4 to its hexadecimal code.

Treat the integer part and the fractional part separately. The integer part converts directly to hexadecimal: 960E. For the fractional part:
0.4 = 0.5*0 + 0.25*1 + 0.125*1 + 0.0625*0 + ...
In fact, this never ends. This is the famous floating-point precision problem. So we stop once the total, counting the integer part in front, reaches 53 bits (hidden-bit technique: the leading 1 is not written to memory).
If you are patient, work out all 53 bits by hand, rounding the last one: 38414.4 (10) = 1001011000001110.0110011001100110011001100110011001101 (2)
In scientific notation this is 1.001... × 2^15, so the exponent is 15.
Now look at the exponent field. It has 11 bits in total, so its stored value ranges from 0 to 2047. Because the exponent can be negative, the rule is to add 1023 first. Here, 15 + 1023 = 1038, which in binary is 100 00001110.
Sign bit: positive, so 0.
Putting it together (the leading 1 of the mantissa is not stored):
01000000 11100010 11000001 11001100 11001100 11001100 11001100 11001101

On a little-endian machine the bytes sit in memory in reverse order; in hexadecimal:
CD CC CC CC CC C1 E2 40
