Concepts about floating point numbers

Source: Internet
Author: User

Floating-point types include float and double. A float occupies 32 bits and a double occupies 64 bits. The binary storage format follows the IEEE 754 standard. Take float as an example:



A float is laid out as 1 sign bit, 8 exponent bits, and 23 mantissa bits.

Sign bit: 0 for a positive number, 1 for a negative number.


Take the float value 123.456 as an example and analyze its binary storage format:


First, convert the decimal number 123.456 to binary: 1111011.01110100101111001


(How is the 0.456 part converted to binary? Multiply by 2 repeatedly; the integer part of each product is the next binary digit.)
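The multiply-by-2 method mentioned above can be sketched roughly as follows. This is only an illustration: the function name fractionToBinary and its parameters are made up for this example, and it simply truncates after the requested number of digits.

```cpp
#include <string>

// Convert a fraction (0 <= frac < 1) to binary digits by repeatedly
// multiplying by 2. The integer part of each product is the next digit.
// fractionToBinary is an illustrative name, not part of any library.
std::string fractionToBinary(double frac, int digits)
{
    std::string out;
    for (int i = 0; i < digits; ++i) {
        frac *= 2.0;
        if (frac >= 1.0) {
            out += '1';
            frac -= 1.0;   // keep only the fractional part
        } else {
            out += '0';
        }
    }
    return out;
}
```

For 0.456 and 17 digits this gives 01110100101111000. The digits used in this walkthrough end in 1 instead, because the next digit is 1 and the value is rounded up when it is cut to the 24 significant bits of a float.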

1111011.01110100101111001 is the same as 1.11101101110100101111001 × 2^6.


First, this is a positive number, so the sign bit is 0.


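The exponent is 6.

(How is 6 found? The binary point has to move 6 places to the left so that only a single 1 remains in front of it. The stored exponent adds a bias of 127: 6 + 127 = 133, which is 10000101 in binary.)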
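As a rough check of the exponent, the sketch below counts the shifts by hand and compares the result with std::ilogb from the standard library. The helper names shiftCount and biasedExponent are invented for this example, and both assume a positive, finite, normal float.

```cpp
#include <cmath>

// Count how many places the binary point must move so that x has the
// form 1.xxx. Assumes x is positive, finite and normal.
int shiftCount(float x)
{
    int e = 0;
    while (x >= 2.0f) { x /= 2.0f; ++e; }
    while (x < 1.0f)  { x *= 2.0f; --e; }
    return e;
}

// Stored (biased) exponent of a float: true exponent plus 127.
int biasedExponent(float x)
{
    return std::ilogb(x) + 127;
}
```

With these helpers, shiftCount(123.456f) is 6 and biasedExponent(123.456f) is 133.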


The mantissa is the fractional part of 1.11101101110100101111001, that is:

11101101110100101111001

To sum up, the binary storage format of 123.456 is (sign, exponent, mantissa):
0 10000101 11101101110100101111001
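The three fields worked out above can also be pulled apart with shifts and masks. The following is a minimal sketch, assuming a 32-bit IEEE 754 float; the struct FloatFields and the function splitFloat are names chosen for this example only.

```cpp
#include <cstdint>
#include <cstring>

// The three fields of a 32-bit IEEE 754 float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, the leading 1 is not stored
};

FloatFields splitFloat(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    return FloatFields{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For 123.456f this yields sign 0, exponent 133 (10000101) and mantissa 0x76E979 (11101101110100101111001).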

Use a piece of code to verify:

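    #include <cstdlib>
    #include <iostream>

    using namespace std;

    // Print the 8 bits of one byte, most significant bit first.
    void printBinary(const unsigned char val)
    {
        for (int i = 7; i >= 0; i--)
            if (val & (1 << i))
                std::cout << "1";
            else
                std::cout << "0";
    }

    int main()
    {
        float d = 123.456f;
        // View the float as raw bytes.
        unsigned char *cp = reinterpret_cast<unsigned char *>(&d);
        // Start at the highest address so the most significant byte comes first.
        for (int i = sizeof(float) - 1; i >= 0; --i)
        {
            printBinary(cp[i]);
        }
        std::cout << std::endl;
        system("pause"); // Windows only: keeps the console window open
    }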


Note that x86 is a little-endian architecture: the low-order bytes of a value are stored at low memory addresses, and the high-order bytes are stored at high memory addresses. That is why the loop
for (int i = sizeof(float) - 1; i >= 0; --i)
prints the byte at the highest address first, that is, the most significant byte of the binary value.
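The byte order that the loop depends on can be checked at run time, and the dependence can be avoided altogether by shifting an integer copy of the bits. The sketch below shows both; isLittleEndian and floatBits are illustrative names, and the code assumes a 32-bit float whose byte order matches that of integers.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// True when the least significant byte is stored at the lowest address.
bool isLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Bit string of a float, most significant bit first. Shifts work on the
// integer value, so the result does not depend on byte order.
std::string floatBits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    std::string out;
    for (int i = 31; i >= 0; --i)
        out += ((bits >> i) & 1u) ? '1' : '0';
    return out;
}
```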


Program execution result:

01000010111101101110100101111001

Split into sign, exponent and mantissa:

0 10000101 11101101110100101111001


This matches the analysis above.


The double type is stored in the same way as float, only with wider fields: 1 sign bit, 11 exponent bits (bias 1023), and 52 mantissa bits.
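The same field split can be sketched for a double, using the usual IEEE 754 widths of 1, 11 and 52 bits. DoubleFields and splitDouble are names invented for this example, and the code assumes a 64-bit IEEE 754 double.

```cpp
#include <cstdint>
#include <cstring>

// The three fields of a 64-bit IEEE 754 double.
struct DoubleFields {
    std::uint64_t sign;      // 1 bit
    std::uint64_t exponent;  // 11 bits, biased by 1023
    std::uint64_t mantissa;  // 52 bits, the leading 1 is not stored
};

DoubleFields splitDouble(double value)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    return DoubleFields{ bits >> 63,
                         (bits >> 52) & 0x7FFu,
                         bits & 0xFFFFFFFFFFFFFull };
}
```

For 123.456 the exponent field is 6 + 1023 = 1029.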


The part above is reposted from someone else's blog; have a look at it. Now, let's raise a question:


    double a = 1.1;
    int b = (int &)a;



Then output b. What is the result? 1? Of course not. The result is -1717986918. Why? Let's analyze it:

A double stores the fractional part (the mantissa) in 52 bits. For 1.1, those 52 bits work out to 0001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010.

Note the last four bits: binary values are also rounded when they are stored. If the first omitted bit is 1, 1 is added to the last stored bit, so 1001 + 1 becomes 1010.

On a little-endian machine, (int &)a reads the 4 bytes at the lowest address, which hold the low 32 bits of the double: 1001 1001 1001 1001 1001 1001 1001 1010. Read as a signed integer, the first bit is the sign bit, so the number is negative. Its two's complement is 0110 0110 0110 0110 0110 0110 0110 0110, which is 1717986918 in decimal. The result is therefore -1717986918.
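The cast in the question reads a double through an int reference, which breaks the C++ aliasing rules and is formally undefined behavior, even though it commonly gives the value shown on x86. A minimal sketch of the same experiment with std::memcpy follows; lowWordAsInt is a name made up for this example, and the code assumes a 64-bit IEEE 754 double.

```cpp
#include <cstdint>
#include <cstring>

// Low 32 bits of a double's bit pattern, read as a signed 32-bit integer.
// Works on the integer value, so it does not depend on byte order.
std::int32_t lowWordAsInt(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);            // raw 64-bit pattern
    const std::uint32_t low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    std::int32_t result;
    std::memcpy(&result, &low, sizeof result);          // reinterpret as signed
    return result;
}
```

Here lowWordAsInt(1.1) returns -1717986918, since the low 32 bits of 1.1 are 0x9999999A.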


Another interesting thing is to look at the following code:

    double a = 3.0, b = 10.0, ans;

    ans = a / b;

Step through this in a debugger and look at the value of ans. Is it 0.3? Of course not: it is 0.299999999... Why? C uses IEEE 754 floating-point numbers; the double type, for example, is 64 bits wide. Many numbers, 0.3 among them, are infinitely repeating fractions in binary and have to be truncated. Unless you use a third-party exact arithmetic library, the result is the same in C no matter which IEEE 754 compiler you use.

When you print such values, it is best to limit the number of digits with a width specifier such as %5.2f. In addition, never compare floating-point values directly with ==. Prefer something like fabs(a - b) < 0.000001.
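The comparison advice can be sketched as a small helper. The name nearlyEqual and the default tolerance of 0.000001 are illustrative choices taken from the text above; an absolute tolerance like this only suits values of moderate magnitude.

```cpp
#include <cmath>

// Compare two doubles with an absolute tolerance instead of ==.
// nearlyEqual and its default tolerance are illustrative, not a standard API.
bool nearlyEqual(double a, double b, double tolerance = 0.000001)
{
    return std::fabs(a - b) < tolerance;
}
```

A classic case: with IEEE 754 doubles, 0.1 + 0.2 == 0.3 is false, while nearlyEqual(0.1 + 0.2, 0.3) is true.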

Another approach is to add a very small number to 3.0. For example, if the smallest positive decimal your compiler supports is, say, 0.000000001, add that value to 3.0 before dividing by 10. This only solves some of the problems. To be safe, check the limits defined in the standard library header float.h (cfloat in C++).
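The limits in cfloat can be used directly instead of a hand-picked small number. The sketch below compares two doubles with a tolerance scaled by DBL_EPSILON, the gap between 1.0 and the next representable double. The function name nearlyEqualRelative and the default factor of 4 are assumptions made for this example.

```cpp
#include <cfloat>
#include <cmath>

// Relative comparison: the tolerance grows with the magnitude of the inputs.
// nearlyEqualRelative and the factor 4.0 are illustrative choices.
bool nearlyEqualRelative(double a, double b, double factor = 4.0)
{
    const double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= factor * DBL_EPSILON * scale;
}
```

The same header also defines DBL_DIG and FLT_DIG, the number of decimal digits that survive a round trip through double and float (15 and 6 on IEEE 754 platforms).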

A friend said that Excel and the Windows Calculator originally used IEEE 754 floating-point numbers, and that a double has at most 15 significant decimal digits.
Later, however, the Windows Calculator implemented a complicated algorithm that simulates fractions. You can do something similar: store the numerator and denominator as a pair p and q and treat every value you meet as a rational number (irrational numbers can only be truncated). This is for reference only.
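The p and q idea can be sketched as a tiny fraction type. This is only an illustration of storing a numerator and denominator, not how the Windows Calculator is implemented; Fraction, gcd64, makeFraction and divide are names invented here, and overflow is not handled.

```cpp
#include <cstdint>

// A rational number p / q kept in lowest terms with q > 0.
struct Fraction {
    std::int64_t p;  // numerator
    std::int64_t q;  // denominator
};

// Greatest common divisor of |a| and |b| (Euclid's algorithm).
std::int64_t gcd64(std::int64_t a, std::int64_t b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        const std::int64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Build a reduced fraction. Assumes q != 0.
Fraction makeFraction(std::int64_t p, std::int64_t q)
{
    if (q < 0) { p = -p; q = -q; }
    const std::int64_t g = gcd64(p, q);
    return Fraction{ p / g, q / g };
}

// Exact division a / b. Assumes b.p != 0.
Fraction divide(Fraction a, Fraction b)
{
    return makeFraction(a.p * b.q, a.q * b.p);
}
```

With this representation, dividing 3/1 by 10/1 gives exactly 3/10, with no rounding involved.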
