Binary Storage Format and Conversion of float Floating-Point Numbers

Source: Internet
Author: User

Both int and float are 4 bytes (32 bits) wide. Why, then, is the range of float so much larger than that of int?

A float carries only about 6 to 7 significant decimal digits. Stored as a float, 1.66*10^10 does not come out as exactly 16600000000, and the larger the number, the larger the absolute error.
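Here is a quick way to see this precision limit, sketched in Python rather than the article's C#. Python's own float is a double. Packing it with the standard struct module's "f" format rounds it to the nearest single-precision value. The helper name to_float32 is illustrative only and is not part of any library.

```python
import struct

def to_float32(x: float) -> float:
    # Round a double to the nearest IEEE 754 single, then widen it back to a double
    return struct.unpack("<f", struct.pack("<f", x))[0]

for x in (1.66e3, 1.66e10):
    y = to_float32(x)
    print(f"{x:.1f} -> {y:.1f} (error {y - x:.1f})")
```

At the magnitude of 1.66*10^10, neighbouring floats are 1024 apart. The stored value can therefore be off by up to 512, and this sketch reports an error of exactly 512.0 there, while 1660 is stored exactly.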

These problems are caused by the way floating-point numbers are stored.

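float and double are stored according to the IEEE 754 standard. float uses the 32-bit single-precision format, written here as R32.24 (32 bits, 24 bits of precision). double uses the 64-bit double-precision format, written R64.53 (64 bits, 53 bits of precision).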

Whether single or double precision, a number is stored in three parts:

    1. Sign: 0 means positive, 1 means negative.
    2. Exponent: stores the exponent of the scientific notation, in biased (offset) form.
    3. Mantissa: stores the digits of the significand after the binary point.
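To see the three fields of an actual float, the 32 bits can be split with shifts and masks. This is a minimal Python sketch that assumes the standard struct module is used to obtain the raw bits. Python has no 32-bit float type of its own, so the value is rounded to single precision when packed. float32_fields is a hypothetical name.

```python
import struct

def float32_fields(x: float):
    # Pack x as a big-endian IEEE 754 single and reread the 4 bytes as an unsigned int
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # bits 22-0, the leading 1 is not stored
    return sign, exponent, mantissa

print(float32_fields(1.0))    # sign 0, exponent 127 (= 0 + 127), mantissa 0
print(float32_fields(8.25))   # sign 0, exponent 130 (= 3 + 127), mantissa 1 << 18
```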

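A float (32 bits) is laid out as follows: bit 31 is the sign, bits 30 to 23 hold the 8-bit exponent (bias 127), and bits 22 to 0 hold the 23-bit mantissa.

A double (64 bits) is laid out as follows: bit 63 is the sign, bits 62 to 52 hold the 11-bit exponent (bias 1023), and bits 51 to 0 hold the 52-bit mantissa.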

The steps for converting a real number into the float storage format are:

    1. Convert the absolute value of the number to binary.
    2. Move the binary point left or right by n places until it sits immediately to the right of the first nonzero digit.
    3. Take the 23 digits to the right of the binary point and place them in bits 22 to 0 (IEEE 754 rounds to the nearest value rather than simply cutting off the remaining digits).
    4. If the number is positive, put "0" in bit 31; otherwise put "1".
    5. If the point was moved left, the exponent is positive and bit 30 is "1". If it was moved right, or n = 0, bit 30 is "0".
    6. If the point was moved left, convert n - 1 to binary, pad it on the left with "0" to seven digits, and put it in bits 29 to 23. If it was moved right, or n = 0, convert n to binary, pad it to seven digits, invert every bit, and put the result in bits 29 to 23.

Steps 5 and 6 together simply store the exponent plus 127 in bits 30 to 23.
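The six steps can be followed literally in code. This is a Python sketch that uses fractions.Fraction, so that the binary point can be moved without any rounding. It assumes a nonzero value in the normal single-precision range. encode_by_steps is a hypothetical name. Like step 3 as written, it truncates to 23 digits, whereas real hardware rounds.

```python
from fractions import Fraction

def encode_by_steps(x: float) -> int:
    """Build the 32-bit pattern of a nonzero float following steps 1 to 6.

    Assumptions: x is nonzero and in the normal single-precision range, and
    step 3 truncates to 23 digits as described (IEEE 754 hardware rounds to
    nearest instead, so the last bit can differ for inexact values).
    """
    sign = 0 if x > 0 else 1            # step 4: bit 31
    v = abs(Fraction(x))                # step 1: exact binary value of |x|
    n = 0                               # > 0: point moved left, < 0: moved right
    while v >= 2:                       # step 2: move the point left
        v /= 2
        n += 1
    while v < 1:                        # step 2: move the point right
        v *= 2
        n -= 1
    mantissa = int((v - 1) * 2**23)     # step 3: 23 digits after the point, truncated
    if n > 0:                           # steps 5 and 6: point moved left
        exponent = (1 << 7) | (n - 1)   # bit 30 = 1, bits 29-23 = n - 1
    else:                               # point moved right, or n == 0
        exponent = ~(-n) & 0x7F         # bit 30 = 0, bits 29-23 = inverted n
    return (sign << 31) | (exponent << 23) | mantissa

print(hex(encode_by_steps(8.25)))    # 0x41040000
print(hex(encode_by_steps(120.5)))   # 0x42f10000
```

For 2.2 the sketch yields 0x400ccccc, while the real single-precision value is 0x400ccccd. The difference comes from the truncation in step 3, versus the rounding done in hardware.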

Both R32.24 and R64.53 store numbers in scientific notation. In decimal, 8.25 is written 8.25*10^0 and 120.5 is written 1.205*10^2. A computer, however, knows only 0 and 1, so a number is first rewritten in binary scientific notation. In binary, 8.25 is 1000.01 and 120.5 is 1111000.1. In binary scientific notation these become 1.00001*2^3 and 1.1110001*2^6.

In binary scientific notation every nonzero number has the form 1.xxx*2^n. The mantissa therefore only needs to store xxx: the digit before the binary point is always 1 and can be omitted. That is why the 23-bit mantissa field gives 24 bits of precision.

How many decimal digits is that? Since 9 is 1001 in binary, roughly 4 bits buy one decimal digit. So 24 bits give a float about 6 to 7 significant decimal digits (more precisely, 24 * log10(2) ≈ 7.2). These are significant digits, not places after the decimal point.

The exponent can be negative. An 8-bit field holds 256 values, covering exponents from -127 to 128, so the exponent is stored in biased form: the stored value is the actual exponent + 127. IEEE 754 reserves the stored values 0 and 255 for zero, subnormals, infinities and NaN, so ordinary numbers use exponents from -126 to 127.

Now let us look at how 8.25 and 120.5 are actually stored in memory.
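The 24-bit claim can be checked directly. Every integer up to 2^24 fits in 24 significant bits, but 2^24 + 1 needs 25, so it cannot survive a round trip through single precision. This Python sketch uses the standard struct module; round_trip is a hypothetical helper name.

```python
import math
import struct

def round_trip(x: float) -> float:
    # Store x as an IEEE 754 single and read it back as a double
    return struct.unpack("<f", struct.pack("<f", x))[0]

for n in (2**24 - 1, 2**24, 2**24 + 1):
    print(n, round_trip(float(n)))   # the last one comes back as 16777216.0

# 24 significant bits correspond to about 7.2 significant decimal digits
print(24 * math.log10(2))
```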

First, 8.25. In binary scientific notation it is 1.00001*2^3.

Following the storage rules above:

- The sign bit is 0 (positive).
- The exponent is 3+127=130 (binary 10000010).
- The mantissa is the fractional part 00001 followed by 18 zeros.

So 8.25 is stored as 0 10000010 00001000000000000000000 (hex 0x41040000).

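The single-precision number 120.5 = 1.1110001*2^6 is stored with sign bit 0, exponent 6+127=133 (binary 10000101), and mantissa 1110001 followed by 16 zeros. That gives 0 10000101 11100010000000000000000 (hex 0x42F10000).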

Reading a float back out of memory reverses these steps:

    1. Write out bits 22 to 0 and prepend a "1" to get 24 significant digits; place the binary point immediately to the right of that leading "1".
    2. Take the value n held in bits 29 to 23. If bit 30 is "0", invert every bit of n; if bit 30 is "1", add 1 to n.
    3. Move the binary point n places to the left (when bit 30 is "0") or n places to the right (when bit 30 is "1") to get the number in binary.
    4. Convert this binary number to decimal, and attach a plus or minus sign according to whether bit 31 is "0" or "1".
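These four reverse steps can be sketched the same way. This minimal Python version uses fractions.Fraction so that the result is exact. It assumes a normal single-precision pattern, meaning the exponent field is neither all zeros nor all ones. decode_by_steps is a hypothetical name.

```python
from fractions import Fraction

def decode_by_steps(bits: int) -> Fraction:
    """Turn a 32-bit single-precision pattern back into an exact number,
    following the four reverse steps (normal values only, by assumption)."""
    mantissa = bits & 0x7FFFFF                                # bits 22-0
    significand = Fraction((1 << 23) | mantissa, 1 << 23)     # step 1: prepend 1, point after it
    n = (bits >> 23) & 0x7F                                   # step 2: bits 29-23
    if not (bits >> 30) & 1:
        n = ~n & 0x7F                                         # bit 30 is 0: invert each bit
        value = significand / 2**n                            # step 3: move the point left
    else:
        n += 1                                                # bit 30 is 1: add 1
        value = significand * 2**n                            # step 3: move the point right
    return -value if (bits >> 31) & 1 else value              # step 4: sign from bit 31

print(decode_by_steps(0x42F10000))   # 241/2, i.e. 120.5
print(decode_by_steps(0xBEC00000))   # -3/8, i.e. -0.375
```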

Suppose you are given a piece of data in memory and told that it is a single-precision float. How do you find its decimal value? You simply run the process above in reverse. Take the memory data 01000010111100010000000000000000. First split it into its fields (sign, exponent, mantissa): 0 10000101 11100010000000000000000.

Bit 30 is 1 and bits 29 to 23 hold 0000101 = 5, so n = 5 + 1 = 6. The value is therefore 1.1110001*2^6 = 1111000.1, which is 120.5.

Double-precision numbers are stored the same way as single-precision ones. Only the widths of the exponent and mantissa fields differ: 11 and 52 bits, with an exponent bias of 1023. So instead of walking through double precision in detail, here is just the final storage of 120.5: 0 10000000101 1110001 followed by 45 zeros (hex 0x405E200000000000). It is worth working out for yourself why it looks this way.
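The same field split works for a double with wider masks. This is again a Python sketch with the standard struct module, where "d" packs an IEEE 754 double. double_fields is a hypothetical name.

```python
import struct

def double_fields(x: float):
    # Reinterpret the 8 bytes of an IEEE 754 double as an unsigned 64-bit integer
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # bit 63
    exponent = (bits >> 52) & 0x7FF        # bits 62-52, bias 1023
    mantissa = bits & ((1 << 52) - 1)      # bits 51-0
    return bits, sign, exponent, mantissa

bits, sign, exponent, mantissa = double_fields(120.5)
print(hex(bits), sign, exponent - 1023, bin(mantissa))   # exponent 1029 - 1023 = 6
```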

With this background we can resolve a common puzzle. Look at the following program and note its output:

            using System;

            // Top-level statements (C# 9 or later)
            float f = 2.2f;
            double d = (double)f;
            Console.WriteLine(d.ToString("0.0000000000000"));  // 2.2000000476837
            f = 2.25f;
            d = (double)f;
            Console.WriteLine(d.ToString("0.0000000000000"));  // 2.2500000000000

The output may be puzzling. The single-precision 2.2, converted to double precision and printed to 13 decimal places, becomes 2.2000000476837. The single-precision 2.25, converted the same way, stays 2.2500000000000. Why does 2.2 change while 2.25 does not? The storage formats described above explain it.

First look at 2.25. In binary it is 10.01 = 1.001*2^1, so its single-precision storage is simply 0 10000000 00100000000000000000000. Its double-precision storage is 0 10000000000 001 followed by 49 zeros. Both hold the value exactly, so nothing changes in the cast.

Now look at 2.2. To convert a decimal fraction to binary, repeatedly multiply by 2 and take the integer part. For the fractional part 0.2:

- 0.2*2=0.4, so the first binary digit is 0.
- 0.4*2=0.8, so the second digit is 0.
- 0.8*2=1.6, so the third digit is 1.
- 0.6*2=1.2, so the fourth digit is 1.
- 0.2*2=0.4, so the fifth digit is 0, and the cycle repeats.

The product never reaches exactly 1.0, so 0.2 becomes the infinitely repeating binary fraction 0.00110011001100110011..., and 2.2 = 10.0011001100110011... = 1.00011001100110011...*2^1. A single-precision number holds only 24 bits of precision, so 2.2 is stored as 0 10000000 00011001100110011001101, with the last bit rounded up.

Converted back to decimal, this stored value is not exactly 2.2. A decimal fraction such as 2.2 may not convert to binary exactly, and double has the same problem, so floating-point representation introduces a small error. When a single-precision value is widened to double, that error becomes visible in the extra digits. For decimal numbers that binary can represent exactly, such as 2.25, there is no error at all. This explains the seemingly strange output above.
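The widening can be reproduced outside C#. This Python sketch stands in for the C# cast (double)f, and that substitution is an assumption of the sketch. It rounds each value to single precision with the standard struct module, then prints the result to 13 decimal places (Python holds the result as a double). float32_bits and widen_float32 are hypothetical names.

```python
import struct

def float32_bits(x: float) -> int:
    # Raw 32-bit pattern of x rounded to single precision
    return struct.unpack(">I", struct.pack(">f", x))[0]

def widen_float32(x: float) -> float:
    # Round x to single precision, then read it back as a Python float (a double)
    return struct.unpack(">f", struct.pack(">f", x))[0]

for x in (2.2, 2.25):
    print(f"{x}: bits {float32_bits(x):#010x}, as double {widen_float32(x):.13f}")
```

The first line shows bits 0x400ccccd and 2.2000000476837; the second shows 0x40100000 and 2.2500000000000, matching the C# output.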

Note:

Binary representation of decimal numbers: first we need to settle two questions.

(1) How is a decimal integer converted to binary? The algorithm is straightforward: divide by 2 repeatedly and record the remainders. For example, for 11:

    11/2 = 5 remainder 1
    5/2 = 2 remainder 1
    2/2 = 1 remainder 0
    1/2 = 0 remainder 1

The quotient has reached 0, so we stop. Reading the remainders from bottom to top, 11 in binary is 1011. Note the stopping rule: we finish when the quotient reaches 0, and repeated division by 2 always reaches 0 for any integer. In other words, converting an integer to binary never produces an infinite loop. Integers can always be represented exactly in binary, but fractions cannot.

(2) How is a decimal fraction converted to binary? Multiply by 2 repeatedly, taking the integer part each time, until no fractional part remains. For example, for 0.9:

    0.9*2 = 1.8, integer part 1
    0.8*2 = 1.6, integer part 1  (0.8 is the fractional part of 1.8)
    0.6*2 = 1.2, integer part 1
    0.2*2 = 0.4, integer part 0
    0.4*2 = 0.8, integer part 0
    0.8*2 = 1.6, integer part 1
    0.6*2 = 1.2, integer part 1
    ...

Reading the integer parts from top to bottom, 0.9 in binary is 0.1110011001100.... The calculation loops, so the fractional part can never be eliminated and the algorithm would run forever. The binary representation of a decimal fraction is therefore sometimes impossible to make exact.
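Both procedures can be written as short loops. The Python sketch below uses fractions.Fraction so that 9/10 is handled exactly; a float literal 0.9 would already be a rounded binary value. int_to_binary and frac_to_binary are hypothetical names. The max_digits cutoff is an added assumption that stops non-terminating expansions.

```python
from fractions import Fraction

def int_to_binary(n: int) -> str:
    """Repeatedly divide by 2; the remainders, read bottom to top, are the digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 2)
        digits.append(str(r))
    return "".join(reversed(digits))

def frac_to_binary(x: Fraction, max_digits: int = 24) -> str:
    """Repeatedly double the fractional part and take the integer part.
    Stops when the fraction becomes 0 or after max_digits digits."""
    digits = []
    while x != 0 and len(digits) < max_digits:
        x *= 2
        bit = int(x)           # integer part is 0 or 1
        digits.append(str(bit))
        x -= bit               # keep only the fractional part
    return "".join(digits)

print(int_to_binary(11))                      # 1011
print(frac_to_binary(Fraction(9, 10), 16))    # 1110011001100110 (never terminates)
print(frac_to_binary(Fraction(1, 4)))         # 01 (terminates: 0.25 = 0.01 in binary)
```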
The reason is simple. Just as decimal cannot represent 1/3 exactly, binary cannot represent 1/10 exactly. This also explains the "loss of precision" seen in floating-point subtraction.
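The 1/10 case can be checked in Python. Converting a float to fractions.Fraction gives the exact value it stores, which for the literal 0.1 is not 1/10. The error then shows up in a subtraction such as 1.1 - 1.0. A short sketch:

```python
from fractions import Fraction

print(Fraction(0.1))                     # exact value of the double nearest to 0.1
print(Fraction(0.1) == Fraction(1, 10))  # False: 1/10 has no finite binary form
print(Fraction(0.25) == Fraction(1, 4))  # True: 1/4 = 0.01 in binary
print(1.1 - 1.0)                         # 0.10000000000000009, not 0.1
```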
