The story of int, float, and double

Source: Internet
Author: User

Sorry for such a silly title; I hope the content is not too silly.
Anyone who has learned programming already knows these three types. int is the integer type; in .NET it means Int32, a 32-bit signed integer. float is a single-precision floating point number, 32 bits long, with 1 sign bit, 8 exponent bits, and 23 fraction (mantissa) bits; in .NET it is also called Single. double is a 64-bit double-precision floating point number with 1 sign bit, 11 exponent bits, and 52 fraction bits. The relationship between them: int converts implicitly to float and to double; float needs an explicit cast to become int but converts implicitly to double; double needs an explicit cast to become either float or int.
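The layouts and conversion rules above can be checked with a short C# sketch. It assumes a runtime that provides BitConverter.SingleToInt32Bits (an assumption; older frameworks lack it), and the variable names are only illustrative.

```csharp
using System;

float f = -6.5f;                                   // -1.625 * 2^2
int fb = BitConverter.SingleToInt32Bits(f);
int fSign = (fb >> 31) & 0x1;                      // 1 sign bit
int fExp = (fb >> 23) & 0xFF;                      // 8 exponent bits, biased by 127
int fFrac = fb & 0x7FFFFF;                         // 23 fraction bits
Console.WriteLine($"float : sign={fSign} exponent={fExp} fraction=0x{fFrac:X6}");

double d = f;                                      // float -> double is implicit
long db = BitConverter.DoubleToInt64Bits(d);
int dSign = (int)((db >> 63) & 0x1);               // 1 sign bit
int dExp = (int)((db >> 52) & 0x7FF);              // 11 exponent bits, biased by 1023
long dFrac = db & 0xFFFFFFFFFFFFFL;                // 52 fraction bits
Console.WriteLine($"double: sign={dSign} exponent={dExp} fraction=0x{dFrac:X13}");

int n = 7;
float nf = n;                                      // int -> float is implicit
double nd = n;                                     // int -> double is implicit
int fromFloat = (int)nf;                           // float -> int needs a cast
float fromDouble = (float)nd;                      // double -> float needs a cast
int fromDouble2 = (int)nd;                         // double -> int needs a cast
Console.WriteLine($"{fromFloat} {fromDouble} {fromDouble2}");
```

Removing any of the three casts in the last group should make the compiler reject the line, which is the rule the paragraph describes.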
Before getting to the problems, it is worth reviewing two topics from computer organization: two's complement and floating point representation. I will not go into how to convert decimal to binary. Two's complement exists to avoid having both a positive and a negative zero and to make arithmetic convenient, and modern computers store signed integers in this form. There is nothing special about it, except that a few boundary values need special treatment. For example, -(2^31) is stored in an int as 1000...0 (a 1 followed by 31 zeros), and applying the usual two's complement rule (invert the bits and add one) to it does not produce a positive counterpart. Ignoring the carry, the rule simply returns the same bit pattern, which is technically the right answer but always looks odd. Floating point numbers, in turn, are stored in binary scientific notation; for normalized values the significand has the form 1.xxx, and the leading 1 is implied rather than stored.
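A small C# sketch of the boundary case mentioned above. It only uses Convert.ToString with base 2 and the unchecked keyword; the variable names are illustrative.

```csharp
using System;

int min = int.MinValue;                       // -(2^31)
string pattern = Convert.ToString(min, 2);    // two's complement bit pattern as text
Console.WriteLine(pattern);                   // a 1 followed by 31 zeros

// "Invert the bits and add one" is the usual way to negate a two's complement value.
// For int.MinValue the addition overflows, and the same pattern comes back.
int negated = unchecked(~min + 1);
Console.WriteLine(negated == min);            // True

// For an ordinary value the rule behaves as expected.
int five = 5;
Console.WriteLine(unchecked(~five + 1));      // -5
```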

With the preamble out of the way, here are several interesting problems.

int i = int.MaxValue;
float f = i;
int j = (int)f;
bool b = i == j;

Here, b comes out false. If we change float to long in this code (an implicit conversion first, an explicit cast second), the result is true. At first glance this is surprising, because float.MaxValue is vastly larger than int.MaxValue. Nevertheless, the implicit conversion loses data. int.MaxValue equals 2^31 - 1, which in binary two's complement is 0111...1 (a 0 followed by 31 ones). In binary scientific notation that is 1.111...1 (30 ones after the point) * 2^30, but a float keeps only 24 significant bits (the implied leading 1 plus 23 stored fraction bits), so the lowest 7 bits cannot be kept. The conversion rounds to the nearest representable value rather than truncating, so the float holds exactly 2^31, which is 1 more than the original int. That value no longer fits in an int, so the cast back cannot return int.MaxValue exactly as computed; in an unchecked context C# leaves the result of such an out-of-range cast unspecified. Many runtimes return int.MinValue, which is why the comparison fails, while a runtime that saturates would return int.MaxValue and make b true.
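A minimal C# sketch of what the float actually holds. It widens to long instead of casting back to int, so the output does not depend on how a particular runtime treats an out-of-range cast. BitConverter.SingleToInt32Bits is assumed to be available (modern .NET); the variable names are illustrative.

```csharp
using System;

int i = int.MaxValue;                       // 2^31 - 1
float f = i;                                // implicit conversion, rounds to nearest float
long viaFloat = (long)f;                    // long can hold 2^31, so this cast is well defined

Console.WriteLine(f == 2147483648f);        // True: the float is exactly 2^31
Console.WriteLine(viaFloat - i);            // 1
Console.WriteLine(BitConverter.SingleToInt32Bits(f).ToString("X8")); // 4F000000

// The same experiment through long loses nothing.
long l = i;                                 // implicit, exact
int back = (int)l;                          // explicit, exact because the value fits
Console.WriteLine(back == i);               // True
```

The bit pattern 4F000000 decodes to sign 0, biased exponent 158 (that is, 2^31), and an all-zero fraction.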
That raises another question: which int values survive the trip to float and back unchanged? The answer lies in float's 24 significant bits (23 stored fraction bits plus the implied leading 1). Write the absolute value of the int in binary, which gives a string of 0s and 1s. As long as the first 1 and the last 1 are at most 23 positions apart, so that everything from the first 1 to the last 1 fits in 24 bits, converting to float and back gives the same value. The condition has nothing to do with magnitude, and the set of such values is not contiguous within the range of int.
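The criterion can be compared against an actual round trip with a sketch like the following. Both helper functions are hypothetical names introduced for illustration; the round trip goes through long to keep the cast back well defined.

```csharp
using System;

int[] samples = { 0, 1, -1, 16777216, 16777217, 16777218, int.MaxValue, int.MinValue, 123456789 };
foreach (int s in samples)
{
    Console.WriteLine($"{s,12}: round trip={RoundTripsThroughFloat(s)} fits={FitsIn24Bits(s)}");
}

// True when converting to float and back yields the same integer.
static bool RoundTripsThroughFloat(int value)
{
    float f = value;
    return (long)f == value;
}

// True when the bits from the first 1 to the last 1 of |value| span at most 24 positions.
static bool FitsIn24Bits(int value)
{
    long magnitude = Math.Abs((long)value);
    if (magnitude == 0) return true;
    while ((magnitude & 1) == 0) magnitude >>= 1;   // drop trailing zeros
    return magnitude < (1L << 24);
}
```

For 16777216 (2^24) and 16777218 the two checks agree on true, and for 16777217 (2^24 + 1) they agree on false, which shows that the set is not contiguous.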

double d = 0.6;
float f = (float)d;
double d2 = f;
bool b = d == d2;

Here, b is also false. If d is 0.5 instead, the result is true. At first glance, 0.6 is so short that both double and float ought to be able to represent it, so converting there and back should give equal values. That intuition comes from thinking in decimal for too long. Converted to a binary fraction, 0.6 is 0.10011001... (1001 repeating), an infinitely repeating fraction. Neither float nor double can therefore store the exact value of 0.6 (the computer has no notion of fractions). Since float keeps 23 fraction bits and double keeps 52, converting the double to float discards some bits. When the float is converted back to double, the lost bits are filled with 0, so the resulting double is no longer the same as the original one.
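The zero-filled bits can be made visible by printing the raw bit patterns. This is a sketch using BitConverter.DoubleToInt64Bits; the variable names beyond those in the snippet above are illustrative.

```csharp
using System;

double d = 0.6;
float f = (float)d;
double d2 = f;

Console.WriteLine(d == d2);                                        // False
Console.WriteLine(BitConverter.DoubleToInt64Bits(d).ToString("X16"));   // 3FE3333333333333
Console.WriteLine(BitConverter.DoubleToInt64Bits(d2).ToString("X16"));  // 3FE3333340000000

double half = 0.5;                                                 // 0.1 in binary, exact
double half2 = (float)half;
Console.WriteLine(half == half2);                                  // True
```

In the second pattern the low 29 bits of the fraction are all zero: only the float's 23 fraction bits (rounded) made it back.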
This raises the matching question: which double values survive conversion to float and back unchanged? The answer is strikingly similar to the int case: the significant bits have to fit in float's 24. In addition, double has three more exponent bits than float, so many double values have magnitudes that float cannot represent at all.
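A sketch of both conditions, significand width and exponent range. SurvivesFloat is a hypothetical helper name; values are passed as arguments so that the conversions happen at run time.

```csharp
using System;

double[] samples = { 0.5, 0.6, 16777216.0, 16777217.0, 1e300, 1e-300 };
foreach (double s in samples)
{
    Console.WriteLine($"{s}: {SurvivesFloat(s)}");
}

// True when a double converted to float and back is unchanged.
static bool SurvivesFloat(double value)
{
    float narrowed = (float)value;      // may round, overflow to infinity, or underflow to 0
    return (double)narrowed == value;
}
```

Here 1e300 fails because it is beyond float's range and becomes infinity, and 1e-300 fails because it is too small and becomes 0; 16777217.0 fails for the same reason as in the int case.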
A related mathematical question: which decimal fractions have a finite binary expansion? This is really a math question, but a simple one: exactly those that, written as a fraction in lowest terms, have a power of 2 as the denominator. Every such decimal ends in the digit 5 (0.5, 0.25, 0.375), although the binary form may be absurdly long. Ending in 5 is necessary but not sufficient: 0.15 is 3/20, and it repeats forever in binary.
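The criterion can be sketched in C# by reducing digits / 10^places to lowest terms and testing whether the denominator is a power of 2. Both function names are hypothetical, and the sketch assumes a non-negative digit string with at most 18 decimal places so that long does not overflow.

```csharp
using System;

Console.WriteLine(HasFiniteBinaryExpansion(5, 1));     // 0.5   -> True
Console.WriteLine(HasFiniteBinaryExpansion(6, 1));     // 0.6   -> False
Console.WriteLine(HasFiniteBinaryExpansion(15, 2));    // 0.15  -> False (3/20)
Console.WriteLine(HasFiniteBinaryExpansion(375, 3));   // 0.375 -> True  (3/8)

// digits: the digits after the decimal point read as an integer, e.g. 375 for 0.375.
// places: how many digits there are after the decimal point.
static bool HasFiniteBinaryExpansion(long digits, int places)
{
    long denominator = 1;
    for (int k = 0; k < places; k++) denominator *= 10;
    denominator /= Gcd(digits, denominator);            // reduce to lowest terms
    return (denominator & (denominator - 1)) == 0;      // power of 2 (including 1)
}

static long Gcd(long a, long b)
{
    while (b != 0) { (a, b) = (b, a % b); }
    return a;
}
```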
Finally, an interesting question. As noted, 0.6 as a binary fraction is 0.1001 with 1001 repeating forever. When it is stored as a floating point number it is cut off at some position (after 23 fraction bits for float, after 52 for double). Is the binary number that actually sits in memory, converted back to decimal, larger or smaller than the original decimal number? The answer is: it depends. People round decimal digits half up; in binary the rule is even simpler: a 0 is dropped and a 1 rounds up. A float keeps 23 fraction bits. If the 24th bit is 1, a carry generally occurs and the stored value is larger than the true value (an exact tie goes to the even neighbor). If it is 0, the tail is dropped and the stored value is smaller than the true value. This explains what happens when 0.6d is converted to float and then back to double: the result is 0.60000002384185791, which is larger than 0.6, because the 24th fraction bit of 0.6 in binary scientific notation is 1 and causes a carry.
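Both directions can be observed with a short sketch. The value 0.7 is an added illustration (not from the text above) of a case where the 24th fraction bit is 0; BitConverter.SingleToInt32Bits is assumed to be available.

```csharp
using System;
using System.Globalization;

double six = 0.6;
double seven = 0.7;                    // illustrative second value, not from the text
double six32 = (float)six;             // narrowed to float, then widened back
double seven32 = (float)seven;

Console.WriteLine(six32 > six);        // True: rounded up
Console.WriteLine(seven32 < seven);    // True: rounded down
Console.WriteLine(six32.ToString("G17", CultureInfo.InvariantCulture)); // 0.60000002384185791, as quoted above

// The stored fraction bits end in ...010 for 0.6 (carry) and ...011 for 0.7 (no carry).
Console.WriteLine(BitConverter.SingleToInt32Bits((float)six).ToString("X8"));
Console.WriteLine(BitConverter.SingleToInt32Bits((float)seven).ToString("X8"));
```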

One problem remains. The hardware supports floating point arithmetic, but who converts between decimal text and binary floating point, and how (the assembler, the compiler, or something else)? The question is pressing because the number in memory is clearly different from 0.6, yet in any language the debugger and the output routines show it to the user (the programmer) as exactly 0.6. The best example is double and its ToString method: if I write double d = 0.59999999999999999999999999999, then d.ToString() is 0.6. Admittedly, for double, that long literal and 0.6 are stored as the same bits in memory, but how does the computer turn a number that is not equal to 0.6 into 0.6 when it displays it? Discussion and advice on this question are welcome.
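The observation itself can be reproduced with the sketch below. It only demonstrates the behavior; the explanation it hints at is an assumption rather than something the text establishes: the compiler rounds the literal to the nearest double, and the default ToString prints a short decimal string that still maps back to the same double. CultureInfo.InvariantCulture is used so that the decimal separator does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

double typed = 0.59999999999999999999999999999;
double plain = 0.6;

// Both literals end up as the same 64 bits.
Console.WriteLine(typed == plain);                                   // True
Console.WriteLine(BitConverter.DoubleToInt64Bits(typed).ToString("X16"));

// The default formatting shows a short string; "G17" asks for 17 significant digits.
string shortText = typed.ToString(CultureInfo.InvariantCulture);
string longText = typed.ToString("G17", CultureInfo.InvariantCulture);
Console.WriteLine(shortText);                                        // 0.6
Console.WriteLine(longText);

// Parsing either string gives back the very same double.
Console.WriteLine(double.Parse(shortText, CultureInfo.InvariantCulture) == typed);
Console.WriteLine(double.Parse(longText, CultureInfo.InvariantCulture) == typed);
```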
