Floating-Point Precision Loss

Last Update:2014-02-08 Source: Internet

Author: User

Floating-point numbers in C#: float and double. float is an alias for System.Single, a 32-bit number between -3.402823e38 and +3.402823e38 that complies with the IEC 60559:1989 (IEEE 754) standard for binary floating-point arithmetic. double is an alias for System.Double, a 64-bit number between -1.79769313486232e308 and +1.79769313486232e308 that complies with the same standard.

A computer only recognizes 0 and 1, so values are stored in memory in binary. (As for which is smarter, the human brain or the computer, I lean toward the human brain. Computers merely compute quickly, so let's not dwell on it!) Therefore, to see how a value is stored in memory, we first convert it to binary (only values within the representable range are considered here).

According to the IEEE 754 standard, any binary floating-point number V can be expressed as V = (-1)^s * M * 2^e, where s is 0 or 1, M lies in the interval [1, 2), and e is the exponent, which is stored with an offset as described further on.

Take 198903.19 (10) as an example. Its binary form is 110000100011110111.0011000010100011 (2), truncated to 16 fractional bits. In scientific notation, with an integer part of 1, this is 1.100001000111101110011000010100011 * 2^17, that is, 198903.19 (10) = (-1)^0 * 1.100001000111101110011000010100011 * 2^17.

The integer part can be converted with the "divide by 2 and take the remainder" method, and the fractional part with the "multiply by 2 and take the integer part" method. The result shows that once the fractional part 0.19 is converted to binary, it needs more than 16 fractional bits (I worked it out to 32 bits after the point, but the expansion is in fact infinite). Because no finite number of bits gives the exact value, the floating-point precision loss problem appears:

    /* Program segment 1 */
    float num_a = 198903.19f;
    float num_b = num_a / 2;
    Console.WriteLine(num_a);
    Console.WriteLine(num_b);

From this code we expect the results 198903.19 and 99451.595, but the output is different! The reason is explained step by step in the rest of this article.

First, here is another way to convert the fractional part, for those who are interested. If the result must be accurate to N binary places, multiply the fractional part by 2 to the power N (for example, with N = 16, 0.19 * 2^16 = 12451.84). Take the integer part (12451) and convert it to binary with the integer method, which gives 11000010100011. If there are fewer than N digits, pad with zeros on the left. So 0.19, accurate to 16 bits, is 0.0011000010100011 in binary. It follows that if multiplying the fractional part by some power of 2 yields an integer, the fraction can be represented exactly in binary; otherwise it cannot. (The principle is simple: it is the binary-fraction-to-decimal conversion run in reverse.)

In memory, float and double are stored in the same format; only the sizes differ. A float occupies 32 bits, from left to right: bit 1 is the sign bit (1 bit), bits 2-9 are the exponent (8 bits), and bits 10-32 are the mantissa (23 bits). A double occupies 64 bits, from left to right: bit 1 is the sign bit (1 bit), bits 2-12 are the exponent (11 bits), and bits 13-64 are the mantissa (52 bits).

For the sign bit (the s above, and likewise below), 0 represents a positive number and 1 a negative number.

For float, the 8-bit exponent field can hold the values 0-255 (10). The exponent (the e above, and likewise below) can be positive or negative, but the field itself is an unsigned integer, so the standard stores the exponent with an offset of 127: stored value = exponent + 127. For example, 0111 0011 (2) represents the exponent -12 (10) ((-12) + 127 = 115), and 1000 1011 (2) represents the exponent 12 (10) (12 + 127 = 139).

In addition, IEEE 754 reserves two stored exponent values (the same applies to double). When the exponent field is all 0s: if the mantissa is also all 0s, the value is ±0 (the sign depends on the sign bit); if the mantissa is not all 0s, the exponent used in the calculation is -126 for float (-1022 for double) and the mantissa is read as 0.xxxxxx without the leading 1, which represents values even closer to 0. When the exponent field is all 1s: if the mantissa is all 0s, the value is ±infinity (the sign depends on the sign bit); if the mantissa is not all 0s, the value is not a number (NaN). (This information comes from material on denormalized floating-point numbers.)

For double, with its 11-bit exponent field, the offset used for storage is 1023.

As for the mantissa: because every normal value can be written as 1.xxx * 2^N (ignoring the precision problem for the moment), only the fractional part is stored. The leading 1 is not stored in memory, which gains one bit of precision.

Taking float 198903.19 as an example, the binary value is 1.100001000111101110011000010100011 * 2^17. The value is positive, so the sign bit is 0. The exponent is 17, stored as 144 (17 + 127 = 144), that is, 10010000 (8 bits in total, padded with 0s on the left if shorter). The mantissa is 10000100011110111001100 (the first 23 bits after the point). The final result is 01001000 01000010 00111101 11001100. In reversed (little-endian) byte order, as the bytes sit in memory, this is CC 3D 42 48 in hexadecimal:

    // Requires: using System; using System.Linq;
    float f_num = 198903.19f;
    var f_bytes = BitConverter.GetBytes(f_num);
    Console.WriteLine("float: 198903.19");
    Console.WriteLine(BitConverter.ToString(f_bytes));
    // Bytes are printed in memory order, lowest byte first.
    Console.WriteLine(string.Join(" ", f_bytes.Select(i => Convert.ToString(i, 2).PadLeft(8, '0'))));

In the same way, double 198903.19 ends up as 01000001 00001000 01000111 10111001 10000101 00011110 10111000 01010010 (for why the last two bits are 10 rather than 01, see floating-point rounding). In reversed byte order this is 52 B8 1E 85 B9 47 08 41 in hexadecimal:

    // Requires: using System; using System.Linq;
    double d_num = 198903.19d;
    var d_bytes = BitConverter.GetBytes(d_num);
    Console.WriteLine("double: 198903.19");
    Console.WriteLine(BitConverter.ToString(d_bytes));
    // Bytes are printed in memory order, lowest byte first.
    Console.WriteLine(string.Join(" ", d_bytes.Select(i => Convert.ToString(i, 2).PadLeft(8, '0'))));

Now back to the loss of precision. Because the binary expansion of the fractional part never ends, memory holds a version cut off at the available precision, so the stored value is not exactly the value that was written.

Looking back at program segment 1, num_a is actually stored as 01001000 01000010 00111101 11001100, which is 198903.1875 in decimal, and num_b is actually stored as 01000111 11000010 00111101 11001100, which is 99451.59375 in decimal.

Consider num_b first. The value stored for num_a is already inexact, so a calculation based on it is almost certainly inexact as well. That is why num_b is not the expected 99451.595.

Then why is 198903.1875 displayed as 198903.2, and 99451.59375 as 99451.59? We know that 198903.1875 and 99451.59375 are what is stored in memory, so the change can only happen on output. In effect this is a small trick on Microsoft's part, in the spirit of the saying "fight poison with poison": since the stored value is already inexact, the output routine rounds it to a limited number of significant digits (seven by default for float on the .NET Framework of that time), in the hope that the rounded text matches the value that was originally written. Sometimes the guess really is right! (I read this explanation in an article whose address I have forgotten, and this is roughly what it said. If it turns out not to be the real reason, treat it as light after-dinner reading.)
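
The two ways of converting the fractional part described above can be sketched in a few lines of C#. This is only an illustration: the class and method names (FractionToBinary, MultiplyByTwo, ScaleByPowerOfTwo) are made up for this example, and decimal is used on the assumption that 0.19 should be held exactly while the binary digits are extracted.

```csharp
using System;
using System.Text;

static class FractionToBinary
{
    // "Multiply by 2, take the integer part": returns the first n binary
    // digits of a fraction in [0, 1). 'exact' is true when nothing is left
    // over after n digits, i.e. the fraction fits in n binary places.
    public static string MultiplyByTwo(decimal fraction, int n, out bool exact)
    {
        var bits = new StringBuilder(n);
        for (int i = 0; i < n; i++)
        {
            fraction *= 2;
            if (fraction >= 1) { bits.Append('1'); fraction -= 1; }
            else { bits.Append('0'); }
        }
        exact = fraction == 0;
        return bits.ToString();
    }

    // "Multiply by 2^n": scale, keep the integer part, convert it with the
    // integer method, and pad with zeros on the left (assumes n < 31).
    public static string ScaleByPowerOfTwo(decimal fraction, int n)
    {
        int scaled = (int)Math.Floor(fraction * (1 << n));
        return Convert.ToString(scaled, 2).PadLeft(n, '0');
    }

    static void Main()
    {
        bool exact;
        Console.WriteLine(MultiplyByTwo(0.19m, 16, out exact));  // 0011000010100011
        Console.WriteLine(exact);                                 // False
        Console.WriteLine(ScaleByPowerOfTwo(0.19m, 16));          // 0011000010100011
        Console.WriteLine(MultiplyByTwo(0.1875m, 4, out exact)); // 0011
        Console.WriteLine(exact);                                 // True
    }
}
```

Both methods give the same 16 bits for 0.19, and the leftover after 16 steps is not zero, which is the inexactness the article is about. By contrast, 0.1875 * 2^4 = 3 is an integer, so 0.1875 is exact in 4 binary places.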
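
The sign, exponent, and mantissa fields of the float example can also be pulled apart directly with shifts and masks. The following is a minimal sketch; the class name FloatFields and its helper methods are hypothetical, and the bits are read through BitConverter.GetBytes and BitConverter.ToInt32 so that the result does not depend on byte order.

```csharp
using System;

static class FloatFields
{
    // Reinterpret the 32 bits of a float as an int.
    public static int Bits(float value)
    {
        return BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
    }

    // Bit 1 (leftmost): the sign.
    public static int Sign(float value) { return (Bits(value) >> 31) & 0x1; }

    // Bits 2-9: the exponent as stored, i.e. exponent + 127.
    public static int StoredExponent(float value) { return (Bits(value) >> 23) & 0xFF; }

    // Bits 10-32: the mantissa without the implicit leading 1.
    public static int Mantissa(float value) { return Bits(value) & 0x7FFFFF; }

    static void Main()
    {
        float f = 198903.19f;
        Console.WriteLine("{0:X8}", Bits(f));                                // 48423DCC
        Console.WriteLine(Sign(f));                                           // 0
        Console.WriteLine(StoredExponent(f));                                 // 144
        Console.WriteLine(StoredExponent(f) - 127);                           // 17
        Console.WriteLine(Convert.ToString(Mantissa(f), 2).PadLeft(23, '0')); // 10000100011110111001100
    }
}
```

The hexadecimal value 48423DCC is the same four bytes as CC 3D 42 48, read from the highest byte down instead of in memory order.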
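
The reserved exponent values (all 0s and all 1s) can be checked the same way. This sketch assumes a hypothetical helper, FloatKind.Classify, that only looks at the exponent and mantissa fields of a float and names the category the rules above put it in.

```csharp
using System;

static class FloatKind
{
    public static string Classify(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        int exponent = (bits >> 23) & 0xFF;  // stored exponent field
        int mantissa = bits & 0x7FFFFF;      // 23-bit mantissa field

        if (exponent == 0)                   // exponent field all 0s
            return mantissa == 0 ? "zero" : "subnormal";
        if (exponent == 0xFF)                // exponent field all 1s
            return mantissa == 0 ? "infinity" : "NaN";
        return "normal";
    }

    static void Main()
    {
        Console.WriteLine(Classify(-0.0f));                  // zero
        Console.WriteLine(Classify(float.Epsilon));          // subnormal
        Console.WriteLine(Classify(198903.19f));             // normal
        Console.WriteLine(Classify(float.NegativeInfinity)); // infinity
        Console.WriteLine(Classify(float.NaN));              // NaN
    }
}
```

float.Epsilon is the smallest positive float; its exponent field is all 0s and its mantissa is 1, so it falls in the "closer to 0" range that has no implicit leading 1.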
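
To see the values that are really held in memory, independent of how float is formatted by default, one option is to widen the float to double before printing. Widening is exact, and a double has enough digits to show both example values in full. The helper name StoredValue.Text is made up for this sketch, and the invariant culture is assumed so that the decimal separator is always a point.

```csharp
using System;
using System.Globalization;

static class StoredValue
{
    // Widen to double (exact), then format with a fixed culture.
    public static string Text(float value)
    {
        return ((double)value).ToString(CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        float num_a = 198903.19f;
        float num_b = num_a / 2;

        Console.WriteLine(Text(num_a)); // 198903.1875
        Console.WriteLine(Text(num_b)); // 99451.59375

        // Default float formatting. The exact text depends on the runtime
        // version, so no expected output is given here.
        Console.WriteLine(num_a.ToString(CultureInfo.InvariantCulture));
        Console.WriteLine(num_b.ToString(CultureInfo.InvariantCulture));
    }
}
```

Comparing the first two lines with the last two shows that the difference between the stored value and the displayed value comes from formatting alone.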

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
