float p3x = 80838.0f;
float p2y = -2499.0f;
double v321 = p3x * p2y;
Console.WriteLine(v321);
A very simple calculation, written down in passing. The result should be -202014162 (the exact product: 80838 × 2499 = 80838 × 2500 - 80838 = 202,014,162), no problem. C# did not produce that result? Impossible. Open Visual Studio, copy the code in, and try: sure enough, the result is -202014162. Is that the end of it? Apparently not! Change the platform target from AnyCPU to x64 and try again (the server environment is 64-bit!). Now it comes out as -202014160. Yes, -202014160. Hard to believe, so run it twice more: still -202014160. OK, figured it out: given floating-point rounding error, -202014160 is the reasonable result. Well then, try C++. Test environment: Intel(R) i7-3770 CPU, 64-bit Windows, Visual Studio 2012 with default settings.
float p3x = 80838.0f;
float p2y = -2499.0f;
double v321 = p3x * p2y;
std::cout.precision(15);
std::cout << v321 << std::endl;
Well, under C++ both x86 and x64 give the "logical" result, -202014160. Strange.
The reasonable result is -202014160; the correct result is -202014162. It is "reasonable" because a single-precision float does not have enough significand bits to hold the exact product (the appendix at the end of the article explains why). If you multiply two doubles instead, you get a result that is both correct and reasonable, as the sketch below shows. Let's not dwell on my choice of the words "correct" and "reasonable". The question is: why are the x86 and x64 results inconsistent under C#?
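A minimal sketch of that point (a hypothetical console fragment, not the original test program): widening to double before the multiplication keeps the exact product, while multiplying as float may not.
float p3x = 80838.0f;
float p2y = -2499.0f;
double asFloatMul = p3x * p2y;           // float * float, widened afterwards: platform-dependent
double asDoubleMul = (double)p3x * p2y;  // widened first, multiplied as double: exact
Console.WriteLine(asFloatMul);           // -202014160 or -202014162, depending on platform
Console.WriteLine(asDoubleMul);          // -202014162 everywhere: the product fits a double exactly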
The same code in C++ gives a consistent -202014160 on both x86 and x64 (in Debug; more on this later), which is easy to understand and reasonable. Why? Look at the compiled code (only the key parts are excerpted):
// C# x86
...
float p3x = 80838.0f;
0000003b mov dword ptr [ebp-40h],479de300h
float p2y = -2499.0f;
00000042 mov dword ptr [ebp-44h],0c51c3000h
double v321 = p3x * p2y;
00000049 fld dword ptr [ebp-40h]
0000004c fmul dword ptr [ebp-44h]
0000004f fstp qword ptr [ebp-4ch]
...
// C# x64
...
float p3x = 80838.0f;
00000045 movss xmm0,dword ptr [00000098h]
0000004d movss dword ptr [rbp+3ch],xmm0
float p2y = -2499.0f;
00000052 movss xmm0,dword ptr [000000a0h]
0000005a movss dword ptr [rbp+38h],xmm0
double v321 = p3x * p2y;
0000005f movss xmm0,dword ptr [rbp+38h]
00000064 mulss xmm0,dword ptr [rbp+3ch]
00000069 cvtss2sd xmm0,xmm0
0000006d movsd mmword ptr [rbp+30h],xmm0
...
C++ generates similar code on both x86 and x64, a single-precision multiply (mulss) followed by a conversion to double (cvtss2sd), which is why the C++ x86/x64 results agree with the C# x64 result. From the assembly above you can see that C# on x86 instead generates FPU instructions: fld/fmul/fstp. These run on the FPU (floating-point unit), which performs arithmetic in 80-bit registers and truncates to 32 or 64 bits only when storing a float or double. The non-FPU path uses 128-bit SSE registers, but a float occupies only 32 of those bits and the multiplication is carried out at 32-bit precision. That is the root cause of the discrepancy; see the detailed analysis at the end of the article.
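When chasing this kind of discrepancy it helps to log which JIT the process is running under. A small diagnostic sketch (hypothetical, using only standard .NET APIs; local variables keep the multiply from being folded at compile time):
Console.WriteLine(Environment.Is64BitProcess ? "x64 JIT (SSE path)" : "x86 JIT (FPU path)");
float a = 80838.0f, b = -2499.0f;   // variables, so the multiply happens at run time
double product = a * b;
Console.WriteLine(product);         // -202014160 on the SSE path; -202014162 was observed on the x86 FPU path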
The IEEE-754 floating-point standard recommends that implementations provide an extended-precision format, and Intel x86 processors support one through the FPU (floating-point unit). C#'s floating-point types follow this standard, and the official documentation notes that floating-point operations may be performed with more precision than the result type (as above, where the intermediate result is more precise than a float). It also says that if the hardware supports an extended precision, all floating-point operations may be performed at that precision for efficiency. Take x * y / z, for example: the intermediate x * y may fall outside the range of a double, while dividing by z would pull the final result back into range; with the FPU the expression yields an exact double value rather than infinity.
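A minimal sketch of that x * y / z point, with hypothetical values chosen to force the overflow:
double x = 1e200, y = 1e200, z = 1e200;
// At strict double precision (the SSE path), x * y overflows to +Infinity,
// and Infinity / z stays Infinity. With 80-bit extended precision the
// intermediate 1e400 is still representable, and dividing by z can pull
// the result back to 1e200.
Console.WriteLine(x * y / z);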
In other words, when two floats are multiplied off the FPU, the 32-bit computation loses precision; the FPU computes with 80 bits, so its result is far more precise. In this article's case the difference shows up in the last digit, the 2. So how do you avoid this?
For C++ there is a solution: disable extended precision. The VS2012 C++ compiler offers the option (under Code Generation) /fp:[precise | fast | strict]. In this case, a release x86 build with precise or strict gives the reasonable result (-202014160), while fast produces the correct result (-202014162); note that fast gives different results under debug and release (release optimizes). For x64 you can test debug/release yourself and inspect the code the compiler generates.
But for C#, no such solution has been found yet.
So make sure your production, test, and development environments are consistent (including OS architecture, compilation options, and so on), or inexplicable problems will occur. (This one came from exactly such an inconsistency between the development and production environments, and it took a long time to track down; keep it in mind when you run into floating-point issues. Also pay special attention to mixing float and double.)
References:
[1] C# Language Specification, "Floating point types"
[2] "Are floating point numbers consistent in C#? Can they be?"
[3] "The FPU Instruction Set"
--------------------------------------------------------------------------------
How 80838.0f * -2499.0f comes out as -202014160.0: the floating-point computation step by step
A 32-bit float is represented in the machine as a 1-bit sign (s), an 8-bit exponent (e), and a 23-bit significand (m).
32-bit float = (-1)^s * (1+m) * 2^(e-127), where the value is first normalized to 1.xxxxx * 2^n, e stores n + 127, and m stores the fraction bits xxxxx (the leading 1 is implicit, saving one bit).
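A small helper (hypothetical, written for this article; assumes a using System directive) that decodes the three fields, handy for checking the decompositions that follow:
static void Decode(float f)
{
    // Reinterpret the float's bits as a 32-bit integer, then split off s/e/m.
    int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
    int s = (bits >> 31) & 1;
    int e = (bits >> 23) & 0xFF;
    int m = bits & 0x7FFFFF;
    Console.WriteLine("s={0}  e={1} (2^{2})  m={3}",
        s, Convert.ToString(e, 2).PadLeft(8, '0'), e - 127,
        Convert.ToString(m, 2).PadLeft(23, '0'));
}
// Decode(80838.0f)  -> s=0  e=10001111 (2^16)  m=00111011110001100000000
// Decode(-2499.0f)  -> s=1  e=10001010 (2^11)  m=00111000011000000000000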
80838.0f = 1 0011 1011 1100 0110.0 b = 1.0011101111000110 * 2^16
Significand m = 0011 1011 1100 0110 0000 000
Exponent e = 16 + 127 = 143 = 1000 1111
Internal representation 80838.0 = 0 [1000 1111] [0011 1011 1100 0110 0000 000]
= 0100 0111 1001 1101 1110 0011 0000 0000
= 47 9D E3 00 // in the debugger the memory bytes may read 00 E3 9D 47, because x86 is little-endian: the low byte sits at the lower address, the high byte at the higher address
-2499.0 = -1001 1100 0011.0 b = -1.00111000011 * 2^11
Significand m = 0011 1000 0110 0000 0000 000
Exponent e = 11 + 127 = 138 = 1000 1010
Sign bit s = 1
Internal representation -2499.0 = 1 [1000 1010] [0011 1000 0110 0000 0000 000]
= 1100 0101 0001 1100 0011 0000 0000 0000
= C5 1C 30 00
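These bit patterns are easy to verify in code; a quick sketch (the byte output assumes a little-endian machine, as on x86/x64):
float f1 = 80838.0f, f2 = -2499.0f;
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(f1)));               // 00-E3-9D-47 (little-endian bytes)
Console.WriteLine(BitConverter.ToInt32(BitConverter.GetBytes(f1), 0).ToString("X8")); // 479DE300
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(f2)));               // 00-30-1C-C5
Console.WriteLine(BitConverter.ToInt32(BitConverter.GetBytes(f2), 0).ToString("X8")); // C51C3000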
80838.0 * -2499.0 = ?
First the exponents: n = 16 + 11 = 27
Exponent e = n + 127 = 154 = 1001 1010
Multiplying the significands gives 1.1000 0001 0100 1111 1011 1010 01 // you can work this out by hand
The significand field holds only 23 bits, so the fraction is truncated to 1000 0001 0100 1111 1011 101 (the dropped low bits are below the rounding midpoint, so round-to-nearest truncates here as well)
Internal representation of the product = 1 [1001 1010] [1000 0001 0100 1111 1011 101]
= 1100 1101 0100 0000 1010 0111 1101 1101
= CD 40 A7 DD
Result = -1.1000 0001 0100 1111 1011 101 * 2^27
= -1 1000 0001 0100 1111 1011 101 0000 b
= -202014160
Converting it to double still gives -202014160.
With the FPU, the significand above is not truncated, that is:
FPU result = -1.1000 0001 0100 1111 1011 1010 01 * 2^27
= -1 1000 0001 0100 1111 1011 1010 010 b
= -202014162
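The same truncation can be reproduced directly in C#; a short verification sketch (the cast to float rounds the exact double product to a 24-bit significand, which here lands on -202014160):
double exact = 80838.0 * -2499.0;     // -202014162, exactly representable in a double
float truncated = (float)exact;       // rounded to float's 24-bit significand
Console.WriteLine(exact);             // -202014162
Console.WriteLine((double)truncated); // -202014160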
That's all. If you find any flaws in this article, corrections are welcome.