**Comparing floating-point numbers**

One consequence of all this is that you should very, very rarely compare floating-point numbers for equality directly. Comparing with greater-than or less-than is usually fine, but when you're interested in equality you should always consider whether what you actually want is near equality: is one number almost the same as the other? One simple way to test this is to subtract one number from the other, use Math.Abs to find the absolute value of the difference, and then check whether that difference is below a tolerance you are willing to accept.
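The subtract-and-compare approach described above might be sketched as follows. The class name `FloatCompare`, the method name `NearlyEqual` and the idea of passing the tolerance as a parameter are assumptions made for illustration; a suitable tolerance depends on the scale of the values being compared.

```csharp
using System;

public static class FloatCompare
{
    // Returns true when a and b differ by no more than the given absolute tolerance.
    // The tolerance is chosen by the caller; there is no universally correct value.
    public static bool NearlyEqual(double a, double b, double tolerance)
    {
        return Math.Abs(a - b) <= tolerance;
    }
}
```

With a tolerance of 1e-9, `FloatCompare.NearlyEqual(0.1 + 0.2, 0.3, 1e-9)` is true, whereas the direct comparison `0.1 + 0.2 == 0.3` is false. Note that this sketch returns false whenever either argument is NaN.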

There are also some particularly pathological cases, which are caused by JIT optimisation. Consider the following code:

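```csharp
using System;

class Test
{
    // A static field, so its value is stored as a genuine 32-bit float.
    static float f;

    static void Main(string[] args)
    {
        f = Sum(0.1f, 0.2f);
        float g = Sum(0.1f, 0.2f);   // a local, which the JIT may keep at higher precision
        Console.WriteLine(f == g);
        // g = g + 1;
    }

    static float Sum(float f1, float f2)
    {
        return f1 + f2;
    }
}
```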

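It should always print True, right? Unfortunately not. When run in debug mode, where the JIT cannot perform as many optimisations as usual, it prints True. When run normally, the JIT can store the result of the sum more precisely than a float can actually represent: it can use the default x86 80-bit representation, for instance, for the sum itself, the return value and the local variable. See the ECMA CLI specification, Partition I, section 12.1.3 for more details. Uncommenting the commented-out line above makes the JIT behave a little more conservatively, and the result is then True. That may only hold for the current implementation, though, and should not be relied upon. (Casting g to float in the comparison has the same effect, even though it looks like a no-op.) This is another reason to avoid equality comparisons on floating-point numbers, even when you are very sure the results should be the same.

(Translator's note: on the .NET platform the result is always True.)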

**How does .NET format floating-point numbers?**

There is no built-in way to see the exact decimal value of a floating-point number in .NET, although you can do it with a bit of work. (See the sample code near the end of this article, which implements this.) By default, .NET formats a double to 15 significant digits and a float to 7. (In some cases scientific notation is used; see the MSDN page on standard numeric format strings for details.) If you use the round-trip format specifier ("R"), the number is formatted as the shortest string that, when parsed back to the same type, yields the original number. If you store floating-point numbers as strings and the exact values matter to you, you should definitely use the round-trip specifier, as otherwise you are very likely to lose data.
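As a rough illustration of storing a double as text with the round-trip specifier, consider the sketch below. The class and method names (`RoundTrip`, `Save`, `Load`) are hypothetical, and the use of the invariant culture is an added assumption so that the decimal separator does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class RoundTrip
{
    // Formats the value with the round-trip specifier ("R").
    public static string Save(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    // Parses text produced by Save back into a double.
    public static double Load(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }
}
```

The intent is that `RoundTrip.Load(RoundTrip.Save(x)) == x` holds for any finite `x`, which is not guaranteed if the value is saved with the default 15-digit formatting described above.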

**What does a floating-point number look like in memory?**

As mentioned above, a floating-point number basically has a sign bit, an exponent and a mantissa. All of these are integers, and the combination of the three determines exactly which number is represented. There are several classes of floating-point number: normalised, subnormal, infinity and not-a-number (NaN). Most numbers are normalised, meaning that the first bit of the binary mantissa is assumed to be 1, so you don't actually need to store it. For example, the binary number 1.01101 can be stored as just .01101: the leading 1 is assumed, because if it were 0 a different exponent would be used. That technique only works when the number lies in the range where a suitable exponent can be chosen. Numbers that do not lie in that range (very, very small numbers) are called subnormal, and no leading bit is assumed for them. "Not a number" (NaN) values are for things like the result of 0/0. There are several different classes of NaN, and they have some odd behaviour of their own. Subnormal numbers are sometimes also called denormalised numbers.

At the bit level, the sign, exponent and mantissa are each stored as an unsigned integer, and the stored value is simply the sign bit, followed by the exponent, followed by the mantissa. The "real" exponent is biased: for a double, for example, the bias is 1023, so a stored exponent of 1026 means 3 when you come to work out the actual value. The following table shows what each combination of sign, exponent and mantissa means, using double as an example. The same principles apply to float, with only a few different values (such as the bias). Note that the exponent given here is the stored exponent, before the bias is applied. (That is why the bias appears in the Value column.)

| Sign bit (s, 1 bit) | Stored exponent (e, 11 bits) | Mantissa (m, 52 bits) | Type of number | Value |
|---|---|---|---|---|
| Any | 1 to 2046 | Any | Normal | (-1)^{s} x 1.m (binary) x 2^{e-1023} |
| Any | 0 | Non-zero | Subnormal | (-1)^{s} x 0.m (binary) x 2^{-1022} |
| 0 | 0 | 0 | Zero | +0 |
| 1 | 0 | 0 | Zero | -0 |
| 0 | 2047 | 0 | Infinity | Positive infinity |
| 1 | 2047 | 0 | Infinity | Negative infinity |
| Any | 2047 | Non-zero | NaN | N/A |
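The classification by stored exponent and mantissa can be sketched in code. The `DoubleBits.Classify` helper below is hypothetical and returns plain strings purely for illustration; it relies only on `BitConverter.DoubleToInt64Bits` to obtain the raw 64 bits.

```csharp
using System;

public static class DoubleBits
{
    // Classifies a double by inspecting its stored exponent and mantissa fields.
    public static string Classify(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        int storedExponent = (int)((bits >> 52) & 0x7FF);   // 11 bits, bias not yet removed
        long mantissa = bits & 0xFFFFFFFFFFFFFL;            // low 52 bits

        if (storedExponent == 2047)
        {
            return mantissa == 0 ? "Infinity" : "NaN";
        }
        if (storedExponent == 0)
        {
            return mantissa == 0 ? "Zero" : "Subnormal";
        }
        return "Normal";
    }
}
```

For instance, `DoubleBits.Classify(double.Epsilon)` returns "Subnormal", because `double.Epsilon` is the smallest positive double and has a stored exponent of 0.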

**Worked example**

Consider the following 64-bit binary number:

0100000001000111001101101101001001001000010101110011000100100011

As a double, this splits into:

Sign bit: 0

Exponent: 10000000100 (binary) = 1028 (decimal)

Mantissa: 0111001101101101001001001000010101110011000100100011

The number is therefore a normal number with the value

(-1)^{0} x 1.0111001101101101001001001000010101110011000100100011 (binary) x 2^{1028-1023}

which can be expressed more simply as

1.0111001101101101001001001000010101110011000100100011 (binary) x 2^{5}

or

101110.01101101101001001001000010101110011000100100011 (binary)

In decimal, this is 46.42829231507700882275457843206822872161865234375, but .NET will display it as 46.428292315077 by default, or as 46.428292315077009 when the round-trip format specifier is used.
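The same decomposition can be reproduced in code. In this sketch the 64-bit pattern from the example is written in hexadecimal (0x404736D248573123), and the names `WorkedExample`, `Sign`, `StoredExponent`, `Mantissa` and `FromFields` are hypothetical. `FromFields` handles normal numbers only.

```csharp
using System;

public static class WorkedExample
{
    // The 64-bit pattern from the example above, written in hexadecimal.
    public const long Bits = 0x404736D248573123L;

    public static int Sign(long bits) { return (int)((bits >> 63) & 1); }

    public static int StoredExponent(long bits) { return (int)((bits >> 52) & 0x7FF); }

    public static long Mantissa(long bits) { return bits & 0xFFFFFFFFFFFFFL; }

    // Rebuilds a normal number from its fields: (-1)^s x 1.m (binary) x 2^(e-1023).
    public static double FromFields(int sign, int storedExponent, long mantissa)
    {
        const double TwoPow52 = 4503599627370496.0;           // 2^52
        double significand = 1.0 + mantissa / TwoPow52;       // 1.m
        double magnitude = significand * Math.Pow(2.0, storedExponent - 1023);
        return sign == 0 ? magnitude : -magnitude;
    }
}
```

Applied to `WorkedExample.Bits`, the three accessors give a sign of 0, a stored exponent of 1028 and the 52-bit mantissa shown above, and `FromFields` rebuilds the same double that `BitConverter.Int64BitsToDouble(WorkedExample.Bits)` returns.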

**Sample code**

DoubleConverter.cs: This is a fairly simple class that converts a double to its exact decimal value, expressed as a string. Note that although a finite decimal number does not always have a finite binary representation, every finite binary number does have a finite decimal representation (essentially because 2 is a factor of 10). The class is very simple to use: just call DoubleConverter.ToExactString(value) and the exact string representation of value is returned.
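The following is not the DoubleConverter class itself, only a minimal sketch of one way such a conversion could work, under the assumption that `System.Numerics.BigInteger` is available. The names `ExactDecimal` and `Format` are hypothetical. It uses the fact that dividing by 2^k is the same as multiplying by 5^k and then dividing by 10^k, which is why the expansion is always finite.

```csharp
using System;
using System.Numerics;

public static class ExactDecimal
{
    // Returns the exact decimal expansion of a finite double as a string.
    public static string Format(double value)
    {
        if (double.IsNaN(value) || double.IsInfinity(value))
        {
            throw new ArgumentException("Only finite values have a decimal expansion.");
        }

        long bits = BitConverter.DoubleToInt64Bits(value);
        bool negative = bits < 0;
        int storedExponent = (int)((bits >> 52) & 0x7FF);
        long mantissa = bits & 0xFFFFFFFFFFFFFL;

        if (storedExponent == 0)
        {
            storedExponent = 1;        // subnormal: no implicit leading 1, exponent fixed at -1022
        }
        else
        {
            mantissa |= 1L << 52;      // normal: restore the implicit leading 1
        }

        if (mantissa == 0)
        {
            return negative ? "-0" : "0";
        }

        // The value is now mantissa x 2^exponent with the mantissa treated as an integer.
        int exponent = storedExponent - 1023 - 52;
        string digits;
        if (exponent >= 0)
        {
            digits = (new BigInteger(mantissa) << exponent).ToString();
        }
        else
        {
            // mantissa / 2^k == mantissa x 5^k / 10^k: scale by 5^k, then place
            // the decimal point k digits from the right.
            int k = -exponent;
            string scaled = (new BigInteger(mantissa) * BigInteger.Pow(5, k)).ToString();
            scaled = scaled.PadLeft(k + 1, '0');
            string whole = scaled.Substring(0, scaled.Length - k);
            string fraction = scaled.Substring(scaled.Length - k).TrimEnd('0');
            digits = fraction.Length == 0 ? whole : whole + "." + fraction;
        }
        return negative ? "-" + digits : digits;
    }
}
```

For example, `ExactDecimal.Format(0.5)` returns "0.5", while `ExactDecimal.Format(0.1)` returns a 55-digit fraction, showing that the double closest to 0.1 is not exactly 0.1.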

**NaNs**

NaNs are strange beasts. There are two types of NaN, signalling and quiet, or SNaN and QNaN for short. In terms of the bit pattern, a quiet NaN has the top bit of the mantissa set, whereas a signalling NaN has it cleared. Quiet NaNs are used to indicate that the result of an operation is undefined, whereas signalling NaNs are used to indicate exceptions (where the operation was invalid, rather than merely having an indeterminate result).

The strangest thing most people are likely to notice about NaNs is that they are not equal to themselves. For example, Double.NaN == Double.NaN evaluates to false. Instead, you need to use Double.IsNaN to check whether a value is not a number. Fortunately, most people are unlikely to encounter NaNs at all, except in articles like this one.
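A small sketch of these NaN properties follows. The helper names (`NaNChecks`, `EqualsItself`, `HasQuietBit`) are hypothetical, and the quiet-bit check assumes the layout described above, where the quiet flag is the top bit of the 52-bit mantissa.

```csharp
using System;

public static class NaNChecks
{
    // A NaN compares unequal to everything, including itself, so this is false for NaN.
    public static bool EqualsItself(double value)
    {
        double copy = value;
        return value == copy;
    }

    // True when the value is a NaN whose top mantissa bit (bit 51) is set.
    public static bool HasQuietBit(double value)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        return double.IsNaN(value) && (bits & (1L << 51)) != 0;
    }
}
```

Here `NaNChecks.EqualsItself(double.NaN)` is false even though the two operands hold the same bits, while `double.IsNaN(double.NaN)` is true, which is why `double.IsNaN` is the check to use.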

**Conclusion**

Binary floating-point arithmetic is fine as long as you know what is going on: don't expect the decimal numbers you type into your program to be stored as exactly those decimal values, and don't expect calculations on binary floating-point numbers to produce exact results. Even when two numbers are both exactly representable in the type you are using, the result of an operation involving them is not necessarily exactly representable. This is most easily seen with division (for example, 1/10 is not exactly representable, although 1 and 10 both are), but it can happen with any operation, even seemingly innocent ones such as addition and subtraction.

If you specifically need exact decimal numbers, consider using the decimal type instead, but expect to pay a performance cost for doing so. (A very quick and dirty test showed multiplication of doubles to be about **40 times faster** than multiplication of decimals. Don't pay too much attention to the exact figure, but take it as an indication that binary floating-point arithmetic is generally much faster on current hardware than decimal floating-point arithmetic.)
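To make the contrast concrete, here is a small sketch comparing the same sum in double and in decimal. The class and method names are hypothetical; the values 0.1, 0.2 and 0.3 are chosen only because none of them is exactly representable in binary floating point, whereas all three are exact as decimals.

```csharp
public static class DecimalVersusDouble
{
    // In binary floating point the sum of 0.1 and 0.2 is not the double closest to 0.3.
    public static bool DoubleSumIsExact()
    {
        double sum = 0.1 + 0.2;
        return sum == 0.3;
    }

    // The decimal type stores these values exactly, so the comparison succeeds.
    public static bool DecimalSumIsExact()
    {
        decimal sum = 0.1m + 0.2m;
        return sum == 0.3m;
    }
}
```

`DoubleSumIsExact()` returns false and `DecimalSumIsExact()` returns true. The sketch says nothing about speed; the performance cost mentioned above still applies to decimal.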