First, let's look at a piece of code (C++, compiled with Visual C++):
int a = 2;
int b = 3;
int c = 6;
if (1.0 / a + 1.0 / 3 + 1.0 / c >= 1.0)
    cout << "Yes" << endl;
else
    cout << "No" << endl;
Does it print Yes or No?
Answer:
Run it and the console prints No.
That raises a question. In the debugger, the value of 1.0 / a + 1.0 / 3 + 1.0 / c is displayed as 1.0000000000000.
So why does the if condition evaluate to false?
The cause is precision. 1.0 / 3 = 0.333333333333... and 1.0 / c = 0.16666666666666... cannot be written with finitely many digits. Added exactly, the three terms do sum to 1, but a digital computer can only handle finite, discrete data, so each value is stored as an approximation and the computed sum can fall slightly below 1. The debugger simply rounds that result for display.
So how should floating-point numbers be compared? First, let's look at how they are stored in a computer.
Float and double range and precision
1. Range
The ranges of float and double are determined by the number of exponent bits.
A float has 8 exponent bits, while a double has 11. The bits are laid out as follows:
Float:
1 bit (sign) 8 bits (exponent) 23 bits (mantissa)
Double:
1 bit (sign) 11 bits (exponent) 52 bits (mantissa)
Therefore, the float exponent spans -127 ~ +128 and the double exponent spans -1023 ~ +1024. The exponent is stored as a biased (offset) value rather than in two's complement: the bias is 127 for float and 1023 for double. The two end values of each span are reserved (for zero and subnormals, and for infinity and NaN), so normalized numbers use -126 ~ +127 and -1022 ~ +1023.
The negative exponents determine the smallest nonzero absolute value a floating-point number can express; the positive exponents determine the largest absolute value, that is, the value range.
The float range is just under -2^128 ~ +2^128, that is, -3.40e+38 ~ +3.40e+38; the double range is just under -2^1024 ~ +2^1024, that is, -1.79e+308 ~ +1.79e+308.
2. Precision
The precision of float and double is determined by the number of mantissa bits. Floating-point numbers are stored in memory in binary scientific notation, and for normalized numbers the integer part is always an implicit "1". Since it never changes, it does not need to be stored and does not affect the precision.
Float: 2^23 = 8388608, seven digits in total, so there can be at most seven significant decimal digits, but only six are guaranteed; that is, float precision is 6 ~ 7 significant digits.
Double: 2^52 = 4503599627370496, 16 digits in total. Similarly, double precision is 15 ~ 16 significant digits.
The following describes how to compare floating-point numbers (the content below comes from other users online and seemed worth sharing):
Floating-point comparison (1)
In numerical code, we often need to determine whether two numbers are equal.
For integers, a statement like a == b handles every case.
Floating-point numbers are different.
First, the binary representation of floating-point numbers in a computer means that most real values cannot be represented exactly.
Most of today's computers are digital, not analog, and the discrete data representation of a digital machine cannot express most quantities exactly.
Second, single-precision float carries only about seven significant digits. During floating-point operations, this limited precision often produces an error between the computed result and the expected one.
For these two reasons, it is difficult to use a == b to decide whether two floating-point numbers are the same.
Naturally, we might think of a test like fabs(a - b) < epsilon.
But is this method safe?
It is not.
First, epsilon is an absolute quantity: the absolute error of error analysis.
A single fixed value cannot suit the entire range of numbers that the float type can express.
For example, if epsilon is 0.0001 and a and b are themselves around 0.0001, the test is obviously inappropriate: the tolerance is as large as the values being compared.
Likewise, when a and b are around 10000 the test is unsuitable, because 10000 and 10001 might well be considered equal, yet they fail it.
A fixed epsilon like this is suitable only when a or b is near 1 or 0.
Since the absolute error is not acceptable, we naturally turn to the relative error.
bool isequal(float a, float b, float relerror)
{
    return fabs((a - b) / a) < relerror;
}
This version is incomplete, because it always divides by the first parameter, so
isequal(a, b, relerror) and isequal(b, a, relerror) may give different results.
Also, if the first parameter is 0, the division is a division by zero.
This can be improved:
choose as the divisor whichever of a and b has the larger absolute value.
bool isequal(float a, float b, float relerror)
{
    // Divide by the operand with the larger magnitude so the result is symmetric.
    if (fabs(a) > fabs(b)) return fabs((a - b) / a) < relerror;
    return fabs((a - b) / b) < relerror;
}
Is the relative error perfect?
No. In some special cases, the relative error alone does not cover everything.
For example, to decide whether three points in space are collinear, you can check the distance from one point to the line segment formed by the other two.
Relative error alone is not enough here. The segment may be very long or very short, and so may the distance between the point and the segment; only when the relative error and the absolute error are combined is the test reliable.
A more complete comparison algorithm looks like this:
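bool isequal(float a, float b, float abserror, float relerror)
{
    if (a == b) return true;                  // exact match, including a == b == 0
    if (fabs(a - b) < abserror) return true;  // absolute test, for values near 0
    // Relative test against the operand with the larger magnitude.
    if (fabs(a) > fabs(b)) return fabs((a - b) / a) < relerror;
    return fabs((a - b) / b) < relerror;
}
This version is reasonably complete.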
Floating-point comparison (2)
The methods above are only the basic ways of comparing floating-point numbers.
This part covers a more advanced method: how to turn the comparison of two floating-point numbers into a comparison of two integers.
Let's first look at positive numbers.
In the IEEE memory layout, the exponent occupies the high bits and the mantissa the low bits.
As a result, when the bit patterns of two positive floating-point numbers are compared as integers, their order is the same as the order of the floating-point values.
So comparing them as integers works and is very efficient. For example:
float f1 = 1.23f;
float f2 = 1.24f;
f1 < f2 holds, and
(int&)f1 < (int&)f2 holds as well.
Furthermore, a careful look at the IEEE floating-point layout explains the precision problem mentioned in part (1): not every real number can be represented exactly.
The numbers that can be represented exactly are finite in number: a 32-bit IEEE format has at most 2^32 distinct bit patterns. The values that cannot be represented are the vast majority.
In the 32-bit IEEE format, the mantissa has 23 bits (with an implied leading 1).
For a floating-point number that is exactly representable, if we treat these 23 bits as an integer and add 1, we get the smallest floating-point number larger than the current one.
Conversely, if we subtract the integers corresponding to two floating-point numbers, the resulting integer tells how many representable floating-point numbers lie between them (this is clear when the exponents are equal, and it also holds when they differ).
In this way, we can use (int&)f1 - (int&)f2 to compare two positive floating-point numbers.
The difference is in effect a relative error,
though not the relative error in the usual sense:
it expresses how many exactly representable floating-point numbers lie between the two values.
Specifying a threshold on this count is a more effective way to control the comparison of two floating-point numbers.
For two positive floating-point numbers:
bool isequal(float f1, float f2, int absdelta)
{
    // Valid only for two positive floats; (int&) reinterprets the float's bits as an int.
    return abs((int&)f1 - (int&)f2) < absdelta;
}
Integer abs is used here instead of fabs; the difference in the generated assembly is also considerable.
Comparing two negative numbers works the same way,
except that adding 1 to the integer corresponding to a negative number's bit pattern yields a smaller (more negative) floating-point number.
However, positive and negative numbers cannot be compared directly as integers, because in the IEEE memory layout
the sign is stored as a separate bit, so the integers corresponding to positive and negative numbers do not form one consecutive sequence.
The smallest non-negative number is +0, whose corresponding integer is 0x00000000.
On the negative side, the value of smallest magnitude is -0, whose corresponding integer is 0x80000000.
Don't be surprised by -0:
the IEEE representation has two zeros, +0 and -0.
Interestingly, +0 and -0 compare equal under f1 == f2.
From this comparison we can see that
+0 and the positive floating-point numbers can be compared directly once converted to integers, and
-0 and the negative floating-point numbers can be compared directly once converted to integers.
If the two ranges can be joined, comparing directly as integers becomes complete.
Looking at the layout of negative numbers suggests a simple method:
subtract the integer corresponding to a negative number's bit pattern from the integer for -0 (0x80000000), and the two ranges join up consecutively.
Better still, after this subtraction every negative floating-point number corresponds to a negative integer.
The integer comparison is then continuous and valid over the whole floating-point range.
The final comparison algorithm is:
// Function: bool isequal(float f1, float f2, int absdelta)
// Purpose: compare whether two floating-point numbers are approximately equal
// Input: f1, f2: the two floating-point numbers to compare
//        absdelta: the number of exactly representable floating-point numbers allowed
//                  between f1 and f2; this plays the role of a relative error
// Output: true if the two numbers are considered equal, false otherwise
// Note: only applicable to the IEEE 32-bit floating-point layout; NaN and infinity are not treated specially
bool isequal(float f1, float f2, int absdelta)
{
    int i1 = (int&)f1;
    int i2 = (int&)f2;
    // Sign bit set: remap so that -0 becomes 0 and negative floats become negative integers.
    if (i1 < 0) i1 = 0x80000000 - i1;
    if (i2 < 0) i2 = 0x80000000 - i2;
    // Take the difference in 64 bits so values of opposite sign cannot overflow.
    long long diff = (long long)i1 - i2;
    return (diff < 0 ? -diff : diff) < absdelta;
}