The original was published on my WeChat public account: geekartt.
When a computer sums numbers, adding a very large number to a very small one causes truncation error, because floating-point numbers have only finite precision. When constructing a computational grid, for instance, we should therefore try to avoid this situation and keep the calculation within a relatively narrow range of magnitudes. Likewise, when summing a series of numbers, a useful trick is to sort them from small to large and add them in that order, so the small values accumulate before they meet the large ones.
And if such a sum of a large number and small decimals is unavoidable, an intuitive idea is: let the large number absorb only the high-order part of each small addend, and accumulate the remaining low-order part separately as its own "complete" quantity. The formal name for this intuitive idea is the Kahan summation algorithm.
Assume that our floating-point variables can hold only 6 significant digits. Then the theoretical value of 12345 + 1.234 is 12346.234. But since only 6 significant digits can be stored, the true value is truncated to 12346.2, producing an error of 0.034. When many such large-number-plus-decimal additions occur, the truncation errors accumulate gradually and can cause a large deviation in the final result.
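This 6-digit truncation is easy to play with in Python by rounding every result to 6 significant digits. The helper round6 below is a hypothetical stand-in of mine for the limited-precision float in the text, not part of the original example:

```python
def round6(x):
    # Hypothetical helper: keep only 6 significant digits,
    # simulating the limited-precision floating-point variables.
    return float(f"{x:.6g}")

result = round6(12345 + 1.234)       # true value: 12346.234
print(result)                        # 12346.2
print(round(12346.234 - result, 6))  # truncation error: 0.034
```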
Kahan algorithm:
    function kahan_sum(input):
        var sum = 0.0
        var c = 0.0
        for i = 1 to input.length do
            var y = input[i] - c    // initially c is zero; afterwards it compensates for previously lost precision
            var t = sum + y         // the low-order digits of y are lost here
            c = (t - sum) - y       // recover the low-order digits of y, with negative sign
            sum = t
        next i
        return sum
In the pseudocode above, the variable c holds the accumulated compensation for the lost fractional parts; more precisely, it holds the negative of those lost parts. Because these low-order digits are accumulated separately, once they grow large enough in magnitude they are no longer wiped out in the summation, so the accuracy of the whole process stays well controlled.
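A direct, runnable Python translation of this pseudocode can look like the sketch below (the function name kahan_sum is my choice, not from the original):

```python
def kahan_sum(values):
    total = 0.0
    c = 0.0                  # accumulated compensation (negative of the lost low-order digits)
    for x in values:
        y = x - c            # subtract the previously lost part from the new value
        t = total + y        # low-order digits of y may be lost in this addition
        c = (t - total) - y  # recover what was just lost, with negative sign
        total = t
    return total
```

For example, kahan_sum([1e9] + [1e-6] * 1000000) - 1e9 yields exactly 1.0, the experiment worked through later in the article.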
Let us illustrate this with a concrete worked example: 10000.0 + pi + e, still assuming that floating-point variables hold only 6 significant digits. Written out, the summation is 10000.0 + 3.14159 + 2.71828, whose theoretical result is 10005.85987, which rounds to 10005.9.
However, due to truncation error, the first addition 10000.0 + 3.14159 yields only 10003.1; adding 2.71828 to this gives 10005.81828, which is truncated to 10005.8. The result is therefore off by about 0.1.
Now run the same computation with the Kahan summation algorithm (remember: our floating-point variables hold only 6 significant digits).
First summation:
    y = 3.14159 - 0.00000 = 3.14159
    t = 10000.0 + 3.14159 = 10003.14159 ≈ 10003.1                       // low-order digits are lost
    c = (10003.1 - 10000.0) - 3.14159 = 3.10000 - 3.14159 = -0.0415900  // recover the lost part, with negative sign
    sum = 10003.1
Second summation:
    y = 2.71828 - (-0.0415900) = 2.75987                               // add the previously lost part to the current small number
    t = 10003.1 + 2.75987 = 10005.85987 ≈ 10005.9                      // the accumulated low-order digits are now large enough not to be cancelled by the big number
    c = (10005.9 - 10003.1) - 2.75987 = 2.80000 - 2.75987 = 0.04013
    sum = 10005.9
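The whole trace can be reproduced in Python under the same 6-significant-digit model. The helper round6 below, which rounds every intermediate result, is my hypothetical stand-in for the limited-precision variables:

```python
import math

def round6(x):
    # Hypothetical helper: keep only 6 significant digits.
    return float(f"{x:.6g}")

# Naive summation under 6-digit precision:
naive = round6(round6(10000.0 + round6(math.pi)) + round6(math.e))
print(naive)  # 10005.8 -- about 0.1 below the correctly rounded 10005.9

# Kahan summation under the same 6-digit precision:
s, c = 0.0, 0.0
for x in (10000.0, round6(math.pi), round6(math.e)):
    y = round6(x - c)            # add back the previously lost part
    t = round6(s + y)            # low-order digits lost here
    c = round6(round6(t - s) - y)  # recover them, with negative sign
    s = t
print(s)      # 10005.9 -- matches the correctly rounded result
```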
The above is the theoretical analysis. Here is a runnable Python example for interested readers to experiment with; I first saw it in Google principal scientist Vincent Vanhoucke's deep learning course on Udacity. The computation is: start from 10^9, add 10^(-6) a total of 10^6 times, then subtract 10^9, i.e. 10^9 + 10^6 * 10^(-6) - 10^9. The theoretical value is 1.
Python Code:
    summ = 1000000000
    for indx in range(1000000):
        summ += 0.000001
    summ -= 1000000000
    print(summ)
Running this gives 0.953674316406 instead of 1. After 10^6 additions, the accumulated truncation error is clearly considerable.
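There is a clean explanation for this specific value. Near 10^9 the spacing between adjacent double-precision numbers (one "ulp") is 2^-23 ≈ 1.19 * 10^-7, so each added 0.000001 is rounded to the nearest multiple of that spacing, namely 8 ulps, before being absorbed. A quick sanity check:

```python
# Spacing of adjacent doubles in [2**29, 2**30), the interval containing 1e9:
ulp = 2.0 ** -23
# What each addition of 0.000001 actually contributes after rounding:
per_step = round(0.000001 / ulp) * ulp  # 8 ulps
print(per_step * 1000000)  # 0.95367431640625
```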
If we instead improve the loop with the Kahan summation algorithm, we get:
Python code with the Kahan algorithm:
    summ = 1000000000
    c = 0.0
    for indx in range(1000000):
        y = 0.000001 - c
        t = summ + y
        c = (t - summ) - y
        summ = t
    summ -= 1000000000
    print(summ)
Running this, we are pleased to see the correct result: 1.0.
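For reference, Python's standard library already ships an even stronger tool for this: math.fsum tracks multiple exact partial sums (Shewchuk's algorithm) and is designed to return a correctly rounded result, so it also handles this example:

```python
import math

# Same experiment as above, delegated to the standard library:
total = math.fsum([1000000000] + [0.000001] * 1000000) - 1000000000
print(total)  # 1.0
```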
A summation algorithm for large numbers and decimals