The fast inverse square root algorithm is often associated with the hexadecimal constant 0x5f3759df. It quickly computes an approximation of 1/√x, roughly four times faster than evaluating float(1/sqrt(x)) directly. The algorithm is commonly attributed to work at Silicon Graphics in the 1990s, and it later appeared in the source code of John Carmack's Quake III Arena.
This is an old algorithm; the earliest discussion I can find is from 2001 on China's CSDN forum. The code may no longer be suitable for modern 64-bit machines as written, since on many 64-bit platforms the long type has grown from 4 bytes to 8 bytes. Still, I think a new generation revisiting a classic problem like this has practical value beyond nostalgia. That's it, no arguing.
Code Overview
The following fast inverse square root code comes from Quake III Arena.
```c
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long *)&y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - (i >> 1);             // what the fuck?
    y  = *(float *)&i;
    y  = y * (threehalfs - (x2 * y * y));   // 1st iteration
//  y  = y * (threehalfs - (x2 * y * y));   // 2nd iteration, this can be removed

    return y;
}
```
To compute the reciprocal of a square root, we first need an approximate value, and then reduce its error with a numerical method. The quality of that initial approximation strongly affects how many iterations the numerical method needs. Before this algorithm, the approximation was typically obtained from precomputed lookup tables. This method is faster than a table lookup, and about four times faster than computing the reciprocal square root the regular way. Although it loses some precision, it makes up for that shortcoming in speed. The algorithm is designed around the floating-point representation of the IEEE 754 standard, and Chris Lomont and Charles McEniry have shown how to adapt it to other floating-point formats.
A Worked Example
Here is an example with x = 0.15625, for which we want to compute 1/√x ≈ 2.52982. The first steps of the program proceed as follows:
```
0011_1110_0010_0000_0000_0000_0000_0000  // bit pattern of x (and of i after the cast)
0001_1111_0001_0000_0000_0000_0000_0000  // after the right shift: (i >> 1)
0101_1111_0011_0111_0101_1001_1101_1111  // the magic number 0x5f3759df
0100_0000_0010_0111_0101_1001_1101_1111  // result of 0x5f3759df - (i >> 1)
```
Reading those same bit patterns as IEEE 754 single-precision floating-point numbers gives the following values:
```
0_01111100_01000000000000000000000  // 1.25        * 2^-3
0_00111110_00100000000000000000000  // 1.125       * 2^-65
0_10111110_01101110101100111011111  // 1.432430... * 2^+63
0_10000000_01001110101100111011111  // 1.307430... * 2^+1
```
We magically find that the last number, 1.30743... × 2 ≈ 2.61486, is already close to the exact value 2.52982! What the fuck — why?
Algorithm Principle
The algorithm computes 1/√x in the following steps:
- Reinterpret the 32-bit floating-point variable as an integer, which yields an approximation of log2(x) (up to scaling and translation).
- Use that value to compute an approximation of log2(1/√x).
- Reinterpret the result back into a 32-bit floating-point variable; this is the initial value for the iteration.
- Apply one simple Newton iteration, after which the result is sufficiently accurate.
Casting the float to an integer: approximating the logarithm
A previous blog post described the IEEE 754 floating-point representation in detail; here is a brief review. To store a non-zero floating-point number, the first step is to write x in normalized binary form:

x = 2^ex · (1 + mx)
Here ex is an integer and mx ∈ [0, 1). We call the binary expansion 1.b1b2b3... = 1 + mx the mantissa, where the 1 to the left of the binary point is the leading bit. Since the leading bit is always 1, it is omitted when the floating-point number is stored.
Now let's compute the three stored fields:
- Sx, the sign bit: when x > 0, it equals 0.
- Ex = ex + B, the biased exponent. To store both positive and negative exponents, the IEEE 754 standard adds a fixed offset B to the actual exponent; for single precision, B = 127.
- Mx = mx · L, the mantissa field, where L = 2^23.
The radicand must be non-negative, so its sign bit is 0, and we can take the base-2 logarithm:

log2(x) = ex + log2(1 + mx)

Because mx ∈ [0, 1), we can use the linear approximation

log2(1 + mx) ≈ mx + σ

where σ is a constant used to tune the accuracy of the approximation; σ ≈ 0.0430357 minimizes the maximum error.
If we reinterpret the floating-point number as an integer, the biased exponent becomes the high 8 bits of the integer (just below the sign bit), and the mantissa field becomes its low 23 bits.
The numeric value of this integer is:

Ix = Ex · L + Mx = L(ex + B + mx) ≈ L · log2(x) + L(B − σ)

So the integer value Ix is just log2(x) translated and scaled, and it is easy to solve back for the logarithm:

log2(x) ≈ Ix / L − (B − σ)
Computing the initial value of the iteration
OK, so far we have seen what happens when a floating-point number is forcibly reinterpreted as an integer. The crucial step is finding an initial value that is already close to the true result, so that the iteration converges quickly. Next we will see where the so-called magic number comes from and how this initial value is computed.
Let y = 1/√x. Taking base-2 logarithms:

log2(y) = −(1/2) · log2(x)
Applying the relation just derived to both sides:

Iy / L − (B − σ) ≈ −(1/2) · (Ix / L − (B − σ))
Rearranging:

Iy ≈ (3/2) · L · (B − σ) − (1/2) · Ix
What have we found? The integer reinterpretation of the inverse square root and that of the radicand satisfy this simple approximate relation! The so-called magic number is just (3/2)·L·(B − σ) written in hexadecimal notation, hence the following code:
```c
i = 0x5f3759df - (i >> 1);
```
The second term, (1/2)·Ix, is computed with a right-shift operation.
After this, the algorithm reinterprets the computed integer back into a floating-point number, which becomes the initial value of the iteration. Now let's look at how the iteration works.
Let

f(y) = 1/y² − x

Then y = 1/√x is the positive root of the equation f(y) = 0.
Newton's method iterates toward a root of this equation as follows:

y_{n+1} = y_n − f(y_n) / f′(y_n) = y_n · (3/2 − (x/2) · y_n²)
Here y_n is the result of the previous step; for the first iteration it is the initial value computed above. Written as code (with threehalfs = 3/2 and x2 = x/2, as in the source):
```c
y = y * (threehalfs - x2 * y * y);
```
Repeating this step, feeding each result back in as the next input, converges toward the true value.
Because the initial value chosen by the method above is already very close to the true value, few iterations are needed; the source code even comments out the second iteration.
References:
[1] https://en.wikipedia.org/wiki/Fast_inverse_square_root
[2] Fast inverse square root algorithm ("Carmack's method")