In 3D graphics programming, the reciprocal of the square root or square root is often required. For example, the length of a vector or the normalization of a vector. In the C mathematical function librarySQRTIt has ideal precision, but it is too slow for 3D game programs. We hope to further improve the speed while ensuring sufficient accuracy.
CarmackInQuake3The following algorithm was used. When it first appeared in public, it almost shook everyone. It is said that this algorithm was not actually invented by Carmack. Its real author isNVIDIAOfGary tarolli(Unconfirmed ).
/// Calculate the reciprocal of the square root of X // float invsqrt (float X) {float xhalf = 0.5f * X; int I = * (int *) & X; I = 0x5f3759df-(I> 1); // calculate the first approximate root x = * (float *) & I; X = x * (1.5f-xhalf * x * X); // return X ;}
The essence of this algorithm isNewton Iteration Method(Newton-Raphson method (NR), while the foundation of NR isTyleseries(Taylor series ). Nr is a method for finding the approximate root of an equation. First, we need to estimate a value that is closer to the root of the equation, and then calculate a more approximate value based on the formula, and repeat it until satisfactory accuracy can be obtained. The formula is as follows:
Function: Y = f (x) first derivative: y' = f' (x) Then equation: F (x) X [n + 1] = x [n + 1] = x [N]-f (x [N])/F' (X [N])
The most important thing about nr is to estimate the first approximate root. If the approximate root is close enough to the actual root, a satisfactory solution can be obtained only after a few iterations.
Now let's look at how we can solve our problems using Newton's method. The reciprocal of the square root is actually the solution of equation 1/(x ^ 2)-A = 0. Expand the equation according to the formula of the Newton Iteration Method:
x[n+1]=1/2*x[n]*(3-a*x[n]*x[n])
Put 1/2 in brackets, and the last line of the above function is obtained.
Next, we will try to estimate the first approximate root. This is also the most amazing part of the above functions. It calculates an approximate root which is very close to the real root through some method. Therefore, it only needs to use an iteration process to obtain a satisfactory solution. How does it do it? All the mysteries of this line are:
I = 0x5f3759df-(I> 1); // calculate the first approximate Root
Super inexplicable statement, isn't it? But if you think about it, you can understand it. We know that under the IEEE standard,FloatData of the type is expressed in this way on a 32-bit System (this is generally the case, but many details are omitted. If you are interested, You can Google it ):
Bits: 31 30... 031: Symbol bit 30-23: a total of 8 bits, storage index (e) 22-0: a total of 23 bits, storage ending number (m)
Therefore, the 32-bit floating point number is represented by a decimal real number: M * 2 ^ e. Start the root and then the reciprocal is: m ^ (-1/2) * 2 ^ (-E/2 ). Now it is very clear. Statement I> 1: divide the index by 2 to implement the 2 ^ (E/2) Part. Instead, subtract it with a constant to get the m ^ (1/2) and reverse all the exponent symbols at the same time.
As for0x5f3759dfWell, I can only say that it is indeed a superMagic number.
The magic number can be deduced, but I am not going to discuss it here, because it is too cumbersome. To put it simply, the principle is as follows: Because the ending m in the floating point number of IEEE omits the first 1, the actual ending number is 1 + M. If you have not dozed off in a college math class, when you see (1 + M) ^ (-1/2), you should immediately think of its tyleseries, the first entry of the expansion is a constant. The following is a simple derivation process:
For the real number r> 0, if it is in the floating point representation of IEEE, The exponent is E, and the ending number is m, then: R ^ (-1/2) = (1 + M) ^ (-1/2) * 2 ^ (-E/2) unfold (1 + M) ^ (-1/2) according to the Taylor series. Take the first item and get: original formula = (1-m/2) * 2 ^ (-E/2) = 2 ^ (-E/2)-(m/2) * 2 ^ (-E/2) (M/2) * 2 ^ (E/2) is (r> 1), and in IEEE representation, the exponent symbol simply adds an offset, and the first half of the formula is just a constant, so the original formula can be converted to: original formula = C-(m/2) * 2 ^ (E/2) = C-(r> 1), where C is a constant, so you only need to solve the equation: R ^ (-1/2) = (1 + M) ^ (-1/2) * 2 ^ (-E/2) = C-(r> 1) Find the C value that minimizes the relative error.
The above deduction process is my personal understanding and has not been confirmed. Chris lomont discussed in detail the solution to the last equation in his paper and tried to find the best constant C on the actual machine. If you are interested, you can find a link to his thesis at the end of the article.
Therefore, the so-called magic number does not fall from a galaxy in the n-element universe to the Earth due to space-time distortion, but a mathematical theory that existed several hundred years ago. As long as you are familiar with NR and Taylor series, you and I have the ability to make similar optimizations.
Some tests have been performed on gamedev.net. The relative error of this function is about 0.177585%, and the speed is improved by more than 20% than that of SQRT in the C standard library. If an iterative process is added, the relative error can be reduced to the level of the e-004, but the speed will also be reduced to almost the same as SQRT. It is said that in doom3, Carmack further optimized the algorithm by searching for tables, with almost perfect accuracy and a faster speed than the original version (working hard to get the source code, ).
It is worth noting thatChris lomontIn the calculation, the best constant in theory (with the highest accuracy) is0x5f37642fIn the actual test, if only one iteration is used, the effect is also the best. But the strange thing is that after two Nr times, the accuracy of the solution under this constant will be greatly reduced (heaven knows what is going on !). After actual tests,Chris lomontThe best constant is0x5f375a86. If you change to a 64-bitDoubleIf the version is used, the algorithm is the same, while the Optimal constant is0x5fe6ec85e7de30da(Another sweaty magic number--B ).
This algorithm depends on the internal representation of floating point numbers and the byte sequence, so it is not portable. If it is put on the Mac, it will crash. If you want to be portable, use SQRT. But the algorithm IDEA is universal. You can calculate the square root algorithm.
The square root algorithm used by Carmack in quake3 is given below. Carmack has donated all the source code of quake3 to open source, so you can rest assured to use it without worrying about receiving a lawyer's letter.
/// The function used by Carmack in quake3 to calculate the square root // float carmsqrt (float X) {Union {int intpart; float floatpart;} convertor; Union {int intpart; float floatpart;} convertor2; convertor. floatpart = x; convertor2.floatpart = x; convertor. intpart = 0x1fbcf800 + (convertor. intpart> 1); convertor2.intpart = 0x5f3759df-(convertor2.intpart> 1); Return 0.5f * (convertor. floatpart + (x * convertor2.floatpart ));}
Another higher-speed SQRT implementation based on the same algorithm is as follows. It simply divides the index by 2 and does not consider the ending number. To understand this code, you must know that in the IEEE floating point format, e is obtained by adding 127 to the actual index. For example, if the real number is 0.1234*2 ^ 10, in the floating point representation, the value of E (23-30) is actually 10 + 127 = 137. Therefore, the following code processes the 127 offset, which is the constant 0x3f800000. I have not actually tested this function, so I have no comment on its advantages and disadvantages, but it is estimated that its accuracy will be much lower.
floatFaster_Sqrtf(float f){floatresult;_asm{mov eax, fsub eax, 0x3f800000sar eax, 1add eax, 0x3f800000mov result, eax}return result;}
In addition to the NR-based method, other common fast algorithms also includePolynomial approximation. The following function is taken from 3D game programming masters. It uses a polynomial to approximate the original length equation, but I don't know how the formula used by the author is derived (if you know it, please let me know, Thank you ).
/// This function calculates the distance from (0, 0) to (x, y). The relative error is 3.5% // int fastdistance2d (int x, int Y) {x = ABS (x); y = ABS (y); int Mn = min (x, y); Return (x + y-(Mn> 1) -(Mn> 2) + (Mn> 4);} // The distance from the function compute (0, 0) to (x, y, z, the relative error is 8% // float fastdistance3d (float FX, float fy, float FZ) {int temp; int x, y, z; // make sure all values are positive x = int (FABS (FX) * 1024); y = int (FABS (FY) * 1024); Z = int (FABS (Fz) * 1024); // sort if (Y <X) Swap (X, Y, temp) if (z <Y) Swap (Y, Z, temp) if (Y <X) Swap (X, Y, temp) int Dist = (Z + 11 * (Y> 5) + (x> 2 )); return (float) (Dist> 10 ));}
Another method is calledDistance estimates(Distance evaluation ?), As shown in:
The vertices on the positive octagonal side depicted by the red line are:
octagon(x,y) = min((1/√2) * (|x|+|y|), max(|x|,|y|))
Evaluate the lengths of the vectors V1 and V2, then:
√(x^2+y^2) = (|v1|+|v2|)/2 * octagon(x,y)
So far, we have been discussing the floating-point embedding algorithm. Next we will turn to the integer embedding algorithm. Some people may think that it has no significance to the Integer Data Request root, because it will get a result similar to 99 ^ (1/2) = 9. This is usually the case, but when we use a fixed number of points (the number of points is still applied to many systems, such as Nintendo's GBA handheld devices ), the attention algorithm of integers is very important. The algorithm for square expansion of integers is as follows. I am not going to discuss it here (the fact is that I am not careful about it, because I will not use--B in the short term), but you canJames uleryFind a very detailed derivation process.
//// For reading purposes, I added a line break in the macro definition below // # define step (shift) if (0x40000000l> shift) + sqrtval <= Val) {Val-= (0x40000000l> shift) + sqrtval; sqrtval = (sqrtval> 1) | (0x40000000l> shift);} else {sqrtval = sqrtval> 1 ;} //// calculate the square root of a 32-bit integer // int32 xxglusqrtfx (int32 Val) {// Note: This fast square root function // only works with an even q_factorint32 sqrtval = 0; STEP (0); Step (2); Step (4); Step (6); Step (8); Step (10); Step (12); Step (14 ); STEP (16); Step (18); Step (20); Step (22); Step (24); Step (26); Step (28); Step (30 ); if (sqrtval <Val) {++ sqrtval;} sqrtval <= (q_factor)/2; Return (sqrtval );}
SQRT has been extensively discussed in gamedev.net since 2003 (it can be seen that I am very Mars, and of course there are other people still in Pluto, ). The attempt to explore this topic is entirely out of my interest and curiosity (in other words, ignorance ). In fact, with the increase of FPU and the hardware support for vector operations, most systems provide quick SQRT implementation. If it is to process a large number of vectors, it is said that the fastest way is to use SIMD (it is said that it is only, I do not know), can be synchronized to calculate four vectors.