First declare: This article for the network reprint, address: http://www.cnblogs.com/pkuoliver/archive/2010/10/06/sotry-about-sqrt.html
========================================================== ========================
Source code: http://blog.redfox66.com/post/story-about-sqrt.aspx
Okay, I admit it to my title party, but now that you are here, let's take a look at it and make sure you have gains.
We usually have some data operation operations and need to call SQRT, exp, abs and other functions. Have you ever thought about how these function systems are implemented? Take the most common SQRT function as an example. How can the system implement this frequently called function?
Although you may not have thought about this problem at ordinary times, it's not that easy to say that it's just like "getting a gun in front of a while, it's not that simple, we use the binary method to test the number of squares in the middle of an interval. If it is large, we will try the number in the middle of the Left interval. If it is small, try again with the number in the middle of the right interval. For example, to obtain the result of SQRT (16), first try (0 + 16)/2 = 8*8 = 64, 64 is larger than 16, then shift to the left, try (0 + 8) /2 = 4, 4*4 = 16. You get the correct result SQRT (16) = 4. Then you write out the program in the following steps:
Float sqrtbybisection (float N) // use the binary method.
{
If (n <0) // if the value is less than 0, you need to process it as needed.
Return N;
Float mid, last;
Float low, up;
Low = 0, up = N;
Mid = (low + up)/2;
Do
{
If (mid * mid> N)
Up = mid;
Else
Low = mid;
Last = mid;
Mid = (up + low)/2;
} While (ABS (mid-last)> EPS); // precision control
Return mid;
}
Then let's take a look at the difference between the performance and precision of the system functions (the time unit is not a second or a millisecond, but a CPU tick, no matter what the Unit is, it will be comparable if it is unified)
It can be seen that the results of the Bipartite method are exactly the same as those of the system method, but the performance is hundreds of times worse. Why is there such a big difference? Is there any better way for the system? Isn't it .... Oh, by the way, let's recall our previous high-number course. Our teacher once taught us "Newton iterative method to quickly find the square root", or this method can help us. The specific steps are as follows:
Obtain the approximate value of root number A: first, just guess an approximate value of X, and then make X equal to the average of x and A/X. After six or seven iterations, the value of X is quite accurate.
For example, I want to calculate the value of Root 2. If I guess the result is 4, although the error is outrageous, you can see that after using the Newton iteration method, this value quickly approaches the Root 2:
(4
+ 2/4)/2 = 2.25
(2.25 + 2/2.25)/2 = 56944 ..
(
1. 56944... + 2/1. 56944...)/2 = 1. 42189 ..
(1. 42189 .. + 2/1. 42189 ..)/2 =
1. 41423 ..
....
The principle of this algorithm is very simple. We just constantly use the tangent of (x, f (x) to approach the root of the equation x ^ 2-A = 0. The root number A is actually a positive root of x ^ 2-A = 0, and the derivative of this function is 2X. That is to say, the tangent slope at any point (x, f (x) of the function is 2X. Then, X-f (x)/(2x) is an approximate value closer to X. Substitution
F (x) = x ^ 2-A to get x-(x ^ 2-a)/(2x), that is, (x + A/X)/2.
The related code is as follows:
Float sqrtbynewton (float X)
{
Float val = x; // The final
Float last; // Save the previous calculated value
Do
{
Last = val;
Val = (Val + x/Val)/2;
} While (ABS (Val-last)> EPS );
Return val;
}
Then let's take a look at the performance test:
Wow, the performance has improved a lot, but there is still such a big gap with system functions. Why? I think about it for a long time. Suddenly one day, I saw a magic method on the Internet, so I had this article today. I didn't talk much about it. Read the code first:
float InvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating VALUE
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits BACK to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return 1/x;
}
Next, let's take a look at the performance test:
This is really a qualitative change, and the result is even better than the system... Brother is really shocked !!! Brother vomited blood !!! A function triggers a blood attack !!! Blood, blood...
Why is the speed so fast? Don't worry. Let's take a look at the following story:
Quake-III arena
(Raytheon hammer 3) is one of the classic games in 1990s. This series of games not only have good pictures and content, but also can run smoothly even if the computer configuration is low. This is due to the developer of its 3D engine, John camark (John
Carmack ). In fact, as early as the DOS era in Early 1990s, John Carmack launched the amazing castle when he could make a small animation on a PC.
Wolfstein, then encourage, doom, doomii,
Quake... pushes the 3-D technology to its extreme every time. His 3D engine code is extremely efficient, and almost every computation command on a PC is being squeezed. At the beginning, Ms direct3d had to listen to his opinions and modified a lot of APIs.
Recently, quake developer id software
Comply with the GPL protocol, open the original code of the QUAKE-III, let the world have the honor to witness the original code of the legendary 3D engine Carmack. This is the original code of the QUAKE-III:
Http://www.fileshack.com/file.x? FID = 1, 7547
(The following is the official download URL. You can search “quake3-1.32b-source.zip to find a lot of Chinese webpages. Ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip)
We know that the more underlying functions are called, the more frequently. The 3D engine is still a mathematical operation. Then find the underlying mathematical functions (in game/code/q_math.c ),
It must be carefully written. There are a lot of interesting functions, many of which are amazing. We can't finish learning them for years. This code is found in game/code/q_math.c. The function of this Code is to split a data square and calculate it down. It is tested that this code is four times faster than (float) (1.0/SQRT (x:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
#ifndef Q3_VM
#ifdef __linux__
assert( !isnan(y) ); // bk010122 - FPE?
#endif
#endif
return y;
}
The function returns 1/SQRT (x), which is more useful in image processing than SQRT (X.
Note that this function is only used for one generation! (In fact, there is no need for stacking and direct calculation ). Compilation and experiment. This function not only works well, but also works four times faster than the standard SQRT () function! You know, the built-in functions of the compiler are strictly and carefully compiled and optimized!
This concise function, the core and most confusing, is to mark "What the fuck ?" A sentence
I = 0x5f3759df-(I
> 1 );
Add y = y * (threehalfs-(X2 * y ));
The square operation is completed in two sentences! It is also noted that the core sentence is the fixed point shift operation, which is extremely fast! This is extremely efficient, especially on a lot of CPU CPUs with no multiplication instructions.
In fact, the principle of the algorithm is not complex, that is, the Newton Iteration Method. X-f (x)/F' (X) is used to constantly approach the root of f (x) =.
That's right. The square root of the square is calculated in such a loop iteration. But where kake3 is really cool, he chose a mysterious constant 0x5f3759df.
To calculate the estimated value, that is, the line that we add the comment, the calculated value of that line is very close to 1/SQRT (N ), in this way, we only need two Newton iterations to achieve the precision we need. Well, if this is not Nb, Let's see:
Chris, mathematician at Pudu University
Lomont found it interesting after reading it. He decided to study the mysteries of this guess value produced by carmark. Lomont is also a cool man. After careful research, it theoretically exports an optimal guess value, which is very close to the number of camark,
0x5f37642f. Kamakok, is he an alien?
The legend is not over here. Lomont was very satisfied with the computation results, so he played the game with his calculated starting value and the mysterious number of camark to see who can obtain the square root faster and more accurately. The result is that Mark won...
No one knows how Mark found this number.
Finally, lomont got angry. He tried to use the brute force method to get a number and finally found a number that is a little bit better than a card number. Although the results produced by these two numbers are actually very similar, the brute force number is 0x5f375a86.
For this reason, lomont wrote the next paper "Fast inverse square root ". Thesis:
Http://www.math.purdue.edu /~ Clomont/Math/papers/2003/invsqrt.pdf
Http://www.matrix67.com/data/InvSqrt.pdf
Reference: <IEEE Standard 754 for binary floating-point Arithmetic> <fast
Inverse square root>
Finally, the simplest 1/SQRT () function is provided:
float InvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x; // get bits for floating VALUE
i = 0x5f375a86- (i>>1); // gives initial guess y0
x = *(float*)&i; // convert bits BACK to float
x = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracy
return x;
}
You can try to compile and experiment on a PC, 51, AVR, 430, arm, to be surprised at its efficiency.
There was a piece of news two days ago, to the point that Ryszard sommefeldt saw such a piece of code long ago (probably from the source of quake III
Code ):
float InvSqrt (float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
At first glance, he was surprised by the fact that he wanted to visit the senior citizen, but he could not find anyone while searching for it. At the same time, others were looking for it, even though he could not find the source, but Chris lomont
I wrote a thesis (in PDF) to parse this code algorithm (using the Newton's method, the Newton method; more importantly, the second half describes how to find out the magic
0x5f3759df ).
PS. This function is important because finding the reciprocal of the root number is in 3D (the part of vector operation)
It is often used in it. If you use the original SQRT () and then use the countdown, the speed is about four times slower than the previous version... XD
Ps2.
During their pursuit, some people mentioned a file called mit hackmem, which was written by the strong MIT in the 1970 s (hack memo), mostly
Algorithm, some code is written by PDP-10 ASM, and a few are C code (someone sorted out a list)
Well, the story is over here. I hope you will have some gains :). I have also provided the source code for download. If you are interested, you can try it on your own.