Underlying algorithm efficiency of the square root sqrt () function

Source: Internet
Author: User
Tags float number
Although you may not have thought about this problem at ordinary times, it's not that easy to say that it's just like "getting a gun in front of a while, it's not that simple, we use the binary method to test the number of squares in the middle of an interval. if it is large, we will try the number in the middle of the left interval. if it is small, try again with the number in the middle of the right interval. For example, to obtain the sqrt (16) result, you first try (0 + 16)/864, 8 *, 64 is larger than 16,

We usually have some data operation operations and need to call sqrt, exp, abs and other functions. Have you ever thought about how these function systems are implemented? Take the most common sqrt function as an example. how can the system implement this frequently called function?

Although you may not have thought about this problem at ordinary times, it's not that easy to say that it's just like "getting a gun in front of a while, it's not that simple, we use the binary method to test the number of squares in the middle of an interval. if it is large, we will try the number in the middle of the left interval. if it is small, try again with the number in the middle of the right interval. For example, to obtain the result of sqrt (16), first try (0 + 16)/2 = 8*8 = 64, 64 is larger than 16, then shift to the left, try (0 + 8) /2 = 4, 4*4 = 16. you get the correct result sqrt (16) = 4. Then you write out the program in the following steps:

// Use the binary float SqrtByBisection (float n) {// if (n <0) return n; float mid, last; float low, up according to the processing you need; low = 0, up = n; mid = (low + up)/2; do {if (mid * mid> n) up = mid; else low = mid; last = mid; mid = (up + low)/2;} // precision control while (abs (mid-last)> eps); return mid ;}

Then let's take a look at the difference between the performance and precision of the system functions (the time unit is not the second or the millisecond, but the CPU Tick, no matter what the unit is, it will be comparable if it is unified ). The results of the bipartite method are exactly the same as those of the system method, but the performance is hundreds of times worse. Why is there such a big difference? Is there any better way for the system? Isn't it .... Oh, by the way, let's recall our previous high-number course. our teacher once taught us "Newton iterative method to quickly find the square root", or this method can help us. the specific steps are as follows.

Obtain the approximate value of root number a: First, just guess an approximate value of x, and then make x equal to the average of x and a/x. after six or seven iterations, the value of x is quite accurate. For example, I want to calculate the value of root 2. If I guess the result is 4, although the error is outrageous, you can see that after using the Newton iteration method, this value quickly approaches the root 2:

(       4  + 2/4        ) / 2 = 2.25 (     2.25 + 2/2.25     ) / 2 = 1.56944.. ( 1.56944..+ 2/1.56944..) / 2 = 1.42189.. ( 1.42189..+ 2/1.42189..) / 2 = 1.41423.. ....

The principle of this algorithm is very simple. We just constantly use the tangent of (x, f (x) to approach the root of the equation x ^ 2-a = 0. The root number a is actually a positive root of x ^ 2-a = 0, and the derivative of this function is 2x. That is to say, the tangent slope at any point (x, f (x) of the function is 2x. Then, x-f (x)/(2x) is an approximate value closer to x. Input f (x) = x ^ 2-a to get x-(x ^ 2-a)/(2x), that is, (x + a/x)/2.

The related code is as follows:

Float SqrtByNewton (float x) {// The final float val = x; // Save the previous calculated value float last; do {last = val; val = (val + x/val)/2;} while (abs (val-last)> eps); return val ;}

The performance of the Newton iteration method has improved a lot, but there is still such a big gap with the system functions. why? I think about it for a long time. Suddenly one day, I saw a magic method on the Internet, so I had this article today. I didn't talk much about it. read the code first:

float InvSqrt(float x){float xhalf = 0.5f*x;int i = *(int*)&x; // get bits for floating VALUE i = 0x5f375a86- (i>>1); // gives initial guess y0x = *(float*)&i; // convert bits BACK to floatx = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracyx = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracyx = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracyreturn 1/x;}

This is really a qualitative change, and the result is even better than the system. Why is the speed so fast? Don't worry. let's take a look at the following story:

Quake-III Arena (Thunder Hammer 3) is one of the classic games in 1990s. This series of games not only have good pictures and content, but also can run smoothly even if the computer configuration is low. This is due to John Carmack, the developer of its 3D engine ). In fact, as early as the DOS era in early 1990s, when John Carmack launched Castle Wolfstein, which was so shocking as long as he could make a small animation on the PC, he then began to work on doom, doomII, quake... every time, the 3-D technology is pushed to the extreme. His 3D engine code is extremely efficient, and almost every computation command on a PC is being squeezed. At the beginning, MS Direct3D had to listen to his opinions and modified a lot of APIs.

Recently, QUAKE developer id software complies with the GPL protocol and discloses the original code of the QUAKE-III, so that the world has the honor to witness the original code of the legendary 3D engine of Carmack. This is the original QUAKE-III code: http://www.fileshack.com/file.x? Fid = 7547. We know that the more underlying functions are called, the more frequently. The 3D engine is still a mathematical operation. Then finding the underlying mathematical functions (in game/code/q_math.c) must be carefully written. There are a lot of interesting functions, many of which are amazing. we can't finish learning them for years. This code is found in game/code/q_math.c. The function of this code is to split a Data square and calculate it down. It is tested that this code is four times faster than (float) (1.0/sqrt (x:

float Q_rsqrt( float number ){long i;float x2, y;const float threehalfs = 1.5F;x2 = number * 0.5F;y   = number;i   = * ( long * ) &y;   // evil floating point bit level hackingi   = 0x5f3759df - ( i >> 1 ); // what the fuck?y   = * ( float * ) &i;y   = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration// y   = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed#ifndef Q3_VM#ifdef __linux__ assert( !isnan(y) ); // bk010122 - FPE?#endif#endifreturn y;}  

The function returns 1/sqrt (x), which is more useful in image processing than sqrt (x. Note that this function is only used for one generation! (In fact, there is no need for stacking and direct calculation ). Compilation and experiment. this function not only works well, but also works four times faster than the standard sqrt () function! You know, the built-in functions of the compiler are strictly and carefully compiled and optimized!

This concise function, the core and most confusing, is to mark "what the fuck ?" I = 0x5f3759df-(I> 1 );

Add y = y * (threehalfs-(x2 * y ));

The square operation is completed in two sentences! It is also noted that the core sentence is the fixed point shift operation, which is extremely fast! This is extremely efficient, especially on a lot of CPU CPUs with no multiplication instructions.

In fact, the principle of the algorithm is not complex, that is, the Newton iteration method. x-f (x)/f' (x) is used to constantly approach the root of f (x) =.

Yes, the square root of the square is usually calculated in this loop iteration, but the real cool B of carmark (author of quake3) is that he chose a mysterious constant 0x5f3759df to calculate the estimated value, that is, the line that we add the comment. the calculated value of the line is very close to 1/sqrt (n), so we only need two Newton iterations to achieve the precision we need. Well, if this is not NB, let's see:

Chris Lomont, a mathematician at the Pudu University, thought it was interesting after reading it. he decided to study the mysteries of this guess value produced by carmark. Lomont is also a cool man. after careful research, it theoretically exports an optimal guess value, which is very close to the number of carmark, 0x5f37642f. Kamakok, is he an alien?

The legend is not over here. Lomont was very satisfied with the computation results, so he played the game with his calculated starting value and the mysterious number of camark to see who can obtain the square root faster and more accurately. The result is that Mark won... no one knows how Mark found this number.

Finally, Lomont got angry. he tried to use the brute force method to get a number and finally found a number that is a little bit better than a card number. although the results produced by these two numbers are actually very similar, the brute force number is 0x5f375a86.

For this reason, Lomont wrote the next paper "Fast Inverse Square Root ". Thesis: http://www.math.purdue.edu /~ Clomont/Math/Papers/2003/InvSqrt.pdf, http://www.matrix67.com/data/invsqrt.htm.

Finally, the simplest 1/sqrt () function is provided:

float InvSqrt(float x){float xhalf = 0.5f*x;int i = *(int*)&x; // get bits for floating VALUE i = 0x5f375a86- (i>>1); // gives initial guess y0x = *(float*)&i; // convert bits BACK to floatx = x*(1.5f-xhalf*x*x); // Newton step, repeating increases accuracyreturn x;}  

You can try to compile and experiment on a PC, 51, AVR, 430, ARM, to be surprised at its efficiency.

There was a piece of news about Ryszard Sommefeldt a long time ago (probably from the source code of Quake III ):

float InvSqrt (float x) {float xhalf = 0.5f*x;int i = *(int*)&x;i = 0x5f3759df - (i>>1);x = *(float*)&i;x = x*(1.5f - xhalf*x*x);return x;}

At first glance, he was surprised by the fact that he wanted to visit the senior citizen, but he could not find anyone while searching for it. at the same time, others were looking for it, even though he could not find the source, however, Chris Lomont wrote a paper (in PDF) to parse the code algorithm (using the Newton's Method and the Newton Method; more importantly, the second half explains how to find the magic 0x5f3759df ).

PS. this function is important because it is often used in 3D operations (vector operations) to calculate the reciprocal of the root number. if you use the original sqrt () and then use the reciprocal, the speed is about four times slower than the previous version... XD

PS2. during their pursuit, some people mentioned a file called mit hackmem, which was written by the strong MIT in the 1970 s (hack memo), mostly algorithm, some code is written by the PDP-10 asm, and a few are C code (someone sorted out a list ).

Well, the story is over here. I hope you will have some gains :)

This article is available at http://www.nowamagic.net/librarys/veda/detail/184.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.