# "Reprint" A murder caused by a sqrt function

Source: Internet
Author: User

Well, I admit I'm heading the party, but since you're here, take a serious look and make sure you get it.

We usually have some operations of data operations, need to call sqrt,exp,abs and other functions, then you have not thought: how do these functional systems are implemented? Take the most commonly used sqrt function, how does the system implement this frequently called function?

Although it is possible that you do not normally think about this problem, but is the so-called "cramming, unhappy also light", you "wrinkly, stratagem", this is not too simple, with a two-point method, in an interval, each take the square of the middle number to test, if large, then try the middle of the left interval; Take the middle number of the right interval to try it again. For example sqrt (16) results, you first try (0+16)/2=8,8*8=64,64 than 16, then move to the left, try (0+8)/2=4,4*4=16 just, you get the correct results sqrt (16) = 4. Then you rinsed the program:

Then look at the difference between the performance and the accuracy of the system function (where the time unit is not the second or the millisecond, but the CPU Tick, regardless of what the unit is, the uniformity is comparable)

It can be seen that the dichotomy is exactly the same as the system's method, but the performance is hundreds of times times worse. Why is there such a big difference? Is there any better way to do this? Don't.... Oh, by the way, remember the high number of lessons we had, once the teacher taught us "Newton iterative method to quickly find square root", or this method can help us, the specific steps are as follows:

Find the approximate value of the square root a: first guess a random approximation x, and then keep x equal to the average of X and a/x, iterate six or seven times after the value of x is quite accurate.
For example, I would like to ask what the root 2 equals. If I guessed the result to be 4, although the wrong is outrageous, but you can see the use of Newton iterative method after the value quickly approaching the root of the square root 2:
(4 + 2/4)/2 = 2.25
(2.25 + 2/2.25)/2 = 1.56944..
(1.56944..+ 2/1.56944 ...) /2 = 1.42189..
(1.42189..+ 2/1.42189 ...) /2 = 1.41423..
....
The principle of this algorithm is very simple, we just constantly use (x,f (x)) tangent to approximate the root of the equation x^2-a=0. Square root A is actually a positive real roots of x^2-a=0, the derivative of this function is 2x. That is, the tangent slope of the function at any point (x,f (x)) is 2x. So, x-f (x)/(2x) is an approximation that is closer than X. Substituting f (x) =x^2-a gets x (x^2-a)/(2x), i.e. (x+a/x)/2.

The relevant code is as follows:

`float Sqrtbynewton (float x) {float val = x;//final float last;//Save the last computed value Do{last = Val;val = (val + x/val)/2;} while (ABS (Val-last) > EPs); return Val;}`

Then we'll look at the performance test:

Wow, performance has improved a lot, but compared with the system function, there is still so big gap, this is why AH? Think of Ah, think for a long time still baffled its solution. Suddenly one day, I saw on the Internet a magical method, so there is today's article, nonsense not to say, look at the code first:

`float invsqrt (float x) {Float Xhalf = 0.5f*x;int i = * (int*) &x;//Get bits for floating VALUE i = 0x5f375a86-(I>&G T;1); Gives initial guess y0x = * (float*) &i; Convert bits back to Floatx = x* (1.5f-xhalf*x*x); Newton step, repeating increases accuracyx = x* (1.5f-xhalf*x*x); Newton step, repeating increases accuracyx = x* (1.5f-xhalf*x*x); Newton step, repeating increases Accuracyreturn 1/x;}`

And then we'll look at the performance test for the last time:

This is really a qualitative change, the result is even better than the system ... Elder brother really is shocked!!! Brother Vomiting blood!!! A function caused a murder!!! A bloodbath, a murder

Now you don't understand the "ghost function", why is it so fast? No hurry, first look at the following story:

QUAKE-III Arena (Thor Hammer 3) is one of the classic games of the 90 's. This series of games is not only a good picture and content, and even if the computer configuration is low, it can be extremely smooth operation. This is thanks to John Carmack, the developer of its 3D engine, Carmack. In fact, as early as in the early 90 DOS era, as long as the PC can make a small animation can be amazing, John Carmack launched the breaking Castle Wolfstein, and then re-excitation, Doom, doomii, Quake ... Every time I push the technology to the extreme. His 3D engine code is extremely efficient and is almost all about pressing a PC for every operational instruction. Ms Direct3D also had to listen to his comments, modified a lot of APIs.

Recently, Quake's developer id Software complied with the GPL, exposing QUAKE-III's original code, allowing the world to witness the original code of the Carmack legendary 3D engine. This is the original code of QUAKE-III:
http://www.fileshack.com/file.x?fid=7547

(Below is the official download URL, search "Quake3-1.32b-source.zip" can find a lot of Chinese pages. Ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip)

We know that the lower the lower the function, the more frequent the call. The 3D engine is still a mathematical operation in the final analysis. So finding the lowest-level mathematical arithmetic function (in game/code/q_math.c) must be well-written. There are a lot of interesting functions, many of them are amazing, we can not learn a few years of time. This piece of code was found in GAME/CODE/Q_MATH.C. Its function is to take a number open squared and get down, tested this code is 4 times times faster than (float) (1.0/SQRT (x)):

`float q_rsqrt (float number) {Long i;float x2, y;const float threehalfs = 1.5f;x2 = number * 0.5f;y   = number;i   = * (long *) &y;   Evil floating point bit level hackingi   = 0x5f3759df-(i >> 1),//What the fuck?y   = * (float *) & ; I;y   = y * (Threehalfs-(x2 * y * y));//1st iteration//y   = y * (Threehalfs-(x2 * y * y));//2nd ITE Ration, this can is Removed#ifndef q3_vm#ifdef __linux__ assert (!isnan (y)); Bk010122-fpe? #endif #endifreturn y;}  `

The function returns 1/SQRT (x), which is more useful than sqrt (x) in image processing.
Notice that this function only uses a single iteration! (In fact, there is no use of overlapping, direct operation). Compile, experiment, this function not only works very well, but also 4 times times faster than the standard sqrt () function! You know, the compiler comes with the function, but after a rigorous careful compilation optimization Ah!
This concise function, the core, but also the most puzzling, is to mark "What the fuck?" A sentence
i = 0x5f3759df-(i >> 1);

Plus y = y * (Threehalfs-(x2 * y * y));
Two sentences to complete the radical operation! and notice that the core sentence is fixed-point shift operation, the speed is very fast! Especially in many RISC-structured CPUs without multiplication instructions, this is extremely efficient.

The principle of the algorithm is actually not complex, is Newton iterative method, with X-f (x)/F ' (x) to constantly approximation f (x) =a root.

Yes, the average square root is so iterative, but Carmack (quake3 author) real Cow B is the place where he chooses a mysterious constant 0x5f3759df to calculate that guess, which is the line we annotated, which is very close to 1/SQRT (n), So we need only 2 Newton iterations to achieve the precision we need. Well, if this is not NB, then look at:

Chris Lomont, a mathematician at Purdue University, looked at the fun and decided to study the mystery of the Carmack's guess. Lomont is also a cow, after careful study from the theory also deduced a best guess value, and carmack number very close, 0x5f37642f. Carmack, is he an alien?

The legend does not end here. Lomont calculated the results are very satisfied, so take their own calculated starting value and Carmack mysterious numbers to do the game, to see whose numbers can be faster and more accurate to obtain square root. The result is that Carmack won ... No one knows how Carmack found the number.

Finally Lomont anger, the use of violent methods a number of a number to try, and finally found a better than the Carmack number of a little less than the number, although in fact, the results of these two numbers are very similar, the violence of the figure is 0x5f375a86.

Lomont wrote the next paper, "Fast inverse Square Root". Paper:
Http://www.math.purdue.edu/~clomont/Math/Papers/2003/InvSqrt.pdf
Http://www.matrix67.com/data/InvSqrt.pdf

Reference: <ieee standard 754 for Binary floating-point arithmetic><fast inverse SQUARE root>

Finally, the most streamlined 1/sqrt () function is given:

`float invsqrt (float x) {Float Xhalf = 0.5f*x;int i = * (int*) &x;//Get bits for floating VALUE i = 0x5f375a86-(I>&G T;1); Gives initial guess y0x = * (float*) &i; Convert bits back to Floatx = x* (1.5f-xhalf*x*x); Newton step, repeating increases Accuracyreturn x;}  `

You can try the PC, 51, AVR, 430, ARM, the above compile and experiment, surprised its efficiency.

Two days ago there was a news, to the effect that Ryszard Sommefeldt a long time ago to see this kind of code (possibly from Quake III source code):

`float invsqrt (float x) {Float Xhalf = 0.5f*x;int i = * (int*) &x;i = 0x5f3759df-(i>>1); x = * (float*) &i;x = x* (1.5f-xhalf*x*x); return x;}`
He looked at the surprise of heaven, want to meet the senior man, but all the way down to find people, but also there are other people looking, although also did not find the source, but Chris Lomont wrote a paper (in PDF) parsing this code algorithm (with Newton's Meth OD, Newton's method; The more important part is how to find the magical 0x5f3759df in the second half.
PS. This function is important because it is often used in the 3D operation (part of the vector operation) to find the inverse of the square root, and if you use the most primitive sqrt () and then the countdown, the speed is about four times times slower than the above version. Xd
PS2. In their quest, someone referred to a document called MIT Hackmem, which was a note (hack memo) made by the 1970 's MIT strong, mostly algorithm, some of which were written by PDP-10 ASM, and a few were C C Ode (Someone has collated a list)

Well, the story ends here, I hope you can have a harvest:), I will also provide download source code, interested friends can run their own to try.

"Reprint" A murder caused by a sqrt function

Related Keywords:

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

## A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

• #### Sales Support

1 on 1 presale consultation

• #### After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

• Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.