Sqrt Function Analysis

Source: Internet
Author: User
Tags float number

We usually have some data operation operations and need to call sqrt, exp, abs and other functions. Have you ever thought about how these function systems are implemented? Take the most common sqrt function as an example. How can the system implement this frequently called function?

Although you may not have thought about this problem at ordinary times, it's not that easy to say that it's just like "getting a gun in front of a while, it's not that simple, we use the binary method to test the number of squares in the middle of an interval. If it is large, we will try the number in the middle of the Left interval. If it is small, try again with the number in the middle of the right interval. For example, to obtain the result of sqrt (16), first try (0 + 16)/2 = 8*8 = 64, 64 is larger than 16, then shift to the left, try (0 + 8) /2 = 4, 4*4 = 16. You get the correct result sqrt (16) = 4. Then you write out the program in the following steps:

Float SqrtByBisection (float n) // use the binary method.
{
If (n <0) // if the value is less than 0, you need to process it as needed.
Return n;
Float mid, last;
Float low, up;
Low = 0, up = n;
Mid = (low + up)/2;
Do
{
If (mid * mid> n)
Up = mid;
Else
Low = mid;
Last = mid;
Mid = (up + low)/2;
} While (abs (mid-last)> eps); // precision control
Return mid;
} Then let's take a look at the difference between the performance and precision of the system functions (the time unit is not second or millisecond, but CPU Tick, no matter what the Unit is, it will be comparable if it is unified)


It can be seen that the results of the Bipartite method are exactly the same as those of the system method, but the performance is hundreds of times worse. Why is there such a big difference? Is there any better way for the system? Isn't it .... Oh, by the way, let's recall our previous high-number course. Our teacher once taught us "Newton iterative method to quickly find the square root", or this method can help us. The specific steps are as follows:

Obtain the approximate value of root number a: first, just guess an approximate value of x, and then make x equal to the average of x and a/x. After six or seven iterations, the value of x is quite accurate.
For example, I want to calculate the value of Root 2. If I guess the result is 4, although the error is outrageous, you can see that after using the Newton iteration method, this value quickly approaches the Root 2:
(4 + 2/4)/2 = 2.25
(2.25 + 2/2.25)/2 = 56944 ..
(1. 56944... + 2/1. 56944...)/2 = 1. 42189 ..
(1. 42189... + 2/1. 42189...)/2 = 1. 41423 ..
....
The principle of this algorithm is very simple. We just constantly use the tangent of (x, f (x) to approach the root of the equation x ^ 2-a = 0. The root number a is actually a positive root of x ^ 2-a = 0, and the derivative of this function is 2x. That is to say, the tangent slope at any point (x, f (x) of the function is 2x. Then, x-f (x)/(2x) is an approximate value closer to x. Input f (x) = x ^ 2-a to get x-(x ^ 2-a)/(2x), that is, (x + a/x)/2.

The related code is as follows:

Float SqrtByNewton (float x)
{
Float val = x; // The final
Float last; // Save the previous calculated value
Do
{
Last = val;
Val = (val + x/val)/2;
} While (abs (val-last)> eps );
Return val;
} Then let's take a look at the performance test:

 

Wow, the performance has improved a lot, but there is still such a big gap with system functions. Why? I think about it for a long time. Suddenly one day, I saw a magic method on the Internet, so I had this article today. I didn't talk much about it. Read the code first:

Float InvSqrt (float x)
{
Float xhalf = 0.5f * x;
Int I = * (int *) & x; // get bits for floating VALUE
I = 0x5f375a86-(I> 1); // gives initial guess y0
X = * (float *) & I; // convert bits BACK to float
X = x * (1.5f-xhalf * x); // Newton step, repeating increases accuracy
X = x * (1.5f-xhalf * x); // Newton step, repeating increases accuracy
X = x * (1.5f-xhalf * x); // Newton step, repeating increases accuracy

Return 1/x;
} Then let's take a look at the next performance test:

This is really a qualitative change, and the result is even better than the system... Brother is really shocked !!! Brother vomited blood !!! A function triggers a blood attack !!! Blood, blood...

Why is the speed so fast? Don't worry. Let's take a look at the following story:

Quake-III Arena (THUNDER hammer 3) is one of the classic games in 1990s. This series of games not only have good pictures and content, but also can run smoothly even if the computer configuration is low. This is due to John Carmack, the developer of its 3D engine ). In fact, as early as the DOS era in Early 1990s, when John Carmack launched Castle Wolfstein, which was so shocking as long as he could make a small animation on the PC, he then began to work on doom, doomII, quake... every time, the 3-D technology is pushed to the extreme. His 3D engine code is extremely efficient, and almost every computation command on a PC is being squeezed. At the beginning, MS Direct3D had to listen to his opinions and modified a lot of APIs.

Recently, QUAKE developer id software complies with the GPL protocol and discloses the original code of the QUAKE-III, so that the world has the honor to witness the original code of the legendary 3D engine of Carmack. This is the original code of the QUAKE-III:
Http://www.fileshack.com/file.x? Fid = 1, 7547

(The following is the official download URL. You can search “quake3-1.32b-source.zip to find a lot of Chinese webpages. Ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip)

We know that the more underlying functions are called, the more frequently. The 3D engine is still a mathematical operation. Then finding the underlying mathematical functions (in game/code/q_math.c) must be carefully written. There are a lot of interesting functions, many of which are amazing. We can't finish learning them for years. This code is found in game/code/q_math.c. The function of this Code is to split a data square and calculate it down. It is tested that this code is four times faster than (float) (1.0/sqrt (x:

Float Q_rsqrt (float number)
{
Long I;
Float x2, y;
Const float threehalfs = 1.5F;

X2 = number * 0.5F;
Y = number;
I = * (long *) & y; // edevil floating point bit level hacking
I = 0x5f3759df-(I> 1); // what is the fuck?
Y = * (float *) & I;
Y = y * (threehalfs-(x2 * y); // 1st iteration
// Y = y * (threehalfs-(x2 * y); // 2nd iteration, this can be removed

# Ifndef Q3_VM
# Ifdef _ linux __
Assert (! Isnan (y); // bk010122-FPE?
# Endif
# Endif
Return y;
} The function returns 1/sqrt (x), which is more useful in image processing than sqrt (x.
Note that this function is only used for one generation! (In fact, there is no need for stacking and direct calculation ). Compilation and experiment. This function not only works well, but also works four times faster than the standard sqrt () function! You know, the built-in functions of the compiler are strictly and carefully compiled and optimized!
This concise function, the core and most confusing, is to mark "what the fuck ?" A sentence
I = 0x5f3759df-(I> 1 );

Add y = y * (threehalfs-(x2 * y ));
The square operation is completed in two sentences! It is also noted that the core sentence is the fixed point shift operation, which is extremely fast! This is extremely efficient, especially on a lot of CPU CPUs with no multiplication instructions.

In fact, the principle of the algorithm is not complex, that is, the Newton Iteration Method. x-f (x)/F' (x) is used to constantly approach the root of f (x) =.

Yes, the square root of the square is usually calculated in this loop iteration, but the real cool B of carmark (author of quake3) is that he chose a mysterious constant 0x5f3759df to calculate the estimated value, that is, the line that we add the comment. The calculated value of the line is very close to 1/sqrt (n), so we only need two Newton iterations to achieve the precision we need. Well, if this is not NB, Let's see:

Chris Lomont, a mathematician at the Pudu University, thought it was interesting after reading it. He decided to study the mysteries of this guess value produced by carmark. Lomont is also a cool man. After careful research, it theoretically exports an optimal guess value, which is very close to the number of carmark, 0x5f37642f. Kamakok, is he an alien?

The legend is not over here. Lomont was very satisfied with the computation results, so he played the game with his calculated starting value and the mysterious number of camark to see who can obtain the square root faster and more accurately. The result is that Mark won... no one knows how Mark found this number.

Finally, Lomont got angry. He tried to use the brute force method to get a number and finally found a number that is a little bit better than a card number. Although the results produced by these two numbers are actually very similar, the brute force number is 0x5f375a86.

For this reason, Lomont wrote the next paper "Fast Inverse Square Root ". Thesis:
Http://www.math.purdue.edu /~ Clomont/Math/Papers/2003/InvSqrt.pdf
Http://www.matrix67.com/data/InvSqrt.pdf

Reference: <IEEE Standard 754 for Binary Floating-Point Arithmetic> <fast inverse square root>

Finally, the simplest 1/sqrt () function is provided:

Float InvSqrt (float x)
{
Float xhalf = 0.5f * x;
Int I = * (int *) & x; // get bits for floating VALUE
I = 0x5f375a86-(I> 1); // gives initial guess y0
X = * (float *) & I; // convert bits BACK to float
X = x * (1.5f-xhalf * x); // Newton step, repeating increases accuracy
Return x;
} You can try to compile and experiment on a PC, 51, AVR, 430, ARM, and above to be surprised at its efficiency.

There was a piece of news about Ryszard Sommefeldt a long time ago (probably from the source code of Quake III ):

Float InvSqrt (float x)
{
Float xhalf = 0.5f * x;
Int I = * (int *) & x;
I = 0x5f3759df-(I> 1 );
X = * (float *) & I;
X = x * (1.5f-xhalf * x );
Return x;
}
 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.