[Repost] John Carmack's fast inverse square root algorithm

Source: Internet
Author: User
Tags: square root

This post discusses John Carmack's fast inverse square root algorithm. Original address: http://213style.blogspot.com/2014/07/john-carmack.html

The topic is simple: how to compute the inverse square root quickly.

Figure 1. The inverse square root function, y = 1/sqrt(x). How can this formula be computed quickly?

The inverse square root operation holds an important place in 3D graphics. It is mainly used in lighting and reflection calculations, which need normalized vectors, and normalizing a vector requires an inverse square root. Through most of the 1990s there was no dedicated consumer hardware that handled transform and lighting (T&L) directly; that changed with the famous GeForce. In 3D games, some weapons that can pierce through multiple enemies also need the inverse square root, so it is used not only in rendering but also in other graphics-related algorithms. As a result, most 3D games carry their own inverse square root function.

In numerical computing, the inverse square root can also be used to obtain the square root. For example, on a microprocessor such as the 8051, which can only do integer arithmetic, the square root is computed through the inverse square root, because fast algorithms exist for the latter. The most famous is the Q_rsqrt function in Quake III. It is so well known that most people who need an inverse square root paste its source straight into their own program and then modify it as needed; using it on the 8051, for instance, additionally requires fixed-point techniques. The hardware inverse square root found on current graphics cards is, in spirit, a hardware realization of the same idea. So let's look at the original Q_rsqrt source code in Quake III.

Figure 2. The original source of the Q_rsqrt function in Quake III, with explanatory comments added.
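
The figure with the source is not reproduced in this copy. As a stand-in, here is a minimal sketch of the routine as it is widely circulated, not a verbatim copy of the Quake III file. Two things are assumptions of this sketch: the name `q_rsqrt_sketch`, and the use of `memcpy` with `uint32_t` in place of the pointer cast usually shown, which keeps the bit reinterpretation well defined in modern C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the fast inverse square root: integer trick plus one Newton step.
   Assumes float is a 32-bit IEEE 754 value and x is positive. */
float q_rsqrt_sketch(float x)
{
    assert(x > 0.0f);
    const float half_x = 0.5f * x;
    float y = x;
    uint32_t i;

    memcpy(&i, &y, sizeof i);        /* read the float's bits as an integer */
    i = 0x5f3759dfu - (i >> 1);      /* part 1: integer-only initial estimate */
    memcpy(&y, &i, sizeof y);        /* read the integer back as a float */

    y = y * (1.5f - half_x * y * y); /* part 2: one Newton-Raphson refinement */
    return y;
}
```

Calling `q_rsqrt_sketch(4.0f)` returns a value close to 0.5, with a relative error of well under one percent after the single refinement step.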

The function has two parts. The first computes, with integer arithmetic, an initial value close to the solution. The second uses the Newton-Raphson method to increase the precision of that value. The essence of the algorithm is the first part. The second part holds little that is new for anyone who has studied numerical methods; for readers who have not, Newton-Raphson is touched on briefly after the first part has been explained. Understanding the first part is the point, and it depends on how floating-point numbers are stored and represented, so first look at how a floating-point number is stored in the computer.

Figure 3. Storage layout of an n-bit floating-point number.

Figure 4. Representation of a floating-point number.

Figure 3 shows how a floating-point number is stored in memory, sometimes called the floating-point bit level. There is a clue here: a floating-point number is also an integer, or more precisely, every floating-point number maps to a unique integer. Figure 4 explains how the floating-point unit reads that integer as a floating-point number. Because the inverse square root cannot take a negative argument, S in Figure 4 must be 0, so the sign need not be considered. As just mentioned, a floating-point number is really an integer: viewing the storage layout of Figure 3 as an integer gives Figure 5.

Figure 5. A floating-point number and its corresponding integer.
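
To make the "a float is also an integer" idea concrete, here is a small C sketch. It assumes the common IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits), which the text does not spell out; the names `float_bits`, `float_fields` and `split_float` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float (assumed IEEE 754 binary32 layout). */
typedef struct {
    uint32_t sign;      /* S: 1 bit */
    uint32_t exponent;  /* E: 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* fraction bits: 23 bits, i.e. M * 2^23 */
} float_fields;

/* Return the integer that shares its bit pattern with x. */
uint32_t float_bits(float x)
{
    uint32_t i;
    assert(sizeof i == sizeof x);
    memcpy(&i, &x, sizeof i);
    return i;
}

/* Split the bit pattern of x into sign, exponent and mantissa fields. */
float_fields split_float(float x)
{
    const uint32_t i = float_bits(x);
    float_fields f;
    f.sign = i >> 31;
    f.exponent = (i >> 23) & 0xffu;
    f.mantissa = i & 0x7fffffu;
    return f;
}
```

For example, 1.0f is stored as 0x3f800000: sign 0, exponent field 127, mantissa field 0.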

A floating-point number always corresponds to exactly one integer, so the float can be mapped to its integer and the computation carried out from the integer point of view. To apply this technique, first take the binary logarithm (the log function with base 2) of both sides of the inverse square root equation.

Figure 6. Taking log2 of both sides, with both y and x written in the floating-point form of Figure 4.

In the equation of Figure 6, both y and x are split into mantissa and exponent. The remaining work is to find the relationship between y and x from the integer point of view, so the log2 has to be eliminated before the equation of Figure 5 can be applied. In the floating-point form of Figure 4, M is a value from 0 up to but not including 1, so an approximate formula can be used to remove the log2 in Figure 6.
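
Because the figure is missing, the logarithm step can be written out roughly as follows. The notation is an assumption chosen to match the symbols used later in the text: x = (1 + M_x) 2^(E_x - B), where M_x is the mantissa fraction, E_x the stored exponent and B the exponent bias, and likewise for y.

```latex
% Assumed notation: x = (1 + M_x)\,2^{E_x - B},\; y = (1 + M_y)\,2^{E_y - B},\; 0 \le M < 1
\begin{align*}
y &= \frac{1}{\sqrt{x}} \\
\log_2 y &= -\tfrac{1}{2}\,\log_2 x \\
\log_2(1 + M_y) + (E_y - B) &= -\tfrac{1}{2}\,\bigl[\log_2(1 + M_x) + (E_x - B)\bigr]
\end{align*}
```

The only terms that still block an integer-only computation are the two logarithms of the form log2(1 + M), which is why an approximation for them is needed.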

Figure 7. The approximate binary logarithm formula.

As Figure 7 shows, the approximate binary logarithm formula lets you remove the log function in Figure 6 and replace it with an approximation. This approximation step is very important: through it you obtain an integer equation equivalent to Figure 1. Figure 8 shows the complete derivation.

Figure 8. Approximating the inverse square root with an integer equation.

The last formula in Figure 8 is the most important conclusion of the algorithm: the inverse square root is computed in integer form. The final conclusion can be explained with Figure 9.

Figure 9. The approximate integer algorithm for the inverse square root.

Figure 9 finally explains the reason for the mysterious conversion in the Q_rsqrt code. Converting I_y back to the floating-point point of view gives an approximate solution, and this approximate solution is an excellent initial value; Newton's method applied afterwards only increases its accuracy. Understanding the hexadecimal value 5F3759DF in the code is then simple: substitute the values of B, L and sigma into the approximate integer algorithm of Figure 9 and you get this magic number.

Figure 10. The mysterious 0x5F3759DF magic number is just the result of substituting the parameters of a 32-bit float into the inverse square root integer algorithm; adjusting sigma may change the magic number slightly.
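
As a rough check of where the constant comes from, the sketch below evaluates the form the final integer equation usually takes, I_y ≈ (3/2) L (B - sigma) - I_x / 2, where the first term is the magic number. Everything specific here is an assumption rather than something taken from the missing figures: the approximation log2(1 + M) ≈ M + sigma, the values L = 2^23 and B = 127 for a 32-bit float, the sample value sigma = 0.0450465, the truncation to an integer, and the function name `magic_from_sigma`.

```c
#include <assert.h>
#include <stdint.h>

/* Magic constant K = (3/2) * L * (B - sigma), truncated to an integer.
   mantissa_bits gives L = 2^mantissa_bits, bias is B, sigma is the tuning term. */
uint32_t magic_from_sigma(int mantissa_bits, int bias, double sigma)
{
    assert(mantissa_bits > 0 && mantissa_bits < 31);
    const double L = (double)(UINT32_C(1) << mantissa_bits);
    return (uint32_t)(1.5 * L * ((double)bias - sigma));
}
```

With these assumed inputs, `magic_from_sigma(23, 127, 0.0450465)` evaluates to 0x5f3759df, while sigma = 0 gives 0x5f400000, so sigma only perturbs the low bits of the constant.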

Figure 10 explains the origin of the magic number in John Carmack's fast inverse square root algorithm. Because the derivation here is carried out for an n-bit floating-point number, the reader can also work out what the magic number would be for a double. The second half of the algorithm, as mentioned earlier, holds little worth studying: readers who have studied numerical methods will understand how Newton's method increases the precision of the solution. Because the integer algorithm yields only an approximate solution, Newton's method is needed to "refine" it. The iteration formula derived from Newton's method is given below.

Figure 11. The core iterative step of the algorithm, derived with Newton's method.

A test program with a GUI is given below, together with a brief look at the speedup as measured in ticks.

Figure 12. Each of the two methods is run one million times and timed with GetTickCount.

The tick counts measured in this GUI show that the x86 CPU is an "unstable" processor, in the sense that cycle counts on an x86 CPU cannot be measured precisely. The main reason is that the x86 CPU is very complex, with many prediction and speed-up mechanisms inside, so the same program runs sometimes fast and sometimes slow. This program makes the phenomenon very clear. On the x86 platform, Carmack's fast inverse square root is roughly 1.5x to 4x faster than computing 1/sqrt(x) with the standard math library. The x86 CPU is quite clever; for example, recently used data may be served straight from the cache rather than fetched again. It is hard to know exactly what the CPU does internally, but the test results show that timing on x86 is not stable. Perhaps the algorithm can be taken to other platforms to see how it behaves there.
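
The refinement step can be sketched in C. Applying Newton's method to f(y) = 1/y^2 - x, whose positive root is 1/sqrt(x), gives the update y_new = y (3/2 - x y^2 / 2), which is the form that appears at the end of Q_rsqrt. The function names below are hypothetical, and the starting guess is simply passed in by the caller.

```c
#include <assert.h>

/* One Newton-Raphson step for f(y) = 1/y^2 - x; the root is 1/sqrt(x). */
float rsqrt_newton_step(float x, float y)
{
    assert(x > 0.0f && y > 0.0f);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Apply `steps` refinements to an initial guess y0 (illustrative helper). */
float rsqrt_refine(float x, float y0, int steps)
{
    float y = y0;
    for (int k = 0; k < steps; ++k)
        y = rsqrt_newton_step(x, y);
    return y;
}
```

Starting from a rough guess such as 0.4 for x = 4, a few steps move the value quickly toward 0.5. The better the initial value, the fewer steps are needed, which is why the integer trick pairs so well with a single refinement.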

Figure 13. Test results.

Although timing on x86 CPUs is not stable, the Carmack approach really is fast: at least 1.5x faster. The Dial widget in the program is a stock Qt component. Scrolling the mouse wheel turns the dial, and the measured speed ratio changes as it moves. That Carmack's fast inverse square root is at least 1.5x faster shows why the algorithm is called John Carmack's unusual fast inverse square root.

Note: this article required a lot of material. Most of the time went into understanding the algorithm, writing the test program, and reading related articles on the square root and inverse square root, not into the write-up itself.

Supplement: the SSE instruction set on x86 now includes rsqrtss, which readers can also use to obtain the initial value. The instruction is essentially a hardware version of the fast inverse square root, and its result can likewise be used as the initial value for Newton's method to increase accuracy. Interested readers can try it themselves.

Figure 14. Computing the inverse square root with the SSE instruction rsqrtss; the inline assembly is given in both a CL and a GCC version.

Readers who test it will find that rsqrtss does not bring much of a speed advantage, because SSE instructions in the end still use the floating-point units inside the x86. The code, however, is very simple: rsqrtss yields the approximate solution directly, and that solution serves as the initial value for Newton's method. Readers unfamiliar with the x86 hardware architecture can consult the Software Optimization Guide for AMD64 Processors, Appendix A, "Microarchitecture for AMD Athlon 64 and AMD Opteron Processors". Readers unfamiliar with GCC inline assembly who would rather not read English can refer to section 7-8, on inline assembly, of the translated edition of Linux Device Driver Programming.
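
The inline assembly from the figure is not reproduced here. As a substitute technique, the sketch below reaches the same instruction through the compiler intrinsic `_mm_rsqrt_ss` from `<xmmintrin.h>` instead of inline assembly. The function names and the `HAVE_RSQRTSS` macro are assumptions of this example, and on targets without SSE it falls back to the integer trick so that it still compiles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
#define HAVE_RSQRTSS 1
#else
#define HAVE_RSQRTSS 0
#endif

/* Initial estimate of 1/sqrt(x): rsqrtss where available,
   otherwise the integer trick as a portable fallback. */
float rsqrt_estimate(float x)
{
    assert(x > 0.0f);
#if HAVE_RSQRTSS
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
#endif
}

/* The estimate followed by one Newton-Raphson step. */
float rsqrt_refined(float x)
{
    const float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

Whether this is faster than the integer version depends on the compiler and CPU, so it is worth timing both on the target machine rather than assuming either result.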
