C/C++ code optimization suggestions

Source: Internet
Author: User

First, to be clear: do not optimize code too early. Many experts give this advice.

The code must be correct before it is optimized.
This does not mean writing a full-featured ray tracer for 8 weeks and then optimizing it for 8 weeks.
Optimize performance in several steps.
First, write correct code. When you realize that a function may be called frequently, apply the obvious optimizations.
Then find the bottlenecks and remove them (by optimizing the code or by improving the algorithm). Improving the algorithm often relieves a bottleneck dramatically, possibly with a method you had not thought of before. Every frequently called function deserves optimization.

People I know who write very efficient code say they spend twice as long optimizing code as they spend writing it.

Jumps and branches are costly. Use as few as possible.
A function call requires two jumps, plus stack memory operations.
Prefer iteration over recursion.
Use inline functions for short functions to eliminate function call overhead.
Move loops inside function calls instead of calling a function inside a loop (for example, change for (i = 0; i < 100; i++) DoSomething(); to DoSomething() { for (i = 0; i < 100; i++) { ... } }).
A long if ... else if ... else if ... chain requires many jumps to reach the final branch. If possible, convert it to a switch statement, which the compiler sometimes turns into a single jump through a lookup table. If a switch statement is not feasible, put the most common cases at the beginning of the if chain.
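
The two suggestions above (a switch instead of a long if chain, and a loop inside the function instead of a call inside the loop) can be sketched as follows. The Mode enum and the cost values are hypothetical and exist only for illustration. Whether the compiler really emits a jump table depends on the compiler and its settings.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shading modes, used only for this illustration.
enum class Mode { Flat, Gouraud, Phong, Wireframe };

// A switch over dense enum values gives the compiler the option of emitting
// a jump table instead of a chain of compare-and-branch instructions.
int costOf(Mode m) {
    switch (m) {
        case Mode::Flat:      return 1;
        case Mode::Gouraud:   return 2;
        case Mode::Phong:     return 4;
        case Mode::Wireframe: return 8;
    }
    return 0;  // not reached for valid enum values
}

// The loop lives inside the function, so the caller pays for one call
// to totalCost instead of one call per element.
int totalCost(const std::vector<Mode>& modes) {
    int sum = 0;
    for (std::size_t i = 0; i < modes.size(); ++i) {
        sum += costOf(modes[i]);
    }
    return sum;
}
```

Because costOf is short and visible in the same translation unit, the compiler is also free to inline it into the loop.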

Think carefully about the order of array subscripts.
Arrays of two or more dimensions are stored in memory as one-dimensional arrays, which means that (for C/C++ arrays) array[i][j] and array[i][j+1] are adjacent, whereas array[i][j] and array[i+1][j] may be far apart.
Accessing data in the order in which it is laid out in memory can significantly speed up your code (sometimes by an order of magnitude or more).
A modern processor loads more than a single value from main memory into the processor cache. A fetch brings in the requested data together with the adjacent data (one cache line in total). This means that once array[i][j] is in the processor cache, array[i][j+1] is probably in the cache as well, while array[i+1][j] may still be in main memory.
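
As a minimal sketch of the point about subscript order, the two functions below compute the same sum over a two-dimensional array. The array sizes are arbitrary values chosen for illustration. The first function walks memory sequentially, the second jumps a full row between consecutive accesses. How large the speed difference is depends on the array size and the machine.

```cpp
#include <cstddef>

// Illustrative sizes, not taken from the article.
constexpr std::size_t kRows = 256;
constexpr std::size_t kCols = 256;

// Inner loop over j: consecutive accesses touch adjacent memory,
// so most of them are served from a cache line that is already loaded.
long sumRowMajor(const int (&a)[kRows][kCols]) {
    long sum = 0;
    for (std::size_t i = 0; i < kRows; ++i) {
        for (std::size_t j = 0; j < kCols; ++j) {
            sum += a[i][j];
        }
    }
    return sum;
}

// Inner loop over i: consecutive accesses are kCols elements apart,
// so each one may land on a different cache line.
long sumColumnMajor(const int (&a)[kRows][kCols]) {
    long sum = 0;
    for (std::size_t j = 0; j < kCols; ++j) {
        for (std::size_t i = 0; i < kRows; ++i) {
            sum += a[i][j];
        }
    }
    return sum;
}
```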

Avoid or reduce the use of local variables.
Local variables are usually stored on the stack. However, if there are few enough of them, they can be kept in CPU registers. In that case the function not only gets faster access to the data held in registers, but also avoids the overhead of setting up a stack frame.
That said, do not move large amounts of data into global variables instead.

Reduce the number of function parameters.
The reason is the same as for reducing local variables: parameters are also stored on the stack.

Pass structs by reference instead of by value.
In ray tracing, I have yet to find a case where a struct needs to be passed by value (including simple structs such as Vector, Point, and Color).
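
A small sketch of the difference, assuming a hypothetical Vector type whose copy constructor counts how often it runs. The counter exists only to make the copies visible. For very small structs some platforms pass values in registers, so the real cost is worth measuring.

```cpp
// Hypothetical Vector type that counts copy-constructor calls.
struct Vector {
    static int copies;
    double x, y, z;
    Vector(double x_, double y_, double z_) : x(x_), y(y_), z(z_) {}
    Vector(const Vector& o) : x(o.x), y(o.y), z(o.z) { ++copies; }
};
int Vector::copies = 0;

// Pass by value: each argument is copied when the caller passes a named object.
double dotByValue(Vector a, Vector b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Pass by const reference: no copy is made, and the callee cannot modify the arguments.
double dotByRef(const Vector& a, const Vector& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```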

If your function does not require a return value, do not define one.

Avoid type conversions whenever possible.
Integer and floating-point instructions usually operate on different registers, so a conversion requires a copy.
Short integer types (char and short) still occupy an entire register. They have to be extended to 32/64 bits and then narrowed again when they are stored back to memory (this cost has to be weighed against the extra memory a larger data type would use).

Be careful when defining C++ objects.
Use initialization instead of assignment (Color c(black); is faster than Color c; c = black;).

Make class constructors as lightweight as possible.
This matters most for common simple types (such as Color, Vector, and Point), because these classes are copied often.
Their default constructors are often executed implicitly, which may not be what you expect.
Use the constructor initializer list (write Color::Color() : r(0), g(0), b(0) {} instead of assigning in the constructor body, Color::Color() { r = g = b = 0; }).
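
The effect of the initializer list is easiest to see when a member has a class type. The sketch below uses a hypothetical Tracked member that counts its constructor and assignment calls. For plain float members such as r, g, and b, a compiler may generate the same code either way.

```cpp
// Hypothetical member type that counts how it is constructed and assigned.
struct Tracked {
    static int defaultCtors;
    static int valueCtors;
    static int assignments;
    int value;
    Tracked() : value(0) { ++defaultCtors; }
    explicit Tracked(int v) : value(v) { ++valueCtors; }
    Tracked& operator=(const Tracked& o) {
        value = o.value;
        ++assignments;
        return *this;
    }
};
int Tracked::defaultCtors = 0;
int Tracked::valueCtors = 0;
int Tracked::assignments = 0;

// Assignment in the body: t is default-constructed first,
// then a temporary is built and assigned to it.
struct InitInBody {
    Tracked t;
    InitInBody() { t = Tracked(7); }
};

// Initializer list: t is constructed once, directly with its final value.
struct InitInList {
    Tracked t;
    InitInList() : t(7) {}
};
```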

Where possible, use the shift operators >> and << instead of integer multiplication and division.
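
A minimal sketch of the shift replacement, using hypothetical helper names. It only applies to multiplication and division by powers of two, and the example sticks to unsigned values because shifting negative numbers has extra pitfalls. Compilers commonly perform this transformation themselves for constant powers of two, so it is worth measuring before rewriting code by hand.

```cpp
#include <cstdint>

// x * 8, written as a left shift by 3 (2^3 == 8).
inline std::uint32_t timesEight(std::uint32_t x) {
    return x << 3;
}

// x / 16, written as a right shift by 4 (2^4 == 16).
// For unsigned values the result is the quotient rounded down.
inline std::uint32_t divideBySixteen(std::uint32_t x) {
    return x >> 4;
}
```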

Be careful when using table-lookup functions.
Many people encourage replacing complex functions (such as trigonometric functions) with precomputed tables. For ray tracing, this usually leads to unnecessary memory lookups, which are expensive (and getting more so). Computing a trigonometric function is often as fast as fetching the value from memory (especially when the lookup disrupts the CPU's cache access pattern).
In other cases, table lookups are useful. For GPU programming, table lookups are often preferred over complex functions.

For most classes, use the operators +=, -=, *=, and /= instead of +, -, *, and /.
The simple operators have to create an unnamed temporary intermediate object.
Example: Vector v = Vector(1,0,0) + Vector(0,1,0) + Vector(0,0,1); creates five unnamed temporary Vectors: Vector(1,0,0), Vector(0,1,0), Vector(0,0,1), Vector(1,0,0) + Vector(0,1,0), and Vector(1,0,0) + Vector(0,1,0) + Vector(0,0,1).
Rewriting the code slightly as Vector v(1,0,0); v += Vector(0,1,0); v += Vector(0,0,1); creates only two temporary Vectors: Vector(0,1,0) and Vector(0,0,1). This saves 6 function calls (3 constructors and 3 destructors).
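
The example above can be turned into a small experiment. The sketch below assumes a hypothetical Vector class whose constructors increment a counter. The counts include the named variable v itself, and because compilers are allowed (and since C++17 partly required) to elide copies, the exact number for the operator+ version can differ between compilers and settings.

```cpp
// Hypothetical Vector class that counts every constructor call, copies included.
struct Vector {
    static int constructed;
    double x, y, z;
    Vector(double x_, double y_, double z_) : x(x_), y(y_), z(z_) { ++constructed; }
    Vector(const Vector& o) : x(o.x), y(o.y), z(o.z) { ++constructed; }
    Vector& operator+=(const Vector& o) {
        x += o.x; y += o.y; z += o.z;
        return *this;  // modifies in place, no new object
    }
};
int Vector::constructed = 0;

// operator+ has to return a new object for every addition.
inline Vector operator+(const Vector& a, const Vector& b) {
    return Vector(a.x + b.x, a.y + b.y, a.z + b.z);
}

// Returns how many Vectors were constructed, or -1 if the sum is wrong.
int constructionsWithPlus() {
    const int before = Vector::constructed;
    Vector v = Vector(1, 0, 0) + Vector(0, 1, 0) + Vector(0, 0, 1);
    return (v.x + v.y + v.z == 3.0) ? Vector::constructed - before : -1;
}

// Same sum with +=: only v and the two right-hand operands are constructed.
int constructionsWithCompound() {
    const int before = Vector::constructed;
    Vector v(1, 0, 0);
    v += Vector(0, 1, 0);
    v += Vector(0, 0, 1);
    return (v.x + v.y + v.z == 3.0) ? Vector::constructed - before : -1;
}
```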

For basic data types, prefer +, -, *, and / over +=, -=, *=, and /=.

Delay the definition of local variables.
Defining an object variable usually involves a function call (the constructor).
If a variable is needed only in some cases (for example, inside an if statement), define it only where it is needed. That way the constructor is called only when the variable is actually used.
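
A minimal sketch of a delayed definition. The Scratch type and the shade function are hypothetical. Scratch has a deliberately expensive constructor, and a counter records how often it runs.

```cpp
#include <vector>

// Hypothetical helper with a costly constructor (it allocates and zeroes a buffer).
struct Scratch {
    static int constructed;
    std::vector<double> data;
    Scratch() : data(1024, 0.0) { ++constructed; }
};
int Scratch::constructed = 0;

double shade(double intensity, bool needsScratch) {
    if (!needsScratch) {
        return intensity;  // common path: no Scratch object is ever built
    }
    Scratch s;             // defined only on the path that uses it
    s.data[0] = intensity;
    return s.data[0] * 0.5;
}
```

Had s been defined at the top of shade, its constructor would run on every call, including the calls that return immediately.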

For objects, use the prefix operator (++obj) instead of the postfix operator (obj++).
This is probably not an issue in your ray tracer.
The postfix operator has to make a copy of the object (which also leads to extra constructor and destructor calls), while the prefix operator needs no temporary copy.
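
The copy made by the postfix operator is visible in a typical implementation. The Counter class below is a hypothetical example that counts its copy-constructor calls.

```cpp
// Hypothetical class that counts copy-constructor calls.
struct Counter {
    static int copies;
    int value;
    explicit Counter(int v) : value(v) {}
    Counter(const Counter& o) : value(o.value) { ++copies; }

    // Prefix: increment and return the object itself.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: the old state must be copied so it can be returned.
    Counter operator++(int) {
        Counter old(*this);
        ++value;
        return old;
    }
};
int Counter::copies = 0;
```

For built-in types such as int the two forms normally compile to the same code when the result is unused.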

Use templates with caution.
Different instantiations may need different optimizations.
The Standard Template Library is well optimized, but I suggest avoiding it when implementing an interactive ray tracer.
With your own implementation, you know how the algorithm is used, so you know how to implement it most efficiently.
Most importantly, my experience is that debug builds of the STL are very slow. This is usually not a problem unless you use a debug build for profiling. You will then find that STL constructors, iterators, and other operations take up 15% of your running time, which makes the profiler output harder to interpret.

Avoid dynamic memory allocation during computation.
Dynamic memory is useful for storing the scene and other data that lives for the whole run.
However, on many (most) systems, dynamic memory allocation requires acquiring a lock that controls access to the allocator. In practice, a multithreaded application that uses dynamic memory can get slower as processors are added, because threads have to wait for the allocator lock when allocating and freeing memory.
Even for single-threaded applications, allocating memory on the heap is much more costly than allocating it on the stack. The operating system also has to do some work to find a suitable block of memory.
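
One common way to follow this advice is to allocate a buffer once and reuse it across calls. The sketch below is deliberately contrived (the temporary buffer is not strictly needed to sum squares) and the function names are hypothetical. It only shows where the allocation happens.

```cpp
#include <cstddef>
#include <vector>

// Allocates a fresh heap buffer on every call.
double sumSquaresAllocating(const std::vector<double>& in) {
    std::vector<double> squares(in.size());  // heap allocation per call
    for (std::size_t i = 0; i < in.size(); ++i) {
        squares[i] = in[i] * in[i];
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < squares.size(); ++i) {
        sum += squares[i];
    }
    return sum;
}

// Reuses a buffer owned by the caller. After the first call the buffer is
// large enough, so resize() does not allocate again for inputs of the same size.
double sumSquaresReusing(const std::vector<double>& in,
                         std::vector<double>& scratch) {
    scratch.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        scratch[i] = in[i] * in[i];
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < scratch.size(); ++i) {
        sum += scratch[i];
    }
    return sum;
}
```

In a multithreaded program each thread would keep its own scratch buffer, so no allocator lock is taken inside the inner computation.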

Find out about your system's memory cache and make use of it.
If a data structure fits in a single cache line, only one fetch from main memory is needed to process the entire class.
Make sure that all data structures are aligned to cache line boundaries (if your data structure and a cache line are both 128 bytes, it is still possible that one byte of your struct is in one cache line while the other 127 bytes are in another cache line).
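
In C++11 and later, alignas can request cache line alignment. The sketch below assumes a 64-byte cache line, which is common but not universal (the text above uses 128 bytes as its example). The RayPacket struct and its fields are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check the value for your target CPU.
constexpr std::size_t kCacheLine = 64;

// Hypothetical record, aligned so that an object never straddles two cache lines.
struct alignas(kCacheLine) RayPacket {
    float origin[3];
    float direction[3];
    float tMin;
    float tMax;
    int   triangleId;
};

// The compiler pads the struct to a multiple of its alignment.
static_assert(alignof(RayPacket) == kCacheLine, "RayPacket must be cache-line aligned");
static_assert(sizeof(RayPacket) == kCacheLine, "RayPacket should fill exactly one cache line");

// Returns true when the address is a multiple of the cache line size.
inline bool startsOnCacheLine(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

Objects with static or automatic storage honor the requested alignment. For heap allocations, plain new is only required to honor extended alignment from C++17 onward.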

Avoid unnecessary data initialization.
If you need to initialize a large block of memory, consider using memset.

Exit loops as early as possible and return from functions as early as possible.
Consider the intersection of a ray and a triangle. In the common case the ray misses the triangle, so that is the case to optimize for.
Suppose you first intersect the ray with the plane of the triangle. If the ray-plane intersection value is negative, you can return immediately. This lets you skip the barycentric coordinate computation for roughly half of all ray-triangle intersection tests. This is a big saving. As soon as you know that there is no intersection, the intersection function should return.
Similarly, some loops should end as early as possible. For example, when tracing a shadow ray, the nearest intersection is usually not needed. As soon as any occluding intersection is found, the intersection routine can return.
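
Both kinds of early exit can be sketched with planes instead of triangles, which keeps the example short. All names are hypothetical, and a plane is assumed to be stored as a normal and an offset d with dot(normal, p) + d == 0 for points p on the plane. The tested counter exists only to show how many primitives were examined.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumed representation: points p on the plane satisfy dot(normal, p) + d == 0.
struct Plane { Vec3 normal; double d; };

// Returns true and writes the ray parameter t only for hits in front of the origin.
bool intersectPlane(const Vec3& origin, const Vec3& dir, const Plane& plane, double& t) {
    const double denom = dot(plane.normal, dir);
    if (denom == 0.0) {
        return false;  // ray is parallel to the plane: return early
    }
    const double candidate = -(dot(plane.normal, origin) + plane.d) / denom;
    if (candidate < 0.0) {
        return false;  // plane is behind the ray: return early, skip further work
    }
    t = candidate;
    return true;
}

// Shadow-ray query: any hit closer than maxT is enough, so the loop
// stops at the first one instead of searching for the nearest.
bool isOccluded(const Vec3& origin, const Vec3& dir, double maxT,
                const std::vector<Plane>& planes, int& tested) {
    tested = 0;
    for (std::size_t i = 0; i < planes.size(); ++i) {
        ++tested;
        double t = 0.0;
        if (intersectPlane(origin, dir, planes[i], t) && t < maxT) {
            return true;
        }
    }
    return false;
}
```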

Simplify your equations on paper.
In many equations, terms cancel out, either always or under certain conditions.
The compiler cannot find these simplifications, but you can. Eliminating a few expensive operations in an inner loop can be worth more than several days of optimization effort elsewhere.

The differences between integer, fixed-point, 32-bit floating-point, and 64-bit double-precision math are not as big as you think.
On modern CPUs, floating-point operations are almost as efficient as integer operations. In compute-intensive applications (such as ray tracing), this means that the cost difference between integer and floating-point computation can be ignored. In other words, you do not need to go out of your way to use integer arithmetic.
Double-precision floating-point operations are not necessarily slower than single-precision ones, especially on 64-bit machines. I have seen ray tracers run faster using all doubles than all floats on the same machine. I have also seen the reverse.

Keep reworking your math to eliminate expensive operations.
sqrt() can often be avoided, especially when comparing the square roots of two values.
If you repeatedly divide by x, consider computing 1/x once and multiplying by it. This used to be a significant win for vector normalization (3 divisions), but I have recently found the benefit harder to determine. It should still help if you perform three or more divisions.
If you are executing a loop, move the parts that do not change inside the loop out of the loop.
Consider whether a value can be updated incrementally in a loop (instead of being recomputed from scratch on every iteration).
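
Two of the suggestions above, sketched with a hypothetical Vec3 type: comparing squared lengths to avoid sqrt(), and normalizing with one division followed by three multiplications. The comparison is valid because lengths are non-negative and the square root is monotonic. normalize assumes a non-zero input vector.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

inline double lengthSquared(const Vec3& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;
}

// Compares the lengths of a and b without calling sqrt():
// |a| < |b| holds exactly when |a|^2 < |b|^2.
inline bool isShorter(const Vec3& a, const Vec3& b) {
    return lengthSquared(a) < lengthSquared(b);
}

// One division to get the reciprocal length, then three multiplications,
// instead of dividing each component separately. Assumes v is not the zero vector.
inline Vec3 normalize(const Vec3& v) {
    const double inv = 1.0 / std::sqrt(lengthSquared(v));
    return Vec3{v.x * inv, v.y * inv, v.z * inv};
}
```

Multiplying by a reciprocal can round slightly differently from dividing directly, so results may differ in the last bits.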

The next post will explain each of these items in more detail. Coming soon!
