Reference manual:
http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/index.htm
Note: this series of articles is my personal notes. Where anything here disagrees with the official documentation, the official documentation takes precedence. I will correct any errors I find as best I can. The content is also not guaranteed to hold for every version of the compiler, since the compiler's implementation may change. For details, refer to the official documentation.
For more information, see the additional description in http://blog.csdn.net/gengshenghong/article/details/7082448.
Note: this article mainly covers the terminology and basic ideas of loop-related optimization techniques. (Updated continuously.)
Loops are a very important part of a program, which is why optimization theory devotes so much research to loop optimization.
References:
Loop optimization (Wikipedia):
http://en.wikipedia.org/wiki/Loop_optimization
1. Loop unwinding / loop unrolling / loop unroll (loop expansion)
http://en.wikipedia.org/wiki/Loop_unrolling
Loop unwinding, also known as loop unrolling, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size (space-time tradeoff). The transformation can be undertaken manually by the programmer or by an optimizing compiler.
In short, loop unrolling is a loop transformation that trades a larger binary for faster execution. It can be done by hand or by the compiler. Its goal is to speed the program up by reducing or eliminating the instructions that control the loop.
Loop unrolling comes in many variants; see Wikipedia for details. The simplest case is unrolling a loop whose iteration count is known statically, as shown below:
int x;
for (x = 0; x < 100; x++)
{
    delete(x);
}
One possible expansion is:
int x;
for (x = 0; x < 100; x += 5)
{
    delete(x);
    delete(x + 1);
    delete(x + 2);
    delete(x + 3);
    delete(x + 4);
}
The code above is easy to understand: the number of loop iterations drops from 100 to 20, so the loop-control instructions (the test and increment of x) execute far fewer times, which improves performance. In practice, unrolling alone often does not improve performance much, and a statically known iteration count is only the simplest case; many loops have an iteration count known only at run time and require dynamic unrolling. Loop unrolling is also a basic building block of loop optimization and usually has to work together with other optimizations. The vectorization process often starts from loops.
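The dynamic case mentioned above can be sketched as follows. This is a minimal illustration, not what any particular compiler emits: the function name sum_unrolled and the unroll factor of 4 are assumptions chosen for the example. Because the iteration count n is only known at run time, a remainder loop handles the elements left over after the unrolled main loop.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: sums n ints with the loop body
   unrolled by an assumed factor of 4. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration, so the loop test and
       increment run about n/4 times instead of n times. */
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

The remainder loop is what makes the transformation safe for any n, including values smaller than the unroll factor.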
2. Sectioning / loop-sectioning / strip-mining / strip mining
(Tentative translation: loop sectioning?)
Wikipedia has no page dedicated to sectioning. The loop optimization page mentioned earlier (http://en.wikipedia.org/wiki/Loop_optimization) describes it as follows:
Loop-sectioning (also known as strip-mining) is a loop-transformation technique for enabling SIMD-encodings of loops and improving memory performance. This technique involves each vector operation being done for a size less than or equal to the maximum vector length on a given vector machine.
Links related to this technique are listed below:
Strip Mining to Optimize Memory Use on 32-bit Intel Architecture
Strip-mining (http://docs.oracle.com/cd/E19205-01/819-5262/aeugr/index.html)
To put it simply, sectioning is a loop transformation that enables SIMD encoding of loops and improves memory performance by ensuring that each vector operation works on a length less than or equal to the maximum vector length of the target vector machine.
The links above make it easy to see how this technique optimizes memory use. They do not, however, include examples of SIMD-related optimization.
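As a rough sketch of the idea described above, the following splits a single loop into an outer loop over strips and an inner loop of bounded length. The function name add_strip_mined and the strip length STRIP = 4 are assumptions for illustration only; a real compiler picks the strip length from the vector width of the target machine.

```c
#include <stddef.h>

#define STRIP 4  /* assumed vector length; illustrative only */

/* Hypothetical helper: computes c[i] = a[i] + b[i] with the loop strip-mined. */
void add_strip_mined(const float *a, const float *b, float *c, size_t n)
{
    /* Outer loop steps over strips of at most STRIP elements. */
    for (size_t is = 0; is < n; is += STRIP) {
        /* The last strip may be shorter than STRIP. */
        size_t end = (is + STRIP < n) ? is + STRIP : n;

        /* Inner loop: at most STRIP iterations, the shape that could be
           mapped onto a single vector operation. */
        for (size_t i = is; i < end; i++) {
            c[i] = a[i] + b[i];
        }
    }
}
```

The result is the same as the original single loop; only the iteration structure changes.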
3. Loop Interchange
http://en.wikipedia.org/wiki/Loop_interchange
In compiler theory, loop interchange is the process of exchanging the order of two iteration variables.
In short, loop interchange is the process of swapping the iteration variables of the inner and outer loops. Note that not every loop nest can be interchanged directly; the data dependences must be checked first.
The following example shows what loop interchange looks like:
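/* Version 1: i is the outer loop, j is the inner loop */
for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++)
        a[i][j] = i + j;

/* Version 2: interchanged, j is the outer loop, i is the inner loop */
for (int j = 0; j < N; j++)
    for (int i = 0; i < M; i++)
        a[i][j] = i + j;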
Clearly the two loops compute the same result, but their performance differs. Which one is better? In most cases the first, because a two-dimensional array in C is stored contiguously in row-major order: the first loop walks memory sequentially, while the second jumps from row to row and is likely to cause more cache misses.
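The earlier caveat that data dependences must be checked before interchanging can be illustrated with a small sketch. The names fill_ij and fill_ji and the size N = 4 are hypothetical, chosen only for this example. Each element depends on the element one row up and one column to the right, so swapping the loops changes the order in which values are produced and consumed.

```c
#define N 4  /* assumed array size; illustrative only */

/* Original order: i outer, j inner. Each element reads a[i-1][j+1],
   which was already written during the previous i iteration. */
void fill_ij(int a[N][N])
{
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N - 1; j++)
            a[i][j] = a[i - 1][j + 1] + 1;
}

/* Interchanged order: j outer, i inner. Now a[i-1][j+1] is read before
   the original order would have written it, so the results differ. */
void fill_ji(int a[N][N])
{
    for (int j = 0; j < N - 1; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j + 1] + 1;
}
```

Starting from an array of zeros, the two functions produce different contents, which is why a compiler has to prove the absence of such a dependence before it interchanges a loop nest.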
Blog posts related to loop interchange:
http://www.cnblogs.com/bingsuixing/archive/2009/04/20/1440057.html
http://www.cdblp.cn/paper/%E5%BE%AA%E7%8E%AF%E4%BA%A4%E6%8D%A2%E4%B8%8E%E9%80%92%E5%BD%92%E6%B6%88%E9%99%A4/6951.html
4. Loop Blocking
Loop Blocking is an optimization technique provided by the Intel compiler. It is a combination of strip mining and loop interchange.
Refer to Region.
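Since loop blocking is described above as a combination of strip mining and loop interchange, a hand-written sketch may help. This is an assumption-laden illustration, not the code the Intel compiler generates: the function name transpose_blocked, the tile size BLOCK = 8, and the choice of a matrix transpose as the workload are all made up for this example. Both loops are strip-mined, and the two strip loops are then interchanged outward so that the inner loops work on one small tile at a time.

```c
#include <stddef.h>

#define BLOCK 8  /* assumed tile edge; would be tuned for the target cache */

/* Hypothetical helper: dst = transpose(src) for an n x n row-major matrix. */
void transpose_blocked(const double *src, double *dst, size_t n)
{
    /* ii and jj are the strip loops, moved outermost by interchange. */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Edge tiles may be smaller than BLOCK. */
            size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;

            /* Inner loops touch only one BLOCK x BLOCK tile, so the
               working set stays small. */
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The result is identical to a plain two-level transpose loop; only the order in which elements are visited changes.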