A brief summary of optimization

Source: Internet
Author: User
Tags: arithmetic

Memory aliasing

The compiler must assume that different pointers may point to the same location in memory. This possibility, known as memory aliasing, is a major impediment to optimization.

For example:

void Twiddle1(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

void Twiddle2(int *xp, int *yp)
{
    *xp += 2 * (*yp);
}

Twiddle1 and Twiddle2 appear to do the same thing, and Twiddle2 looks like an optimized version of Twiddle1: Twiddle2 accesses *xp and *yp only once each, whereas Twiddle1 accesses each of them twice.

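However, consider the case where both arguments point to the same variable:

int t = 1;
Twiddle1(&t, &t);
Twiddle2(&t, &t);

Do the two calls have the same effect? No. When xp and yp alias, the first statement of Twiddle1 also changes *yp, so its second statement adds the already-updated value, and the results differ. The compiler must account for this possibility when optimizing. It cannot know the programmer's intent, so it can only optimize the code conservatively, and it will not rewrite Twiddle1 as Twiddle2.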
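The difference can be checked directly. The following is a minimal sketch, not part of the original text: Twiddle2 is written with += so that it matches Twiddle1 when the pointers are distinct, and the four helper functions are hypothetical names introduced only for this illustration.

```c
/* Adds *yp to *xp twice; memory is re-read for each statement. */
void Twiddle1(int *xp, int *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* Adds twice *yp to *xp in one statement. */
void Twiddle2(int *xp, int *yp)
{
    *xp += 2 * (*yp);
}

/* Hypothetical helpers: both pointers refer to the same int. */
int aliased_twiddle1(int start)
{
    int t = start;
    Twiddle1(&t, &t);   /* t -> 2t, then 2t -> 4t */
    return t;
}

int aliased_twiddle2(int start)
{
    int t = start;
    Twiddle2(&t, &t);   /* t -> t + 2t = 3t */
    return t;
}

/* Hypothetical helpers: the pointers refer to two different ints. */
int distinct_twiddle1(int x, int y)
{
    Twiddle1(&x, &y);
    return x;           /* x + 2y */
}

int distinct_twiddle2(int x, int y)
{
    Twiddle2(&x, &y);
    return x;           /* x + 2y */
}
```

Starting from 1, aliased_twiddle1 returns 4 while aliased_twiddle2 returns 3, yet the two distinct_ helpers always agree. In C99 and later, the restrict qualifier is one way for the programmer to promise the compiler that two pointers do not alias.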

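Function calls raise a similar problem:

int counter = 0;

int f(int x)
{
    return counter + x;
}

int f1(int x)
{
    return f(x) + f(x);
}

int f2(int x)
{
    return 2 * f(x);
}

f2 calls f once where f1 calls it twice, but the compiler can only make that substitution itself if it can prove that f has no side effects on global state such as counter.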
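A minimal sketch of why a compiler cannot blindly turn two calls into one. The functions g, g1 and g2 are hypothetical variants that are not in the original; g differs from f only in that it also increments the global counter.

```c
int counter = 0;

/* Hypothetical variant of f: returns counter + x, then increments counter. */
int g(int x)
{
    return counter++ + x;
}

/* Two calls: the second call sees the counter already incremented. */
int g1(int x)
{
    return g(x) + g(x);
}

/* One call, doubled: counter is incremented only once. */
int g2(int x)
{
    return 2 * g(x);
}
```

With counter at 0, g1(3) evaluates to 3 + 4 = 7 and leaves counter at 2, while g2(3) evaluates to 6 and leaves counter at 1. Both the return value and the global state differ, so the rewrite is not safe in general.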

How to represent the performance of a program: CPE

CPE: cycles per element, the number of clock cycles spent on each element processed.

For example, suppose a function f() processes an array int array[50] and takes 100 CPU clock cycles in total. The CPE of f() is then 100/50 = 2.0.

Why count cycles per element rather than cycles per loop iteration? Because the loop may be unrolled, in which case one iteration processes several elements and the iteration count no longer matches the element count.
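The arithmetic can be written out as a small sketch. The function names cpe and iterations are hypothetical, and the cycle counts are simply the figures from the example above, not measurements.

```c
#include <assert.h>

/* Cycles per element: total clock cycles divided by elements processed. */
double cpe(long cycles, long elements)
{
    assert(elements > 0);
    return (double)cycles / (double)elements;
}

/* Loop iterations needed for n elements when each iteration
 * handles `unroll` elements (rounded up). */
long iterations(long n, long unroll)
{
    assert(unroll > 0);
    return (n + unroll - 1) / unroll;
}
```

For 50 elements and 100 cycles, cpe(100, 50) is 2.0 whether or not the loop is unrolled. The iteration count, however, drops from iterations(50, 1) = 50 to iterations(50, 2) = 25, so a cycles-per-iteration figure would change from 2 to 4 even though the code runs no slower.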

How to eliminate loop inefficiencies

• Reduce unnecessary function calls:

In for (i = 0; i < Vec_length(v); i++), the call to Vec_length is evaluated on every iteration. Since the length does not change inside the loop, the call can be moved out of the loop and made once.
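A minimal sketch of this transformation. The vec type and the body of Vec_length are assumptions made for illustration, since the original does not show them; sum_slow and sum_fast are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical vector type; the real definition is not given in the text. */
typedef struct {
    size_t len;
    int *data;
} vec;

size_t Vec_length(const vec *v)
{
    return v->len;
}

/* Calls Vec_length once per iteration in the loop test. */
long sum_slow(const vec *v)
{
    long sum = 0;
    for (size_t i = 0; i < Vec_length(v); i++)
        sum += v->data[i];
    return sum;
}

/* Calls Vec_length once, before the loop. */
long sum_fast(const vec *v)
{
    long sum = 0;
    size_t length = Vec_length(v);
    for (size_t i = 0; i < length; i++)
        sum += v->data[i];
    return sum;
}
```

The two functions return the same sum. A compiler may perform this hoisting itself when it can see that the call has no side effects, but doing it by hand removes the dependence on that analysis.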

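• Eliminate unnecessary memory references

For example:

for (i = 0; i < length; i++)
{
    *dest = *dest + data[i];
}

Each iteration reads *dest from memory and writes it back. Because dest might alias an element of data, the compiler cannot safely keep the running value in a register on its own. You can accumulate in a local variable instead and assign the result to *dest once, after the loop.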
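A minimal sketch of both versions, wrapped in hypothetical functions sum_to_dest and sum_to_dest_local so they can be compared. It assumes dest does not point into data; if it did, the two versions could produce different results.

```c
/* Reads and writes *dest in memory on every iteration. */
void sum_to_dest(int *dest, const int *data, int length)
{
    for (int i = 0; i < length; i++)
        *dest = *dest + data[i];
}

/* Accumulates in a local variable and stores to *dest once. */
void sum_to_dest_local(int *dest, const int *data, int length)
{
    int acc = *dest;            /* start from the existing value, as the loop above does */
    for (int i = 0; i < length; i++)
        acc = acc + data[i];
    *dest = acc;                /* single store after the loop */
}
```

The local variable can live in a register for the whole loop, which reduces the memory traffic per iteration from two reads and one write to a single read of data[i].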

• Unroll loops so that they can be software-pipelined

Note that the prerequisite for software pipelining is that the loop body contains no conditional statements; even a single if prevents it. So to make the code run faster, try to move conditional statements out of the loop.
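One way to apply this advice is sketched below, under the assumption that the condition does not change while the loop runs. The function names and the negate flag are hypothetical. The second version decides the condition once, before the loop, and unrolls the branch-free body by two.

```c
/* The test on `negate` sits inside the loop body. */
void copy_or_negate_branchy(int *out, const int *in, int n, int negate)
{
    for (int i = 0; i < n; i++) {
        if (negate)
            out[i] = -in[i];
        else
            out[i] = in[i];
    }
}

/* The test is made once; the loop body has no if and is unrolled by two. */
void copy_or_negate_hoisted(int *out, const int *in, int n, int negate)
{
    int sign = negate ? -1 : 1;     /* decided once, outside the loop */
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        out[i]     = sign * in[i];
        out[i + 1] = sign * in[i + 1];
    }
    if (i < n)                      /* leftover element when n is odd */
        out[i] = sign * in[i];
}
```

Whether the second form actually pipelines better depends on the compiler and target, so it is worth checking the compiler's output rather than assuming a gain.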

Reduce the overhead of floating-point operations

In general, a processor performs fixed-point operations faster than floating-point operations (processors with dedicated floating-point hardware are the exception). On the DM642, for example, fixed-point arithmetic is 10 times as fast as floating-point arithmetic. Therefore, when processing floating-point data, converting to fixed point first, doing the computation there, and converting the result back can bring a large performance improvement.
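A minimal sketch of the idea using the common Q15 format, in which a 16-bit integer represents a value in [-1, 1) with 15 fractional bits. The format choice and all names here are assumptions for illustration; the original does not say which fixed-point representation was used.

```c
#include <stdint.h>

typedef int16_t q15;    /* real value = raw / 32768, range [-1, 1) */

/* Convert to Q15; assumes -1 <= x < 1 (no range check or rounding). */
q15 float_to_q15(float x)
{
    return (q15)(x * 32768.0f);
}

/* Convert back to float. */
float q15_to_float(q15 x)
{
    return (float)x / 32768.0f;
}

/* Multiply two Q15 values using only integer operations.
 * The 32-bit product has 30 fractional bits; shifting right by 15
 * returns to Q15. No saturation is applied, and the shift of a
 * negative product relies on the compiler using an arithmetic shift. */
q15 q15_mul(q15 a, q15 b)
{
    return (q15)(((int32_t)a * (int32_t)b) >> 15);
}
```

For instance, 0.5 and 0.25 convert to 16384 and 8192, q15_mul yields 4096, and converting back gives 0.125. The conversions themselves use floating point, so the gain comes from converting once and then doing many operations in integer form.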

Whether array code should be converted to pointer code

Based on experience, array code is preferable to pointer code: we have seen compilers that apply very advanced optimizations to array code while applying only minimal optimizations to pointer code. Array code is also more readable.
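For reference, the two styles being compared look like this. It is only a sketch with hypothetical function names; which form a particular compiler optimizes better has to be measured on that compiler.

```c
/* Array style: the index makes the access pattern explicit. */
int sum_array(const int a[], int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Pointer style: walks a pointer from the start to one past the end. */
int sum_pointer(const int *a, int n)
{
    int sum = 0;
    const int *end = a + n;
    for (const int *p = a; p != end; p++)
        sum += *p;
    return sum;
}
```

Both functions compute the same result; the array version states the loop bounds and stride directly, which is the kind of information a loop optimizer looks for.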

Making sensible use of the cache

On the DM642, the size of the L2 cache is configurable, but if all of L2 is configured as cache, no L2 SRAM is left. That rules out techniques such as DMA buffering. If no cache is enabled at all, however, the program runs much more slowly.
