Writing Efficient C Programs and C Code Optimization (Program Design Cornerstones and Practice Series)

Although there are many effective guidelines for optimizing C code, nothing substitutes for a thorough understanding of your compiler and the machine you work on. Speeding up a program usually also increases its code size, and that extra code affects the program's complexity and readability, which may be unacceptable, for example on small devices such as mobile phones or PDAs with strict memory limits. The motto for such optimization is therefore: write code that is optimized for both memory and speed.

Integers
When we know that a value can never be negative, we should use unsigned int instead of int, because on some processors unsigned integer arithmetic is faster than signed. So the best way to define an integer variable in a tight loop is:

register unsigned int variable_name;
However, we cannot guarantee that the compiler will honor the register keyword, and on some processors unsigned is no faster than signed; neither keyword helps on every compiler. Remember that integer arithmetic is much faster than floating-point arithmetic, because the processor can perform it directly, whereas floating-point arithmetic relies on an external floating-point unit or a floating-point math library. When we need fractional results (for example in a simple statistics program), scale the intermediate values up (say, by 100), work in integers, and convert to floating point as late as possible.
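As an illustration (not taken from the original article; the function and data are hypothetical), an average with two decimal places can be kept in integer form, scaled by 100, and only formatted at the very end:

#include <stdio.h>

/* Hypothetical example: average of 'count' samples, kept as an integer
 * scaled by 100 so all the arithmetic stays in the integer unit. */
unsigned int scaled_average(const unsigned int *samples, unsigned int count)
{
    unsigned int sum = 0;
    unsigned int i;

    for (i = 0; i < count; i++)
        sum += samples[i];

    return (sum * 100) / count;    /* average * 100, still an integer */
}

int main(void)
{
    unsigned int data[4] = { 3, 4, 4, 6 };
    unsigned int avg100 = scaled_average(data, 4);

    /* convert or format as a decimal as late as possible */
    printf("average = %u.%02u\n", avg100 / 100, avg100 % 100);
    return 0;
}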
Division and Remainder
On a standard processor, a 32-bit division takes 20 to 140 clock cycles to complete, depending on the numerator and denominator; the cost is a fixed overhead plus a term proportional to the number of quotient bits:

Time(numerator / denominator) = C0 + C1 * log2(numerator / denominator)
                              = C0 + C1 * (log2(numerator) - log2(denominator))

An ARM processor needs 20 + 4.3N clock cycles for this operation. Division is therefore expensive and should be avoided whenever possible. In some cases a division expression can be rewritten as a multiplication: for example, (a / b) > c can be written as a > (c * b), provided we know that b is positive and that c * b does not overflow the integer range. If we can be sure the operands are unsigned, use unsigned division, since it is much faster than signed division.
Combining division and remainder
In some cases both the quotient and the remainder are needed. The compiler can then combine the two operations, because the division routine always produces both results anyway, so write the two operations next to each other, as in the sketch below.
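A minimal sketch (not from the original article; the function name is made up) of computing both results side by side so the compiler can share a single division:

void divide(unsigned int a, unsigned int b,
            unsigned int *quotient, unsigned int *remainder)
{
    *quotient  = a / b;
    *remainder = a % b;   /* reuses the division above instead of dividing again */
}

Division by a constant power of two, as in the two routines below, is cheaper still, because the compiler can replace it with a shift.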

typedef unsigned int uint;

uint div32u(uint a) {
    return a / 32;
}

int div32s(int a) {
    return a / 32;
}
Both of these routines avoid calling a division function, and the unsigned version uses fewer instructions than the signed one. Signed division takes longer because it rounds towards zero, whereas a plain shift rounds towards negative infinity, so extra instructions are needed to correct the result.

An alternative for modulo arithmetic
We generally use the remainder operator for modulo arithmetic, but sometimes it can be rewritten with an if statement. Consider the following two examples:

uint modulo_func1(uint count)
{
    return (++count % 60);
}

uint modulo_func2(uint count)
{
    if (++count >= 60)
        count = 0;
    return count;
}
The second version is preferable because the code generated for it is faster. Note that it only works if count is known to lie in the range 0-59.
But we can use the following code (added by the author) to achieve an equivalent result that also copes with larger values of count:

uint modulo_func3(uint count)
{
    if (++count >= 60)
        count %= 60;
    return count;
}
Using array indices
Suppose you want to set one variable to a particular character based on the value of another variable. You might do this:

switch (queue) {
case 0: letter = 'W';
   break;
case 1: letter = 'S';
   break;
case 2: letter = 'U';
   break;
}
Or this:
if (queue == 0)
  letter = 'W';
else if (queue == 1)
  letter = 'S';
else
  letter = 'U';
A concise and fast way is to simply index the value of a variable into a string, for example:
static char *classes = "WSU";
letter = classes[queue];
Global variables
Global variables are never allocated to registers; they can only be modified through memory accesses (directly, via pointers, or via function calls), so the compiler cannot cache them in registers, which adds unnecessary loads and stores. In critical loops, therefore, avoid global variables.
If a function uses a global variable heavily, copy it into a local variable so the value can live in a register. This only works if no function called from within that function uses the global.

for example:

int f(void);
int g(void);
int errs;

void test1(void)
{
    errs += f();
    errs += g();
}

void test2(void)
{
    int localerrs = errs;
    localerrs += f();
    localerrs += g();
    errs = localerrs;
}
Notice that each addition in test1() has to load and store the global variable errs, whereas in test2() localerrs lives in a register, so each addition needs only one instruction.

Using Aliases
Consider the following example:

void func1(int *data)
{
    int i;

    for (i = 0; i < 10; i++)
    {
        anyfunc(*data, i);
    }
}
Even if *data never changes, the compiler does not know that anyfunc() leaves it alone, so it must be re-read from memory every time it is used: it might be an alias for a variable modified elsewhere in the program. If we are sure it cannot change, we can write:
void func1(int *data)
{
    int i;
    int localdata;

    localdata = *data;
    for (i = 0; i < 10; i++)
    {
        anyfunc(localdata, i);
    }
}
This gives the compiler more opportunities for optimization.

Live variables and spilling
Each processor has a fixed number of registers, so there is a limit to how many variables can be held in registers at any point in the program. Some compilers support live-range splitting, meaning a variable may be allocated to different registers, or to memory, in different parts of a function. The live range of a variable starts at an assignment to it and ends at its last use before the next assignment; within that range the value is valid and the variable is live, and outside it the variable is dead, so its register can be reused for other variables, letting the compiler keep more variables in registers.
The number of registers needed is the number of variables whose live ranges overlap. If that number exceeds the number of available registers, some variables must be stored temporarily in memory; this is called spilling.
The compiler spills the least frequently used variables first, to keep the cost of spilling low. Spilling can be avoided in the following ways:

Limit the maximum number of live variables: keep expressions simple and compact and avoid using too many variables inside a function; splitting a large function into smaller, simpler ones may also help.
Use the register keyword for the most frequently used variables: this tells the compiler the variable will be used often and asks it, with high priority, to keep the variable in a register. Even so, in some cases variables may still be spilled.

Variable Types
The C compiler supports the basic types char, short, int, long (signed and unsigned), float and double. Choosing the most appropriate type for a variable is important, because it can reduce code and data size and significantly improve performance.

Local variables
If possible, avoid declaring local variables as char or short. For char and short locals, the compiler has to narrow the value to 8 or 16 bits after every assignment: sign extension for signed variables, zero extension for unsigned ones. This is done by shifting the register left by 24 or 16 bits and then right by the same amount (arithmetic or logical shift depending on signedness), which costs two extra instructions.
These shifts can be avoided by declaring locals as int or unsigned int. This matters especially when data is first loaded into a local variable and then manipulated through it; even if the data enters or leaves the function as 8 or 16 bits, it is worth treating it as 32 bits inside the function.
Let's consider the following three example functions:

int wordinc(int a)
{
    return a + 1;
}

short shortinc(short a)
{
    return a + 1;
}

char charinc(char a)
{
    return a + 1;
}
They all produce the same result, but the first runs faster than the other two.

Pointers
Whenever possible, pass structures by reference, that is, pass a pointer to the structure; otherwise the whole structure is copied onto the stack before the call, which is slow. Passing a structure by value may cost several kilobytes of copying, when a simple pointer of a few bytes achieves the same purpose, as the sketch below illustrates.
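A small sketch of the difference (the structure and field names are hypothetical):

typedef struct {
    int values[256];           /* large payload: expensive to copy */
    int count;
} Samples;

/* pass by value: the whole structure is copied onto the stack */
int sum_by_value(Samples s)
{
    int total = 0, i;
    for (i = 0; i < s.count; i++)
        total += s.values[i];
    return total;
}

/* pass by pointer: only an address is passed */
int sum_by_pointer(const Samples *s)
{
    int total = 0, i;
    for (i = 0; i < s->count; i++)
        total += s->values[i];
    return total;
}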
If the function does not modify the contents of the structure, declare the parameter as a pointer to const. For example:

void print_data_of_a_structure(const Thestruct *data_pointer)
{
    /* ... printf the contents of the structure ... */
}
This tells the compiler that the structure is not modified inside the function, so its fields need not be re-read after external calls; it also lets the compiler catch any code that accidentally modifies the read-only structure, giving it extra protection.

Pointer chains
Pointer chains are often used to access structure information. For example, the following common code:

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Object;

void InitPos1(Object *p)
{
    p->pos->x = 0;
    p->pos->y = 0;
    p->pos->z = 0;
}
In this code the processor must reload p->pos for every assignment, because the compiler does not know whether p->pos->x is an alias of p->pos, which might be modified elsewhere. A better approach is to cache p->pos in a local variable:
void InitPos2(Object *p)
{
    Point3 *pos = p->pos;
    pos->x = 0;
    pos->y = 0;
    pos->z = 0;
}
Another possible method is to embed the Point3 structure directly in the Object structure, avoiding the pointer entirely, as sketched below.
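A sketch of that alternative (the embedded variant of Object is hypothetical): with the position stored inline, only p itself has to be dereferenced.

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 pos, direction; } ObjectEmbedded;   /* hypothetical variant */

void InitPos3(ObjectEmbedded *p)
{
    p->pos.x = 0;
    p->pos.y = 0;
    p->pos.z = 0;
}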
Conditional execution
Conditional execution applies mainly to if statements, and also to complex expressions built from relational operators (<, ==, >, and so on) or boolean operators (&&, !, and so on). Keep the bodies of if and else statements as simple as possible so that they can be conditionalized, and group similar conditions together in relational expressions.
The following example demonstrates how the compiler uses conditional execution:

int g(int a, int b, int c, int d)
{
    if (a > 0 && b > 0 && c < 0 && d < 0)
        /* grouped conditions tied up together */
        return a + b + c + d;
    return -1;
}
Because the conditions are grouped, they can all be conditionalized.

Boolean expressions & range checking
A common boolean expression checks whether a variable lies within a certain range, for example whether a point lies inside a window.

bool PointInRectangleArea(Point p, Rectangle *r)
{
    return (p.x >= r->xmin && p.x < r->xmax &&
            p.y >= r->ymin && p.y < r->ymax);
}
There is a faster way: (x >= min && x < max) can be rewritten as (unsigned)(x - min) < (max - min), which is especially effective when min is 0. Here is the optimized code:
bool PointInRectangleArea(Point p, Rectangle *r)
{
    return ((unsigned)(p.x - r->xmin) < (unsigned)(r->xmax - r->xmin) &&
            (unsigned)(p.y - r->ymin) < (unsigned)(r->ymax - r->ymin));
}
Boolean expressions & compares with zero
After a compare (CMP) instruction, the processor's condition flags are set. The flags can also be set by other instructions, such as MOV, ADD, AND and MUL, the basic data-processing instructions. When a data-processing instruction sets the flags, the N and Z flags are set exactly as if the result had been compared with zero: N indicates whether the result is negative and Z indicates whether it is zero.
In C, the signed relational operators that correspond directly to the N and Z flags are x < 0, x >= 0, x == 0 and x != 0; for unsigned values they are x == 0 and x != 0 (or x > 0).
Every relational operator in C normally makes the compiler emit a compare instruction, but if the operator is one of the above and the value being tested has just been produced by a data-processing instruction, the compiler can omit the compare. For example:

int aFunction(int x, int y)
{
    if (x + y < 0)
        return 1;
    else
        return 0;
}
Doing so saves compare instructions in critical loops, reducing code size and improving performance. The C language has no notion of carry or overflow flags, so the C and V flags cannot be tested without inline assembly; the compiler can, however, make use of the carry flag for unsigned overflow, for example:
int sum(int x, int y)
{
    int res;
    res = x + y;
    if ((unsigned) res < (unsigned) x)   /* carry set? */
        res++;
    return res;
}
Lazy Evaluation Exploitation
In a statement such as if (a > 10 && b == 4), make sure the first part of the && expression is the one most likely to be false, so that the second part is most likely never evaluated.
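For instance (a hedged sketch; expensive_check() is an assumed, costly helper), putting the cheap and usually-false range test first means the expensive call is rarely made:

extern int expensive_check(int x);    /* assumed to be costly */

int needs_processing(int x)
{
    /* the range test is cheap and usually false, so it goes first;
     * expensive_check() only runs for the few values that pass it */
    return (x > 1000) && expensive_check(x);
}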

Use switch() instead of if...else...
For multi-way choices you can chain if...else if...else, like this:

if (val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();
Using switch can be faster:
switch (val)
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}
In the if chain, even when the last condition is the true one, all the earlier conditions must be tested first; a switch statement eliminates that extra work. If you must use if...else, put the most likely condition first.

Binary Breakdown
Break the tests down in a binary fashion. For example, do not write a flat chain like this:

if (a == 1) {
} else if (a == 2) {
} else if (a == 3) {
} else if (a == 4) {
} else if (a == 5) {
} else if (a == 6) {
} else if (a == 7) {
} else if (a == 8) {
}
Instead:
if (a <= 4) {
    if (a == 1) {
    } else if (a == 2) {
    } else if (a == 3) {
    } else if (a == 4) {

    }
}
else
{
    if (a == 5) {
    } else if (a == 6) {
    } else if (a == 7) {
    } else if (a == 8) {
    }
}
or even:
if (a <= 4)
{
    if (a <= 2)
    {
        if (a == 1)
        {
            /* a is 1 */
        }
        else
        {
            /* a must be 2 */
        }
    }
    else
    {
        if (a == 3)
        {
            /* a is 3 */
        }
        else
        {
            /* a must be 4 */
        }
    }
}
else
{
    if (a <= 6)
    {
        if (a == 5)
        {
            /* a is 5 */
        }
        else
        {
            /* a must be 6 */
        }
    }
    else
    {
        if (a == 7)
        {
            /* a is 7 */
        }
        else
        {
            /* a must be 8 */
        }
    }
}
Slow and inefficient:
c = getch();
switch (c) {
    case 'A':
    {
        /* do something */
        break;
    }
    case 'H':
    {
        /* do something */
        break;
    }
    case 'Z':
    {
        /* do something */
        break;
    }
}
Fast and efficient:
c = getch();
switch (c) {
    case 0:
    {
        /* do something */
        break;
    }
    case 1:
    {
        /* do something */
        break;
    }
    case 2:
    {
        /* do something */
        break;
    }
}
The above compares the two styles of case labels.

Switch statement vs. lookup tables
The switch statement is usually used to do one of the following:

call one of several functions
set a variable or a return value
execute one of several code fragments

If the case labels are dense, a lookup table can be more efficient than a switch in the first two situations. For example, the following two routines convert a condition code number into a string:

char *Condition_String1(int condition) {
    switch (condition) {
        case 0: return "EQ";
        case 1: return "NE";
        case 2: return "CS";
        case 3: return "CC";
        case 4: return "MI";
        case 5: return "PL";
        case 6: return "VS";
        case 7: return "VC";
        case 8: return "HI";
        case 9: return "LS";
        case 10: return "GE";
        case 11: return "LT";
        case 12: return "GT";
        case 13: return "LE";
        case 14: return "";
        default: return 0;
    }
}

char *Condition_String2(int condition) {
    if ((unsigned) condition >= 15) return 0;
    return
        "EQ\0NE\0CS\0CC\0MI\0PL\0VS\0VC\0HI\0LS\0GE\0LT\0GT\0LE\0\0" +
        3 * condition;
}

The first routine requires 240 bytes, the second only 72.

Loop termination
A carelessly written loop-termination condition can place a significant burden on a program. Use count-down-to-zero loops and simple termination conditions; the simpler the termination test, the less time it costs per iteration. The following two examples compute n!: the first counts up, the second counts down.

int fact1_func(int n)
{
    int i, fact = 1;
    for (i = 1; i <= n; i++)
        fact *= i;
    return fact;
}

int fact2_func(int n)
{
    int i, fact = 1;
    for (i = n; i != 0; i--)
        fact *= i;
    return fact;
}
As a result, the second version runs noticeably faster than the first.

Faster for() loops
This is a simple and effective technique. We usually write a for loop like this:

for (i = 0; i < 10; i++) { ... }
Here i takes the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
If we do not care about the order of iteration, we can write:

for (i = 10; i--;) { ... }
Here i takes the values 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, and the loop runs faster.
This works because the loop test becomes the faster i--: "is i non-zero? If so, decrement it and continue." The original code had to "compare i against 10; if the result is non-zero, increment i and continue." In a tight loop the difference is significant.
The syntax looks a little strange but is perfectly legal: the third expression of a for loop is optional (an infinite loop can be written as for (;;)). The following also achieve the same effect:

for (i = 10; i; i--) {}
or:
for (i = 10; i != 0; i--) {}
The only thing to be careful about is that the loop must stop at 0 (this does not work for a loop running, say, from 50 to 80) and that the counter counts down.
In addition, keeping the counter in a register produces more efficient code. This technique of initializing the counter to the number of iterations and decrementing it to zero also applies to while and do loops.
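For example, a counting-down do-while loop might look like this (a small sketch; it assumes n is at least 1, since a do-while body always runs once):

void process_n_times(unsigned int n)
{
    do {
        /* ... loop body ... */
    } while (--n != 0);    /* decrement and test against zero in one step */
}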

Loop jamming
Never use two loops where one will do. However, if the loop body does a lot of work and exceeds the processor's instruction cache, two separate loops may actually be faster, because each of them can then fit entirely in the cache.

// Original code:
for (i = 0; i < 100; i++) {
    stuff();
}

for (i = 0; i < 100; i++) {
    morestuff();
}

// It would be better to do:
for (i = 0; i < 100; i++) {
    stuff();
    morestuff();
}
Function looping
Every function call carries a cost: the program counter changes, and the variables in use must be pushed onto the stack while space for new ones is allocated. There is a lot that can be done with a program's function structure to improve efficiency; keep the program as small as possible while preserving readability.
If a function is called on every iteration of a loop, consider moving the loop inside the function instead, so the call overhead is paid only once, for example:

for (i = 0; i < 100; i++)
{
    func(t, i);
}
...

void func(int w, int d)
{
    /* lots of stuff */
}
Can be written as:

func(t);
...

void func(int w)
{
    int i;
    for (i = 0; i < 100; i++)
    {
        /* lots of stuff */
    }
}
Loop unrolling
Small loops can be unrolled for higher performance, at the cost of increased code size. When a loop is unrolled, the loop counter is updated less often and fewer branches are executed. If the loop iterates only a few times, it can be unrolled completely so that the loop overhead disappears entirely.

such as:

for (i = 0; i < 3; i++) {
    something(i);
}

// is less efficient than
something(0);
something(1);
something(2);
because in each iteration the value of i must be incremented and tested. The compiler often unrolls simple loops like this itself, provided the iteration count is fixed. A loop such as:
for (i = 0; i < limit; i++) { ... }
cannot be unrolled that way, because the number of iterations is not known. It can, however, still be partly unrolled.
Compared to a simple loop, the following code is much longer, but much more efficient. The block size of 8 was chosen only for demonstration; any convenient size works. In this version the loop condition is tested only once every eight iterations instead of every iteration. If the size of the array to be processed is known, it (or a value that divides it evenly) can be used as the block size; in practice the best block size also depends on the system's cache size.

// Example 1

#include <stdio.h>

#define BLOCKSIZE (8)

int main(void)
{
    int i = 0;
    int limit = 33;        /* could be anything */
    int blocklimit;

    /* The limit may not be divisible by BLOCKSIZE,
     * go as near as we can first, then tidy up.
     */
    blocklimit = (limit / BLOCKSIZE) * BLOCKSIZE;

    /* unroll the loop in blocks of 8 */
    while (i < blocklimit)
    {
        printf("process(%d)\n", i);
        printf("process(%d)\n", i + 1);
        printf("process(%d)\n", i + 2);
        printf("process(%d)\n", i + 3);
        printf("process(%d)\n", i + 4);
        printf("process(%d)\n", i + 5);
        printf("process(%d)\n", i + 6);
        printf("process(%d)\n", i + 7);

        /* update the counter */
        i += 8;
    }

    /*
     * There may be some left to do.
     * This could be done as a simple for () loop,
     * but a switch is faster (and more interesting)
     */

    if (i < limit)
    {
        /* Jump into the case at the place that will allow
         * us to finish off the appropriate number of items.
         */
        switch (limit - i)
        {
            case 7: printf("process(%d)\n", i); i++;
            case 6: printf("process(%d)\n", i); i++;
            case 5: printf("process(%d)\n", i); i++;
            case 4: printf("process(%d)\n", i); i++;
            case 3: printf("process(%d)\n", i); i++;
            case 2: printf("process(%d)\n", i); i++;
            case 1: printf("process(%d)\n", i);
        }
    }

    return 0;
}
Counting the number of bits set
Example 1: test the least significant bit, count it if set, then shift right.

// Example 1

int countbit1(uint n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        n >>= 1;
    }
    return bits;
}
Example 2: unroll the loop by a factor of four, testing four bits per pass. Unrolling a loop like this often creates new opportunities for optimization.
// Example 2

int countbit2(uint n)
{
    int bits = 0;
    while (n != 0)
    {
        if (n & 1) bits++;
        if (n & 2) bits++;
        if (n & 4) bits++;
        if (n & 8) bits++;
        n >>= 4;
    }
    return bits;
}
Exit the loop early
It is often unnecessary to run a loop to completion. For example, when searching an array for a particular value, we can exit the loop as soon as the value is found. The following example searches 10,000 numbers for the value -99.

found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
    }
}

if (found) printf("Yes, there is a -99. Hooray!\n");
This works, but it searches the entire array no matter where the item appears. It is better to stop as soon as the value is found:
found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
        break;
    }
}
if (found) printf("Yes, there is a -99. Hooray!\n");
If the value appears at position 23, the loop terminates there and the remaining 9,977 iterations are skipped.

Function Design
Keep functions short and focused. This allows the compiler to perform other optimizations, such as register allocation, more effectively.

Function call overhead
For the processor, the call overhead itself is small, and it is usually only a small fraction of the work done by the called function. There is a limit to how many arguments can be passed in registers: arguments of integer-compatible types (char, short, int and float each occupy one word) and structures of up to four words (including two-word double and long long) are candidates, but only the first four words of arguments go in registers; the fifth and subsequent words are passed on the stack, which adds the cost of storing them in the calling function and reloading them in the called function.

int f1(int a, int b, int c, int d) {
    return a + b + c + d;
}

int g1(void) {
    return f1(1, 2, 3, 4);
}


int f2(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}

int g2(void) {
    return f2(1, 2, 3, 4, 5, 6);
}
In g2, the fifth and sixth arguments are stored on the stack and reloaded in f2, costing two extra memory accesses per argument.

Minimizing parameter passing overhead
To keep the cost of passing parameters to a minimum, we can:

Whenever possible, keep functions to four parameters or fewer, so that the stack is not used to pass arguments.
If a function needs more than four parameters, make sure it does enough work to outweigh the cost of passing arguments on the stack.
Use a pointer to a structure as a parameter rather than the structure itself.
Gather related parameters into a structure and pass a pointer to it; this reduces the number of parameters and improves readability (a small sketch follows this list).
Minimize the number of parameters of type long, since each takes the space of two parameters; the same applies to double.
Avoid situations where some arguments are passed in registers and some on the stack; in that case all of the arguments may end up on the stack.
Avoid functions with a variable number of arguments; such functions pass all their parameters on the stack.
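A minimal sketch of the structure-of-parameters idea mentioned above (all of the names are hypothetical):

typedef struct {
    int   width;
    int   height;
    int   depth;
    int   border;
    float scale;
} DrawParams;

/* one pointer is passed instead of five separate arguments */
void draw_box(const DrawParams *p)
{
    /* ... use p->width, p->height, ... */
    (void)p;
}

void caller(void)
{
    DrawParams params = { 640, 480, 24, 2, 1.0f };
    draw_box(&params);
}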

Leaf functions
A function that calls no other functions is known as a leaf function. In many applications, about half of all function calls are calls to leaf functions. Leaf functions compile very efficiently on all platforms, because they often do not need the usual saving and restoring of registers. The cost of the entry and exit sequences is small compared with the work done by a reasonably complex leaf function taking four or five parameters. Where possible, arrange for frequently called functions to be leaf functions; how often a function is called can be determined with a profiling tool. There are several ways to ensure that a function is compiled as a leaf function:

Do not call any other function: this includes operations that the compiler turns into C library calls, such as division or floating-point operations.
Use the keyword __inline for small functions (see below).

Inline functions
Inline functions are disabled by all debugging options. When a function is declared __inline, calls to it are replaced by the function body itself instead of a real call. This makes the code faster, but it also increases code size, especially when the inlined function is large and called often.

__inline int square(int x) {
    return x * x;
}

#include <math.h>

double length(int x, int y) {
    return sqrt(square(x) + square(y));
}
Using inline functions has several advantages:

No function call overhead.
Because the body is substituted directly, there is no additional overhead such as saving and restoring registers.

Lower parameter passing overhead.
Passing parameters usually costs less because values do not need to be copied; if some of the arguments are constants, the compiler can optimize further.

The disadvantage of inline functions is that code size grows if the function is called from many places; how much it grows depends strongly on the size of the function and the number of call sites.

It is wise to inline only a few key functions. Used sensibly, inlining can even reduce code size: a function call needs a certain number of instructions, while an optimized inlined body may compile to fewer.

Using Lookup Tables
Some computations can be replaced by a lookup table, which can improve performance significantly. A table is generally less accurate than computing the value, but for most programs the accuracy is sufficient.
Much signal-processing software (modem modulation software, for instance) makes heavy use of sin and cos, which cost a lot of arithmetic. For real-time systems where precision matters less, sin/cos lookup tables are often more practical. When using lookup tables, try to combine similar operations into a single table; this is faster and uses less space than several separate tables.
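As a hedged illustration of the idea (the table size, degree-based indexing and names are arbitrary choices), a coarse sine table can be filled once at start-up and then read instead of calling sin():

#include <math.h>

#define SIN_TABLE_SIZE 360

static float sin_table[SIN_TABLE_SIZE];

/* fill the table once at start-up */
void init_sin_table(void)
{
    int i;
    for (i = 0; i < SIN_TABLE_SIZE; i++)
        sin_table[i] = (float)sin(i * 3.14159265358979 / 180.0);
}

/* fast approximate sine for a whole number of degrees */
float fast_sin_deg(int degrees)
{
    int index = ((degrees % SIN_TABLE_SIZE) + SIN_TABLE_SIZE) % SIN_TABLE_SIZE;
    return sin_table[index];
}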

Floating-point Arithmetic
Although floating-point arithmetic is slow on any processor, sometimes it cannot be avoided, for example in signal processing. When writing floating-point code, keep the following in mind:

Floating-point division is slow
Division is about twice as slow as addition or multiplication. Division by a constant can be rewritten as multiplication by its reciprocal (for example, x = x / 3.0 becomes x = x * (1.0 / 3.0)); the reciprocal is computed once at compile time.

Use float instead of double
Variables of type float consume less memory and fewer registers, and because of their lower precision they are handled more efficiently. Use float whenever its precision is sufficient.

Avoid transcendental functions
Transcendental functions such as sin, cos and log are implemented as a series of multiplications and additions, so they are more than ten times slower than an ordinary multiplication.

Simplify floating-point expressions by hand
The compiler cannot do much optimization on mixed integer and floating-point expressions. For example, 3 * (x / 3) will not be optimized to x, because floating-point operations usually lose precision and even the order of evaluation matters: (a + b) + c does not necessarily equal a + (b + c). Manual simplification is therefore worthwhile.

In some situations, however, floating-point performance simply cannot reach the required level. The best approach may then be to abandon floating point and use fixed-point arithmetic instead. When the range of the values involved is small enough, fixed-point arithmetic is more precise and faster than floating-point arithmetic.
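A minimal sketch of the fixed-point alternative, using a hypothetical 16.16 format (16 integer bits, 16 fractional bits) and a 64-bit intermediate for the multiply:

typedef int fixed_t;                           /* 16.16 fixed point */

#define FIXED_SHIFT        16
#define INT_TO_FIXED(x)    ((fixed_t)((x) << FIXED_SHIFT))
#define FIXED_TO_INT(x)    ((x) >> FIXED_SHIFT)

/* multiply two 16.16 values */
fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((long long)a * b) >> FIXED_SHIFT);
}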

Misc tips
In general you can trade storage space for time: cache frequently used data instead of recomputing or reloading it every time, for example a sin/cos table, or a table of pseudo-random numbers (if you do not really need random numbers, compute 1,000 at start-up and reuse them later).
Use as few global variables as possible.
Declare file-scope variables as static unless they genuinely need to be global.
Avoid recursion. Recursion can make code neat and elegant, but it generates many function calls and a lot of overhead.
Accessing a one-dimensional array is faster than accessing a multi-dimensional one.
Use #define macros instead of small, frequently used functions (a small sketch follows).
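For instance (a hedged sketch), a trivial operation written as a macro expands in place and avoids any call overhead, at the cost of the usual macro pitfalls such as the argument being evaluated twice:

/* expanded in place: no call overhead, but x is evaluated twice */
#define SQUARE(x) ((x) * (x))

/* the equivalent function is safer, but a plain (non-inline) call costs more */
static int square_func(int x)
{
    return x * x;
}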
