In performance optimization we always keep the 80-20 principle in mind: roughly 20% of a program's code consumes 80% of its running time. To improve efficiency, therefore, concentrate on that 20%. Do not waste effort optimizing the other 80% of the code, which costs little.
Trick 1: Trade space for time
The most fundamental tension in a computer program is the one between space and time. Turning this around gives us the first trick for improving efficiency: trade space for time. Take string assignment as an example:
Method A: the usual method
#define LEN 32
char string1[LEN];
memset(string1, 0, LEN);
strcpy(string1, "This is an example!!");
Method B:
const char string2[LEN] = "This is an example!";
const char *cp;
cp = string2;
All further operations can then go through the pointer cp.
As the example shows, A and B are not in the same league for efficiency. For the same amount of storage, B can use the pointer directly, while A has to call two library functions first. B's drawback is that it is less flexible than A: when the contents of a string must change frequently, A is the better choice. With method B you have to store many strings in advance; this occupies a lot of memory, but buys execution speed in return.
If the system has strict real-time requirements and some memory to spare, I recommend this trick.
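The same trade can be made with a precomputed lookup table. The sketch below is only an illustration and is not from the original text: the names NIBBLE_BITS, count_bits_table, and count_bits_loop are hypothetical. It counts the set bits in a byte either by looping (no extra memory) or by indexing a 16-byte table (more memory, fewer operations).

```c
#include <stdint.h>

/* Hypothetical table: number of set bits in each 4-bit value, precomputed.
   Sixteen bytes of storage replace a per-bit loop at run time. */
static const uint8_t NIBBLE_BITS[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Time-oriented version: two table lookups per byte. */
unsigned count_bits_table(uint8_t value)
{
    return NIBBLE_BITS[value & 0x0Fu] + NIBBLE_BITS[value >> 4];
}

/* Space-oriented version: no table, but up to eight loop iterations. */
unsigned count_bits_loop(uint8_t value)
{
    unsigned count = 0;
    while (value != 0) {
        count += value & 1u;
        value >>= 1;
    }
    return count;
}
```

Whether the table version is actually faster depends on the target's memory speed, so it is worth measuring on the real hardware.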
Trick 2: Use macros instead of functions
This is a variant of the first trick. The difference between a function and a macro is that a macro costs space while a function costs time. A function call uses the system stack to store data. If the compiler's stack-check option is enabled, it usually inserts a few assembly instructions in the function prologue to check the current stack. In addition, the CPU has to save and restore the current context around each call, pushing and popping the stack, so a function call takes some CPU time. A macro has no such overhead: it is just pre-written code that is pasted into the program at each point of use, and no call is generated. The price is space, which becomes especially noticeable when the same macro is used in many places.
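Example:
Method C:
#define bwMCDR2_ADDRESS 4    /* field width in bits */
#define bsMCDR2_ADDRESS 17   /* field position (shift) */
/* Token pasting (##) works only inside macros, so the function
   version takes the width and shift as explicit arguments. */
unsigned int BIT_MASK(unsigned int bw, unsigned int bs)
{
    return ((1U << bw) - 1) << bs;
}
void SET_BITS(unsigned int *dst, unsigned int bw, unsigned int bs, unsigned int val)
{
    *dst = (*dst & ~BIT_MASK(bw, bs)) |
           ((val << bs) & BIT_MASK(bw, bs));
}
SET_BITS(&MCDR2, bwMCDR2_ADDRESS, bsMCDR2_ADDRESS, RegisterNumber);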
Method D:
#define bwMCDR2_ADDRESS 4
#define bsMCDR2_ADDRESS 17
#define bmMCDR2_ADDRESS BIT_MASK(MCDR2_ADDRESS)
#define BIT_MASK(__bf) \
    (((1U << (bw ## __bf)) - 1) << (bs ## __bf))
#define SET_BITS(__dst, __bf, __val) \
    ((__dst) = ((__dst) & ~(BIT_MASK(__bf))) | \
               (((__val) << (bs ## __bf)) & (BIT_MASK(__bf))))
SET_BITS(MCDR2, MCDR2_ADDRESS, RegisterNumber);
Method D is the best set of bit-manipulation macros I have seen. It is part of ARM's source code and accomplishes a great deal in just three lines, covering almost every bit operation. Method C is its function-based variant; the difference between the two is worth studying closely.
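To see the macro approach at work off-target, here is a minimal sketch. It assumes a 32-bit unsigned int and replaces the hardware register with an ordinary variable; the helpers GET_BITS, demo_set_field, and demo_get_field are hypothetical additions for illustration, and the macro parameters drop the leading underscores because such names are reserved in C.

```c
#include <stdint.h>

/* Field description from the text: 4 bits wide, starting at bit 17. */
#define bwMCDR2_ADDRESS 4
#define bsMCDR2_ADDRESS 17

/* Mask covering the field: (2^width - 1) shifted to the field position. */
#define BIT_MASK(bf) (((1U << (bw ## bf)) - 1U) << (bs ## bf))

/* Replace the field in dst with val, leaving all other bits untouched. */
#define SET_BITS(dst, bf, val) \
    ((dst) = ((dst) & ~BIT_MASK(bf)) | \
             (((uint32_t)(val) << (bs ## bf)) & BIT_MASK(bf)))

/* Hypothetical helper: read the field back, shifted down to bit 0. */
#define GET_BITS(src, bf) (((src) & BIT_MASK(bf)) >> (bs ## bf))

/* Stand-in for the hardware register so the sketch runs anywhere. */
static uint32_t MCDR2 = 0xFFFFFFFFu;

uint32_t demo_set_field(uint32_t value)
{
    SET_BITS(MCDR2, MCDR2_ADDRESS, value);
    return MCDR2;
}

uint32_t demo_get_field(void)
{
    return GET_BITS(MCDR2, MCDR2_ADDRESS);
}
```

Starting from 0xFFFFFFFF, writing 5 into the field yields 0xFFEBFFFF: only bits 17 to 20 change. The field name is pasted onto the bw and bs prefixes at compile time, so no call and no run-time lookup is generated.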
Trick 3: Solve problems with mathematics
The next trick for writing efficient C is to solve the problem with mathematics. Mathematics is the mother of computing; without its foundation there would have been no computers. Applying a little mathematics while programming can therefore improve execution efficiency by orders of magnitude. Take the sum of the integers from 1 to 100 as an example.
Method E:
int i, j = 0;
for (i = 1; i <= 100; i++)
{
    j += i;
}
Method F:
int i;
i = (100 * (1 + 100)) / 2;
This is the mathematical example that impressed me most. My computer teacher set it as a test when I was only in the third grade of primary school, and unfortunately I did not know the formula N × (N + 1) / 2 at the time. Method E needs 100 loop iterations: at least 100 assignments, 100 comparisons, and 200 additions (to i and j). Method F needs only one addition, one multiplication, and one division. The result speaks for itself. So now, when I write a program, I think hard about finding patterns and let mathematics do as much of the work as possible.
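Generalized to any n, the two methods might look like the sketch below. The function names sum_loop and sum_formula are hypothetical; the formula version divides the even factor by 2 before multiplying, a small precaution that keeps the intermediate product from overflowing earlier than necessary.

```c
/* Method E as a function: n additions inside a loop. */
unsigned long sum_loop(unsigned long n)
{
    unsigned long i, total = 0;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* Method F as a function: constant time, n * (n + 1) / 2.
   Exactly one of n and n + 1 is even, so halve that one first. */
unsigned long sum_formula(unsigned long n)
{
    return (n % 2 == 0) ? (n / 2) * (n + 1)
                        : n * ((n + 1) / 2);
}
```

Both functions return 5050 for n = 100; the loop's cost grows with n while the formula's does not.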
Trick 4: Use bit operations
Use bit operations to reduce division and modulo operations. The bit is the smallest unit of data a program can manipulate, and in theory every computation can be expressed with bit operations. Bit operations are mostly used to control hardware or to transform data, but used flexibly they can also make a program noticeably faster. Example:
Method G:
int i, j;
i = 257 / 8;
j = 456 % 32;
Method H:
int i, j;
i = 257 >> 3;
j = 456 - (456 >> 5 << 5);
At first glance H looks far more cumbersome than G. Inspect the generated assembly, however, and you will see that method G calls the basic division and modulo routines, which involve function calls, a good deal of assembly code, and several registers, whereas method H compiles to just a few instructions: shorter code that runs faster. Depending on the compiler, the difference may be small, but with the MS C and ARM C compilers I have used the gap is significant. When a "*", "/", or "%" operation has a power of 2 as its operand, converting it to the shift operators "<<" and ">>" usually improves efficiency (for non-negative values the results are identical), because multiplication and division usually take more cycles than shifts.
Besides speeding up computation, C bit operations have another typical use in embedded programming: the bitwise AND (&), OR (|), and NOT (~) operators are used everywhere to set individual bits of hardware registers. For example, to clear bit 6 (mask 0x0040) of the interrupt mask control register of the AM186ER 80186 processor, which enables interrupt 2, the usual practice is:
#define INT_I2_MASK 0x0040
wTemp = inword(INT_MASK);
outword(INT_MASK, wTemp & ~INT_I2_MASK);
To set the bit to 1:
#define INT_I2_MASK 0x0040
wTemp = inword(INT_MASK);
outword(INT_MASK, wTemp | INT_I2_MASK);
To test whether the bit is 1:
#define INT_I2_MASK 0x0040
wTemp = inword(INT_MASK);
if (wTemp & INT_I2_MASK)
{
    ... /* This bit is 1 */
}
When using this trick, watch out for problems caused by differences between CPUs. For example, a program written and debugged on a PC may harbor hidden bugs once it is ported to a 16-bit platform. Use this trick only once you have reached a certain level of technical skill.
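The shift and mask idioms from this section can be tried out on any machine with the sketch below. It is illustrative only: the function names and the variable fake_int_mask, which stands in for the real interrupt mask register, are hypothetical. Fixed-width types from <stdint.h> are used so that the operand sizes stay the same on 16-bit and 32-bit targets, which is one way to reduce the porting risk mentioned above.

```c
#include <stdint.h>

/* For unsigned operands each function equals the operation in its comment. */
uint32_t div_by_8(uint32_t x)  { return x >> 3; }   /* x / 8  */
uint32_t mod_by_32(uint32_t x) { return x & 31u; }  /* x % 32 */
uint32_t mul_by_16(uint32_t x) { return x << 4; }   /* x * 16, modulo 2^32 */

/* Hypothetical stand-in for a 16-bit hardware register. */
static uint16_t fake_int_mask = 0xFFFFu;

#define INT_I2_MASK 0x0040u

/* Clear the bit: interrupt 2 is no longer masked. */
void enable_int2(void)
{
    fake_int_mask &= (uint16_t)~INT_I2_MASK;
}

/* Set the bit: interrupt 2 is masked. */
void disable_int2(void)
{
    fake_int_mask |= INT_I2_MASK;
}

/* Test the bit: nonzero result means interrupt 2 is masked. */
int int2_masked(void)
{
    return (fake_int_mask & INT_I2_MASK) != 0;
}
```

Note that x & 31 is a shorter way to write the modulo than the subtract-and-shift form of method H, and that many current compilers perform these rewrites themselves when the divisor is a constant power of 2.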
Trick 5: Embed assembly
To people fluent in assembly language, programs written in C are all "garbage". The view is extreme, but there is some truth in it. Assembly is the most efficient programming language, yet we can hardly write an entire operating system in it. To gain efficiency we therefore compromise: embed assembly in C, that is, mixed programming. Embedded C programs mostly use inline assembly, inserting __asm { } blocks directly into the C source.
For example, copy array 1 into array 2 so that every byte matches.
char string1[1024], string2[1024];
Method I:
int i;
for (i = 0; i < 1024; i++)
    *(string2 + i) = *(string1 + i);
Method J:
#ifdef _PC_
int i;
for (i = 0; i < 1024; i++)
    *(string2 + i) = *(string1 + i);
#else
#ifdef _ARM_
__asm
{
    MOV R0, string1
    MOV R1, string2
    MOV R2, #0
loop:
    LDMIA R0!, {R3, R4}     /* load 8 bytes, advance source pointer */
    STMIA R1!, {R3, R4}     /* store 8 bytes, advance destination pointer */
    ADD R2, R2, #8
    CMP R2, #0x400          /* 1024 bytes copied? */
    BNE loop
}
#endif
#endif
Another example:
/* Add the values of the two input parameters and store the result in a global variable */
int result;
void Add(long a, long *b)
{
    __asm
    {
        MOV AX, a
        MOV BX, b
        ADD AX, [BX]
        MOV result, AX
    }
}
Method I is the most common approach and takes 1024 loop iterations. Method J distinguishes between platforms; on ARM, the inline assembly completes the same copy in only 128 iterations. Some may ask: why not use a standard library copy function? A string function such as strcpy will not do, because the source data may contain zero bytes, at which point it stops early without finishing the job (memcpy, which copies a fixed number of bytes, does not have this problem). A typical use of this kind of routine is copying LCD data. Tailoring inline assembly to each CPU can greatly improve execution efficiency.
This is a killer move, but using it lightly comes at a heavy price. Inline assembly limits portability and invites unexpected problems when the program is ported to another platform. It also runs counter to modern software engineering practice, so use it only as a last resort.
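The point about zero bytes in the source data can be checked in portable C. The sketch below is an illustration under assumed data: the array frame and the helper names are hypothetical. It copies the same 8 bytes once with strcpy and once with memcpy and reports how many leading bytes arrived intact.

```c
#include <string.h>

#define FRAME_SIZE 8

/* Hypothetical binary data with a zero byte in the middle. */
static const unsigned char frame[FRAME_SIZE] = {
    0x12, 0x34, 0x00, 0x56, 0x78, 0x9A, 0xBC, 0x00
};

/* Count how many leading bytes of dst equal the source frame. */
static size_t matching_prefix(const unsigned char *dst)
{
    size_t i = 0;
    while (i < FRAME_SIZE && dst[i] == frame[i]) {
        i++;
    }
    return i;
}

size_t copied_by_strcpy(void)
{
    unsigned char dst[FRAME_SIZE];
    memset(dst, 0xEE, sizeof dst);             /* filler marks untouched bytes */
    strcpy((char *)dst, (const char *)frame);  /* stops after the first zero byte */
    return matching_prefix(dst);
}

size_t copied_by_memcpy(void)
{
    unsigned char dst[FRAME_SIZE];
    memset(dst, 0xEE, sizeof dst);
    memcpy(dst, frame, FRAME_SIZE);            /* copies every byte regardless of value */
    return matching_prefix(dst);
}
```

Here strcpy delivers only the first 3 bytes while memcpy delivers all 8, so a length-based copy is the portable baseline to measure any hand-written assembly against.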
Trick 6: Use register variables
When a variable is read and written frequently, memory has to be accessed over and over, which costs a lot of time. C therefore provides register variables. Such a variable is kept in a CPU register, so it is read and written directly without going to memory, which improves efficiency. A register variable is declared with the keyword register. Loop control variables of loops with many iterations, and variables used repeatedly inside the loop body, are good candidates; the loop counter is the best candidate of all. Note that:
(1) Only local automatic variables and function parameters can be declared as register variables. Because register variables use dynamic storage, anything that needs static storage cannot be declared register, including global variables shared between modules, global variables private to a module, and local static variables.
(2) register is an "advisory" keyword: it suggests that the variable be placed in a register, but if the conditions are not met the variable stays in memory and the compiler reports no error (C++ has another "advisory" keyword: inline).
The following is an example of using register variables:
/* Calculate 1 + 2 + 3 + ... + n */
WORD Addition(BYTE n)
{
    register WORD i, s = 0;
    for (i = 1; i <= n; i++)
    {
        s = s + i;
    }
    return s;
}
The loop runs n times, and i and s are used on every iteration, so both are declared as register variables.
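For readers who want to compile the example, here is a self-contained sketch. The typedefs for WORD and BYTE are assumptions (the text does not show how they are defined); they are taken here to be 16-bit and 8-bit unsigned integers.

```c
#include <stdint.h>

/* Assumed definitions: the original text does not declare WORD and BYTE. */
typedef uint16_t WORD;
typedef uint8_t  BYTE;

/* Calculate 1 + 2 + 3 + ... + n */
WORD Addition(BYTE n)
{
    /* register is only a hint: the compiler may still keep i and s in memory. */
    register WORD i, s = 0;

    for (i = 1; i <= n; i++) {
        s = (WORD)(s + i);
    }
    /* Taking the address of i or s (for example &i) would be a compile error. */
    return s;
}
```

The counter is declared as WORD rather than BYTE so that the loop still terminates when n is 255. Optimizing compilers usually make their own register allocation decisions, so the keyword matters most on simpler toolchains.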
Trick 7: Exploit hardware features
First, understand how fast the CPU can access each kind of memory. Roughly, from fastest to slowest:
CPU internal RAM > external synchronous RAM > external asynchronous RAM > FLASH/ROM
Program code is normally burned into FLASH or ROM. The CPU can fetch and execute it directly from there, but this is usually not the best arrangement; it is better to copy the code from FLASH or ROM into RAM after startup and execute it from RAM, which speeds up instruction fetch.
Devices such as a UART have a receive buffer of some capacity. Try to interrupt the CPU only once the buffer is full. For example, when a PC terminal sends data to the target over RS-232, the UART should not be configured to raise an interrupt for every single byte received, which wastes interrupt-handling time.
If a device can be read by DMA, use DMA. DMA is more efficient for reading bulk data from a device: data is transferred in blocks, directly between the device and memory in either direction. Compared with interrupt-driven transfer, DMA reduces CPU involvement with the peripheral and increases the parallelism between CPU and peripheral.
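The UART advice can be put into numbers with a deliberately simplified model. The function rx_interrupts below is hypothetical and ignores receive timeouts and other device-specific behavior; it only counts how many interrupts are raised when the UART interrupts once per fifo_trigger received bytes.

```c
/* Interrupts needed to receive total_bytes when the UART raises one
   interrupt per fifo_trigger bytes. Simplified model: a final partial
   batch still costs one interrupt; timeouts are ignored. */
unsigned long rx_interrupts(unsigned long total_bytes,
                            unsigned long fifo_trigger)
{
    if (fifo_trigger == 0) {
        return 0;   /* invalid trigger level */
    }
    /* Ceiling division. */
    return (total_bytes + fifo_trigger - 1) / fifo_trigger;
}
```

Under this model, receiving 1024 bytes costs 1024 interrupts with a trigger level of 1 but only 64 with a trigger level of 16, which is the saving the paragraph on receive buffers describes.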
Author: chenlycly