Because of constraints on power consumption, cost, and size, the processing capability of an embedded microprocessor differs significantly from that of a desktop processor, and the space and time requirements placed on programs are more stringent.
Embedded applications therefore generally need performance optimization to meet their performance requirements.
1 Types of embedded program optimization
Embedded application optimization means improving a program by modifying its algorithms and structure and by using software development tools, so that the modified program runs faster or its code is smaller.
Depending on its focus, program optimization can be divided into running speed optimization and code size optimization. Running speed optimization means shortening the time needed to complete a specified task, through adjustments to the application structure and other means, based on a full understanding of the hardware and software characteristics. Code size optimization means minimizing the amount of code while the application still correctly implements the required functions. In practice the two often conflict: speeding up a program usually increases the amount of code, while reducing the code size may slow the program down. Therefore, before optimizing, a specific optimization strategy should be formulated based on actual needs. With the development of computer and microelectronics technology, storage space is no longer the main factor restricting embedded systems, so this article mainly discusses running speed optimization.
2 Embedded program optimization principles
Embedded program optimization follows three main principles.
① Equivalence principle: the functions implemented by the program are the same before and after optimization.
② Effectiveness principle: after optimization, the program runs faster or occupies less storage space than before.
③ Economy principle: the optimization should achieve a good result at a small cost.
3 Main aspects of embedded program optimization
Optimization of embedded programs includes algorithm and data structure optimization, compilation optimization, and code optimization.
3.1 Algorithm and data structure optimization
Algorithms and data structures are the core of program design, and the quality of the algorithm largely determines the quality of the program. A given function can usually be implemented with several algorithms whose complexity and efficiency differ greatly, so selecting an efficient algorithm, or optimizing the existing one, can improve application performance. For example, binary search is faster than sequential search when searching sorted data. Recursive programs make a large number of procedure calls and keep the local variables of every call that has not yet returned on the stack, so their time and space efficiency can be low. Converting a recursive program to a non-recursive form, for example by using iteration or an explicit stack as the situation allows, can greatly improve performance.
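The search comparison above can be sketched in C as follows. This is a minimal illustration, not code from the article: the function names are hypothetical, and the binary search assumes the array is already sorted in ascending order.

```c
#include <stddef.h>

/* Sequential search: examines elements one by one, O(n) comparisons.
   Returns the index of key, or -1 if it is not present. */
int sequential_search(const int *data, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key) {
            return (int)i;
        }
    }
    return -1;
}

/* Binary search: halves the search range each step, O(log n) comparisons.
   Assumes data[] is sorted in ascending order. */
int binary_search(const int *data, size_t n, int key)
{
    size_t lo = 0;
    size_t hi = n;                      /* search range is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (data[mid] == key) {
            return (int)mid;
        } else if (data[mid] < key) {
            lo = mid + 1;               /* key can only be in the upper half */
        } else {
            hi = mid;                   /* key can only be in the lower half */
        }
    }
    return -1;
}
```

Both functions return the same index for a key that is present; the binary version is also written iteratively, so it needs no recursion and no extra stack space.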
Data structures also play an important role in program design. For example, if data items are frequently inserted into or deleted from unordered data, a linked list structure will be faster.
Algorithm and data structure optimization is the preferred optimization technique.
3.2 Compilation optimization
Many modern compilers provide code optimization. During compilation they perform dependence analysis using techniques from parallel programming, obtain semantic information about the source program, and apply techniques such as software pipelining, data planning, and loop restructuring to perform processor-independent optimizations automatically and generate high-quality code. Many compilers offer several optimization levels, so an appropriate one can be chosen. In general, at the highest optimization level the compiler pursues code optimization aggressively, which can sometimes lead to errors.
In addition, some dedicated compilers are optimized for particular architectures, so that hardware resources are fully used to generate high-quality code. For example, the Intel compiler in Microsoft Embedded Visual C++ is designed specifically for Intel XScale and is highly optimized to create faster code. It uses a variety of optimization techniques, including instruction scheduling to optimize pipeline operation, support for XScale features such as dual load and store, and interprocedural optimization (keeping the variables used by functions in registers for quick access).
In embedded software development, a compiler with strong optimization capability should be selected, and its code optimization functions fully used, to generate efficient code and improve program running efficiency.
3.3 Code optimization
Code optimization means replacing the original code with assembly language or with simpler code, so that the compiled program runs more efficiently. The compiler can automatically optimize within program segments and code blocks, but it has difficulty obtaining program semantics, algorithm flow, and run-time state information, so manual optimization by the programmer is still needed. Some common optimization techniques follow.
(1) Code replacement
Use short-cycle instructions in place of long-cycle instructions to reduce computational strength.
① Reduce division operations. A division inside a relational expression can be avoided by multiplying both sides of the relational operator by the divisor. Some division and modulo operations can be replaced by bitwise operations, because a bitwise instruction needs only one instruction cycle, whereas on processors without a hardware divider "/" requires a subroutine call with long code and slow execution. For example:
Before optimization: if ((a / b) > c) and a = a / 4
After optimization: if (a > (b * c)) and a = a >> 2
The multiplication form assumes that b is positive and that b * c does not overflow; with truncating integer division the two tests can differ when a is not a multiple of b. The shift form assumes that a is unsigned or non-negative.
② Reduce power operations. For example:
Before optimization: a = pow(a, 3.0)
After optimization: a = a * a * a
③ Use auto-increment and auto-decrement instructions. For example:
Before optimization: a = a + 1, a = a - 1
After optimization: a++, a-- (or INC, DEC)
④ Use the smallest data types possible. When the variable still meets the usage requirements, the order of preference is: char > int > long int > float.
For division, unsigned operands are more efficient than signed ones. In actual calls, minimize forced data type conversions (casts) and use floating-point operations sparingly. If the result can be kept within the allowed error, use a long integer in place of a floating-point type.
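The replacements in this subsection can be sketched in C as follows. All names are hypothetical, and the ADC parameters (12-bit range, 3300 mV reference) are assumptions chosen only to illustrate replacing floating-point arithmetic with scaled integers. Because truncating integer division makes a / b > c true exactly when a >= b * (c + 1), the sketch uses that exact form for the division-free test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Baseline: truncating division. Requires b != 0. */
bool quotient_exceeds_div(uint32_t a, uint32_t b, uint32_t c)
{
    return (a / b) > c;
}

/* Division-free form of the same test. For truncating unsigned division,
   a / b > c holds exactly when a >= b * (c + 1). The 64-bit product
   cannot overflow for 32-bit inputs. Requires b != 0. */
bool quotient_exceeds_mul(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint64_t)a >= (uint64_t)b * ((uint64_t)c + 1u);
}

/* Division and modulo by a power of two as shift and mask (unsigned only). */
uint32_t div_by_4(uint32_t a) { return a >> 2; }
uint32_t mod_8(uint32_t a)    { return a & 7u; }

/* Assumed 12-bit ADC with a 3300 mV reference: integer arithmetic in
   millivolts replaces the floating-point form raw * 3.3 / 4095. */
#define ADC_MAX 4095u
#define VREF_MV 3300u

uint32_t adc_to_millivolts(uint16_t raw)
{
    /* Adding ADC_MAX / 2 before dividing rounds to the nearest millivolt. */
    return ((uint32_t)raw * VREF_MV + ADC_MAX / 2u) / ADC_MAX;
}
```

Many optimizing compilers already turn division by a constant power of two into a shift, so the manual form matters mostly when optimization is disabled or the compiler is weak.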
(2) Global variables and local variables
Use fewer global variables and more local variables. Global variables are stored in data memory, and each global variable defined costs the MCU a piece of usable data memory; too many global variables can leave the compiler without enough memory to allocate. Local variables are mostly held in the MCU's registers. On the vast majority of MCUs, register operations are faster than data memory operations and the instructions are more flexible, which helps generate higher-quality code. In addition, the registers and data memory occupied by local variables can be reused in different modules.
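As a minimal sketch of this advice, the two functions below compute the same byte sum. The global g_checksum and both function names are hypothetical. The first version updates the global inside the loop; the second accumulates in a local variable, which the compiler can usually keep in a register, and writes the global once at the end.

```c
#include <stddef.h>
#include <stdint.h>

uint32_t g_checksum;    /* hypothetical global result */

/* Accumulates directly in the global: the compiler often cannot keep it in
   a register, because the byte buffer might alias the global's storage. */
void checksum_via_global(const uint8_t *buf, size_t len)
{
    g_checksum = 0;
    for (size_t i = 0; i < len; i++) {
        g_checksum += buf[i];
    }
}

/* Accumulates in a local and stores the global once after the loop. */
void checksum_via_local(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    g_checksum = sum;
}
```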
(3) Use register variables
When a variable is read and written frequently, memory must be accessed repeatedly, which costs considerable access time. To improve access efficiency, a register variable can be used so that data is read and written directly in a CPU register without accessing memory. Loop control variables of loops with many iterations, and variables used repeatedly in the loop body, can be defined as register variables; loop counters are the best candidates. Only local automatic variables and formal parameters can be defined as register variables. Because register variables use dynamic storage, variables that require static storage cannot be defined as register variables. A register variable is declared with the register specifier.
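A minimal sketch of register variables in C follows; the function name is hypothetical. Note that register is only a hint: many modern compilers allocate registers on their own and may ignore it, and the address of a register variable cannot be taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums an array, asking the compiler to keep the loop counter and the
   running total in CPU registers. */
uint32_t sum_array(const uint32_t *data, size_t n)
{
    register size_t i;          /* loop counter: a typical candidate */
    register uint32_t sum = 0;  /* accumulator used in every iteration */

    for (i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```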
(4) Reduce or avoid time-consuming operations
Most of an application's running time is usually spent in a few key modules, which often contain loops or nested loops. Reducing the time-consuming operations inside those loops increases the execution speed of the program. Common time-consuming operations include input/output operations, file access, graphical interface operations, and system calls. If reading or writing a file cannot be avoided, file access will be a major factor affecting running speed. There are two ways to speed up file access: one is to use memory-mapped files, and the other is to use a memory cache.
(5) Optimization of switch statement usage
When programming, order the case labels by likelihood: placing the most likely case first and the least likely case last can improve the execution speed of the switch statement.
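The ordering can be sketched as follows. The event type, its values, and the assumed frequencies are all hypothetical. The benefit applies when the compiler implements the switch as a chain of comparisons; when it generates a jump table instead, the order of the labels has little effect.

```c
/* Hypothetical event codes for a communication handler. */
typedef enum { EVT_DATA, EVT_TIMEOUT, EVT_ERROR } event_t;

/* Returns a handler code; cases are listed from most to least likely. */
int handle_event(event_t e)
{
    switch (e) {
    case EVT_DATA:      /* assumed to be by far the most frequent event */
        return 1;
    case EVT_TIMEOUT:   /* assumed to be occasional */
        return 2;
    case EVT_ERROR:     /* assumed to be rare */
        return 3;
    default:
        return 0;
    }
}
```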
(6) Optimization of the loop body
The loop body is a focus of program design and optimization. Computations that do not depend on the loop variable can be moved outside the loop. For a loop with a fixed number of iterations, a for loop is more efficient than a while loop, and a count-down loop is faster than a count-up loop.
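A minimal sketch of both ideas, with hypothetical function names: the first version recomputes a loop-invariant factor in every iteration and counts up; the second computes the factor once before the loop and counts down to zero, so the loop test is a comparison against zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: the factor k does not depend on i but is recomputed each time. */
void scale_naive(int32_t *data, size_t n, int32_t gain, int32_t offset)
{
    for (size_t i = 0; i < n; i++) {
        int32_t k = gain * 2 + offset;   /* loop-invariant computation */
        data[i] = data[i] * k;
    }
}

/* After: k is hoisted out of the loop and the counter runs down to zero. */
void scale_hoisted(int32_t *data, size_t n, int32_t gain, int32_t offset)
{
    const int32_t k = gain * 2 + offset;

    for (size_t i = n; i != 0; i--) {
        data[i - 1] *= k;
    }
}
```

An optimizing compiler may perform the hoisting itself; writing it out by hand mainly helps at low optimization levels.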
In actual execution, each iteration adds two instructions to the loop body: a subtraction (decrementing the loop count) and a conditional branch. These instructions are called loop overhead. On ARM processors, the subtraction takes one cycle and the conditional branch takes three cycles, so each iteration incurs four cycles of overhead. Loop unrolling can speed up the loop: the loop body is repeated several times and the iteration count is reduced in the same proportion, which reduces loop overhead and trades increased code size for running speed.
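Loop unrolling can be sketched as follows. The unroll factor of four and the function name are illustrative choices, not values from the article. The main loop handles four elements per iteration, so the decrement-and-branch overhead is paid once per four elements; a short second loop handles any leftover elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums an array with the loop body unrolled four times. */
uint32_t sum_unrolled(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }

    /* Remainder loop: at most three leftover elements. */
    for (; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```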
(7) Function calls
To call functions efficiently, try to limit the number of parameters to four. In an ARM function call, the first four parameters are passed in registers, and the fifth and subsequent parameters are passed on the memory stack. If a function needs more parameters, the related parameters can be grouped in a structure and a pointer to the structure passed instead.
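A minimal sketch of grouping parameters in a structure follows. The structure, its fields, and the function names are hypothetical, and the computed value is arbitrary; the point is only the calling interface. The first function takes six scalar parameters, so on ARM the last two are passed on the stack; the second takes a single pointer, which fits in one register.

```c
/* Hypothetical rectangle description used to group related parameters. */
typedef struct {
    int x;
    int y;
    int width;
    int height;
    int color;
    int style;
} rect_t;

/* Six scalar parameters: the fifth and sixth are passed on the stack. */
int rect_value_args(int x, int y, int width, int height, int color, int style)
{
    return x + y + width * height + color + style;
}

/* One pointer parameter: passed in a single register. */
int rect_value_ptr(const rect_t *r)
{
    return r->x + r->y + r->width * r->height + r->color + r->style;
}
```

The pointer version pays for memory loads of the fields it uses, so it helps most when the structure already exists in memory and is passed through several call levels.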
(8) Inline functions and embedded assembly
Important functions with a major impact on performance can be declared with the __inline keyword, which saves function call overhead at the cost of increased code size. Time-critical parts of the program can be written in embedded assembly, which usually brings a significant speed improvement.
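An inline helper can be sketched as follows. The sketch uses the standard C99 form static inline rather than the compiler-specific __inline keyword mentioned above, and the function names are hypothetical. The compiler is free to ignore the request, so inlining is a hint rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Small helper called in a hot loop; static inline asks the compiler to
   expand it at each call site instead of emitting a function call. */
static inline int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    if (v < lo) {
        return lo;
    }
    if (v > hi) {
        return hi;
    }
    return v;
}

/* Clamps every element of data[] to the range [lo, hi]. */
void clamp_buffer(int32_t *data, size_t n, int32_t lo, int32_t hi)
{
    for (size_t i = 0; i < n; i++) {
        data[i] = clamp_i32(data[i], lo, hi);
    }
}
```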
(9) Use lookup tables instead of calculation
Avoid performing very complex operations in the program, such as complex floating-point calculations. For such time- and resource-consuming operations, space can be traded for time: compute the function values in advance and place them in program memory, so that at run time the program simply looks them up in the table, reducing repeated computation during execution.
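A lookup table can be sketched as follows. The choice of the sine function, the 15-degree step, the scaling by 1000, and all names are assumptions made for illustration. The table is declared static const so that a typical embedded toolchain can place it in program memory.

```c
#include <stdint.h>

/* sin(angle) * 1000 for angle = 0, 15, 30, ..., 90 degrees,
   computed in advance and rounded to the nearest integer. */
static const int16_t sin_table_x1000[7] = {
    0, 259, 500, 707, 866, 966, 1000
};

/* Returns sin(angle_deg) * 1000 using the nearest table entry.
   Valid for 0..90 degrees; returns -1 for angles outside that range. */
int16_t sin_x1000(uint8_t angle_deg)
{
    if (angle_deg > 90) {
        return -1;
    }
    /* Adding 7 before dividing by 15 selects the nearest 15-degree step. */
    return sin_table_x1000[(angle_deg + 7) / 15];
}
```

A finer step or linear interpolation between entries would improve accuracy at the cost of a larger table or a few extra integer operations.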
(10) Use a hardware-optimized function library
Intel's GPP (Graphics Performance Primitives) and IPP (Integrated Performance Primitives) libraries, designed for XScale processors, provide hand-optimized implementations of typical operations and algorithms for multimedia processing, graphics processing, and numerical computation. They bring the computing potential of XScale hardware into full play and achieve high execution efficiency.
(11) Utilize hardware features
To improve program running efficiency, make full use of hardware features to reduce overhead, for example by reducing the number of interrupts and by using DMA transfers.
The CPU's access speed to different types of memory ranks as follows: CPU internal RAM > external synchronous RAM > external asynchronous RAM > flash/ROM. Program code burned into flash or ROM runs slowly if the CPU fetches and executes it directly from there; after the system starts, the target code in flash or ROM can be copied into RAM and executed there to speed up the program.
4 Conclusion
Performance optimization of embedded programs often conflicts with the development cycle, development cost, and readability of the software, so the pros and cons must be weighed and a compromise made. Treat algorithm and data structure optimization as the preferred technique; then select efficient compilers, system run-time libraries, and graphics libraries according to their features, performance differences, the investment budget, and other factors; use performance monitoring tools to find the hotspots that take up most of the running time and apply code optimization methods to them; finally, compile with an efficient optimizing compiler to obtain high-quality code.