1. The RISC design philosophy
The ARM core uses a RISC architecture. RISC is a design philosophy that aims to deliver a simple but effective instruction set whose instructions execute in a single cycle at a high clock frequency. Its focus is on reducing the complexity of the instructions executed by the hardware, because it is easier to provide flexibility and intelligence in software than in hardware. As a result, RISC design places greater demands on the compiler. In contrast, traditional complex instruction set computers (CISC) rely more on the functionality of the instructions implemented in hardware, which makes CISC instructions more complex.
The RISC design philosophy is implemented mainly through the following four design rules:
▇ Instruction set
RISC processors have a reduced number of instruction classes. Each instruction has a fixed length, which allows the pipeline to fetch the next instruction while the current one is being decoded. In CISC processors, instructions usually vary in length and take multiple cycles to execute.
▇ Pipeline
Under ideal conditions, the pipeline advances by one step every cycle to achieve the highest throughput. CISC instructions, by contrast, are executed by calling a small internal program known as microcode.
▇ Registers
RISC machines have a large set of general-purpose registers. Each register can hold either data or an address. Registers provide fast local storage for all data-processing operations, whereas CISC processors have dedicated registers for specific purposes.
▇ Load-Store Structure
The processor operates only on data held in registers. Separate load and store instructions transfer data between the registers and external memory. Because memory accesses are costly, memory access is kept separate from data processing. One advantage is that data held in a register can be used repeatedly without further memory accesses. In contrast, a CISC processor can operate directly on data in memory.
2. The ARM design philosophy
ARM processors are specifically designed to be small, which reduces power consumption, and to offer high code density. The ARM core is not a pure RISC architecture, which lets it better suit its main application area, embedded systems. In a sense, the ARM core owes its success to not having followed the RISC concept too rigidly. What matters in these systems is not raw processor speed alone, but effective overall system performance and power consumption.
Instruction Sets for Embedded Systems
▇ Variable cycle execution for certain instructions
For example, the number of cycles taken by the load/store multiple instructions depends on how many registers are transferred.
▇ An inline barrel shifter in the pipeline, which leads to more complex instructions
▇ Thumb 16-bit Instruction Set
▇ Conditional execution
▇ Enhanced instructions
3. Efficient C Programming
1) Efficient use of C data types
For local variables held in registers, do not use the char or short types unless 8-bit or 16-bit modular arithmetic is required. Use the signed or unsigned int type instead. Division is faster on unsigned values.
For arrays and global variables held in main memory, use the smallest data type that can hold the required data. This saves storage space. The ARMv4 architecture loads and stores all data widths efficiently, and accesses arrays efficiently when you step through them with an incrementing pointer. For arrays of short, avoid indexing from the array base address, because the LDRH instruction does not support the shifted register offset that this requires.
Implicit or explicit type conversions in expressions, especially narrowing ones, usually cost extra instruction cycles, so avoid them. Conversions on loads and stores are generally free, because the load or store instruction performs the conversion itself.
Avoid char and short for function parameters and return values. Even if the range of a parameter is small, use the int type to keep the compiler from inserting unnecessary conversions.
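As a rough illustration of the points above, the following sketch keeps the narrow type only for the data held in memory and uses int-sized locals for everything kept in registers. The function name sum_samples and its parameters are made up for this example; they do not come from any particular library.

```c
#include <stdint.h>

/* Sum 16-bit samples held in memory.
 * The array keeps the narrow type (int16_t) to save space, while the
 * accumulator and the counter are int-sized locals, so the compiler has
 * no reason to insert narrowing instructions inside the loop. */
int sum_samples(const int16_t *data, unsigned int n)
{
    int sum = 0;                /* int rather than short */

    for (; n != 0; n--) {
        sum += *data++;         /* sign extension is part of the load */
    }
    return sum;
}
```

The array is walked with an incrementing pointer rather than data[i], which is the access pattern the text recommends for short-type arrays.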
2) efficiently compile the loop body
Use loops that count down to zero, so that the compiler does not need to allocate a register for the termination value, and the comparison with 0 can also be omitted.
Use an unsigned loop counter, with the continuation condition i != 0 rather than i > 0. This ensures that the loop overhead is only two instructions.
If you know in advance that the loop body will execute at least once, use a do-while loop rather than a for loop. The compiler can then skip the check for a loop count of zero.
Unrolling an important loop reduces the loop overhead, but do not overdo it. If the loop overhead is a small part of the whole program, unrolling only increases code size and hurts cache performance.
Try to make array sizes a multiple of 4 or 8, so that loops can easily be unrolled by 2, 4, or 8 without worrying about leftover array elements.
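The loop guidelines above might look like this in practice. This is only a sketch: the function names are invented for the example, and both functions assume the caller guarantees the stated preconditions on n.

```c
#include <stdint.h>

/* Count-down do-while loop with an unsigned counter.
 * Assumes n >= 1, so no check for n == 0 is needed on entry. */
uint32_t checksum_countdown(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;

    do {
        sum += *data++;
    } while (--n != 0);         /* continue while the counter is non-zero */
    return sum;
}

/* The same loop unrolled four times.
 * Assumes n is a non-zero multiple of 4, so no leftover elements remain. */
uint32_t checksum_unrolled4(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;

    do {
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
        n -= 4;
    } while (n != 0);
    return sum;
}
```

Whether the unrolled version is actually faster depends on the compiler, the core, and the cache, so it is worth measuring before committing to it.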
3) Efficient register allocation
Try to limit the number of local variables used in a function's inner loop to at most 12, so that the compiler can allocate all of them to ARM registers.
You can guide the compiler as to which variables are important by making sure they are used within the innermost loop.
4) Efficient function call
Try to limit functions to no more than four parameters, which makes them more efficient to call. You can also group several related parameters into a struct and pass a pointer to the struct in place of the individual parameters.
Put small functions in the same source file as the functions that call them, and define them before they are called. The compiler can then optimize the calls or inline the small functions.
An important function with a major impact on performance can be inlined using the __inline keyword.
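A small sketch of these ideas, under the assumption of a simple circular byte queue (the Queue type and both function names are hypothetical): the three buffer pointers travel together in one struct, so the public function needs only three arguments, and the small helper is defined before its caller in the same file so the compiler is free to inline it.

```c
/* Hypothetical circular queue descriptor: groups three related
 * parameters so they can be passed as a single pointer. */
typedef struct {
    unsigned char *start;   /* first byte of the buffer */
    unsigned char *end;     /* one past the last byte of the buffer */
    unsigned char *ptr;     /* next write position */
} Queue;

/* Small helper, defined before its caller in the same source file. */
static void queue_put(Queue *q, unsigned char byte)
{
    *q->ptr++ = byte;
    if (q->ptr == q->end) {
        q->ptr = q->start;  /* wrap around to the start of the buffer */
    }
}

/* Three arguments instead of five (start, end, ptr, data, n). */
void queue_bytes(Queue *q, const unsigned char *data, unsigned int n)
{
    for (; n != 0; n--) {
        queue_put(q, *data++);
    }
}
```

Standard static is used here instead of a compiler-specific keyword such as __inline, so the sketch builds with any C compiler.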
5) Avoid pointer aliases
Do not rely on the compiler to eliminate common subexpressions that involve memory accesses. Instead, create a new local variable to hold the value of the expression, so that the expression is evaluated only once.
Avoid taking the address of a local variable. Otherwise, access to that variable may be less efficient.
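The following sketch shows why the compiler cannot always do this for you. The two hypothetical functions below look equivalent, but in the first one the compiler must reload *step after the first store, because step might point to the same int as timer1.

```c
/* *step is read from memory twice: the store to *timer1 could have
 * changed it if the two pointers alias. */
void timers_reload(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* *step is read once into a local, which can stay in a register. */
void timers_cached(int *timer1, int *timer2, const int *step)
{
    int s = *step;

    *timer1 += s;
    *timer2 += s;
}
```

Note that the two versions give different results when step really does alias timer1, so the second form is only a safe replacement when the caller never passes overlapping pointers.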
6) Efficient Structure Arrangement
Arrange the elements of a struct in order of size, with the smallest element first and the largest element last.
Avoid very large structs. Use a hierarchy of smaller structs instead.
To improve portability, manually add the padding to the structures used in an API. The structure layout then does not depend on the compiler.
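A minimal sketch of the layout rules, using made-up struct names. The sizes mentioned in the comments are what a typical ABI with 4-byte alignment for uint32_t produces; they are not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Mixed ordering: the compiler typically inserts 3 bytes of padding
 * after a and 1 byte after c, giving 12 bytes in total. */
struct bad_layout {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint16_t d;
};

/* Smallest elements first, largest last: typically 8 bytes with no
 * hidden padding. */
struct good_layout {
    uint8_t  a;
    uint8_t  c;
    uint16_t d;
    uint32_t b;
};

/* API structure with the padding written out explicitly, so the offset
 * of b does not depend on the compiler's padding decisions. */
struct api_layout {
    uint8_t  a;
    uint8_t  pad[3];    /* explicit padding up to a 4-byte boundary */
    uint32_t b;
};
```

You can confirm the layout your own compiler produces by printing sizeof for each struct and offsetof for the members of interest.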