The relationship between ARM design philosophy and efficient C programming

Source: Internet
Author: User

1. The RISC design philosophy

The ARM core uses a RISC architecture. RISC is a design philosophy that aims to deliver a simple but effective instruction set in which instructions execute in a single cycle at a high clock frequency. Its focus is on reducing the complexity of the instructions executed by the hardware, because software can provide flexibility and intelligence more easily than hardware can. As a result, a RISC design places greater demands on the compiler. By contrast, traditional complex instruction set computers (CISC) rely more on the hardware for instruction functionality, which makes CISC instructions more complex.

The RISC design philosophy is implemented mainly through the following four design principles:

▇ Instruction set

RISC processors have a reduced number of instruction classes. Each instruction has a fixed length, which allows the pipeline to fetch the next instruction while the current one is being decoded. In a CISC processor, instructions are usually of variable length and take multiple cycles to execute.

▇ Pipeline

Under ideal conditions, the pipeline advances by one step every cycle to achieve the highest throughput. By contrast, executing CISC instructions requires a small internal program known as microcode.

▇ Registers

RISC processors have a larger set of general-purpose registers. Each register can hold either data or an address. Registers provide fast local storage for all data processing operations, whereas CISC processors have dedicated registers for specific purposes.

▇ Load-Store Structure

The processor operates only on data held in registers. Separate load and store instructions transfer data between the registers and external memory. Because memory accesses are time-consuming, separating memory access from data processing has an advantage: data held in a register can be reused many times without repeated memory accesses. By contrast, in a CISC design the processor can operate directly on data in memory.

2. The ARM design philosophy

To reduce power consumption, ARM processors are deliberately designed with small cores and high code density. The ARM core is not a pure RISC architecture, which lets it better suit its main application area, embedded systems. In a sense, the success of the ARM core can even be attributed to its not being bound too tightly to the RISC concept. What matters for a system is not raw processor speed but effective system performance and power consumption.

Instruction set for embedded systems

▇ Variable cycle execution for certain instructions

For example, the number of cycles taken by load and store instructions that transfer multiple registers depends on how many registers are transferred.

▇ Inline barrel shifter, which allows more complex instructions

▇ Thumb 16-bit Instruction Set

▇ Conditional execution

▇ Enhanced instructions

3. Efficient C Programming

1) Efficient use of C data types

▇ For local variables held in registers, do not use the char or short types unless you need 8-bit or 16-bit modular arithmetic. Use a signed or unsigned int instead. Division is faster with unsigned types.

▇ For arrays and global variables held in main memory, use the smallest type that can hold the required data. This saves storage space. The ARMv4 architecture can load and store data of all widths efficiently, and it accesses arrays efficiently through an incrementing pointer. For arrays of type short, avoid indexing by an offset from the array base, because the LDRH instruction does not support a shifted register offset.

▇ Implicit and explicit type conversions usually cost extra instruction cycles, so avoid them in expressions. Loads and stores generally carry no extra conversion cost, because the load and store instructions perform the conversion automatically.

▇ Avoid char and short for function parameters and return values. Even when the range of a parameter is small, use int to prevent the compiler from inserting unnecessary type conversions.
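The guidelines above can be illustrated with a minimal sketch. The function name checksum and its parameters are hypothetical and chosen only for this example. The array elements stay short to save memory, while the counter, accumulator, and return value are int-sized, and the array is walked with an incrementing pointer rather than base-plus-offset indexing.

```c
/* Hypothetical example: sum an array of 16-bit samples.
 * Elements are short (compact in memory); the counter and the
 * accumulator are int-sized so no narrowing is needed in registers. */
int checksum(const short *data, unsigned int count)
{
    int sum = 0;
    unsigned int i;

    for (i = count; i != 0; i--) {
        sum += *data++;   /* incrementing pointer access */
    }
    return sum;           /* int return value, no conversion needed */
}
```

Whether a given compiler actually emits fewer instructions for this form depends on the compiler and the target, so it is worth inspecting the generated assembly.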

2) Writing loops efficiently

▇ Use loops that count down to zero. The compiler then does not need to allocate a register to hold the termination value, and the comparison with 0 can be omitted.

▇ Use an unsigned loop counter, with the continuation condition i != 0 rather than i > 0. This ensures that the loop overhead is only two instructions.

▇ If you know in advance that the loop body will execute at least once, use a do-while loop rather than a for loop. The compiler can then skip the check for a loop count of 0.

▇ Unrolling an important loop reduces the loop overhead, but do not overdo it. If the loop overhead is a small part of the whole program, unrolling only increases code size and reduces cache performance.

▇ Try to make array sizes a multiple of 4 or 8, so that loops can easily be unrolled by 2, 4, or 8 without worrying about leftover array elements.
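A small sketch that combines these points follows. The name sum_words is hypothetical, and the function assumes the caller guarantees that n is a nonzero multiple of 4; that assumption is what makes the do-while form and the unrolling by 4 safe.

```c
/* Hypothetical example: sum n words.
 * Assumes n is a nonzero multiple of 4 (guaranteed by the caller). */
unsigned int sum_words(const unsigned int *data, unsigned int n)
{
    unsigned int sum = 0;
    unsigned int i = n / 4;      /* number of unrolled iterations */

    do {                         /* body runs at least once */
        sum += data[0];
        sum += data[1];
        sum += data[2];
        sum += data[3];
        data += 4;
    } while (--i != 0);          /* unsigned counter, counts down to zero */

    return sum;
}
```

If n could be zero or not a multiple of 4, the function would need an up-front check and a separate loop for the leftover elements.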

3) Efficient register allocation

▇ Try to limit the number of local variables used in a function's inner loop to at most 12, so that the compiler can allocate all of them to ARM registers.

▇ You can guide the compiler toward which variables are important by making sure they are used in the innermost loop.

4) Efficient function calls

▇ Try to limit functions to no more than four parameters, which makes calls more efficient. You can also group several related parameters in a struct and pass a pointer to the struct in place of the individual parameters.

▇ Put small called functions in the same source file as their callers, and define them before they are called. The compiler can then optimize the calls or inline the smaller functions.

▇ Use the __inline keyword to inline important functions that have a major impact on performance.
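The following sketch shows one way these guidelines might look in practice. The Queue type and the queue_advance and queue_put functions are hypothetical names made up for this illustration. It uses the standard C99 static inline in place of the compiler-specific __inline keyword mentioned above.

```c
/* Hypothetical descriptor grouping related parameters in one struct. */
typedef struct {
    unsigned char *start;   /* first byte of the buffer */
    unsigned char *end;     /* one past the last byte of the buffer */
    unsigned char *ptr;     /* next write position */
} Queue;

/* Small helper defined before its caller in the same file,
 * so the compiler is free to inline it. */
static inline unsigned char *queue_advance(const Queue *q, unsigned char *p)
{
    p++;
    return (p == q->end) ? q->start : p;   /* wrap around at the end */
}

/* Three arguments instead of five: the buffer bounds and the write
 * position travel together through the struct pointer. */
void queue_put(Queue *q, const unsigned char *data, unsigned int n)
{
    unsigned char *p = q->ptr;   /* work on a local copy */

    while (n != 0) {
        *p = *data++;
        p = queue_advance(q, p);
        n--;
    }
    q->ptr = p;                  /* write the position back once */
}
```

Passing the struct pointer keeps the argument count at three, within the four-parameter guideline above.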

5) Avoiding pointer aliasing

▇ Do not rely on the compiler to eliminate common subexpressions that involve memory accesses. Instead, create a new local variable to hold the value of the expression, which guarantees that it is evaluated only once.

▇ Avoid taking the address of a local variable. Otherwise, access to that variable is likely to be less efficient.
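As a minimal sketch of the first point, consider the hypothetical function timers_advance below. Because timer1 and step might point to the same memory, a compiler generally has to assume that the store to *timer1 could change *step. Copying *step into a local variable first makes it explicit that the value is read only once.

```c
/* Hypothetical example: add the same step to two timers. */
void timers_advance(int *timer1, int *timer2, const int *step)
{
    int s = *step;    /* load once into a local variable */

    *timer1 += s;     /* this store cannot affect s */
    *timer2 += s;     /* so no reload of *step is needed here */
}
```

Note that the local copy also changes the behavior when the pointers really do alias: the second timer receives the original step value, not one modified by the first store.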

6) Efficient structure layout

▇ Arrange the elements of a struct by size, with the smallest elements first and the largest last.

▇ Avoid very large structs. Use a hierarchy of smaller structs instead.

▇ To improve portability, add padding to API structures manually, so that the structure layout does not depend on the compiler.
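A short sketch of the ordering and padding rules follows. The ApiRecord type and its field names are hypothetical. It uses the fixed-width types from <stdint.h> so that the element sizes are explicit, and it assumes each type is aligned to no more than its own size, which holds on common ARM toolchains.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API structure: elements ordered from smallest to largest,
 * with the padding byte written out explicitly instead of being left
 * for the compiler to insert. */
typedef struct {
    uint8_t  flags;    /* offset 0 */
    uint8_t  pad0;     /* offset 1: manual padding, keeps id 2-byte aligned */
    uint16_t id;       /* offset 2 */
    uint32_t length;   /* offset 4 */
} ApiRecord;           /* 8 bytes in total, no hidden padding */
```

Because every element already sits on its natural boundary, the compiler has no reason to insert padding of its own, so the layout is the same across compilers that use natural alignment.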
