ARM Processor Structure
ARM and Thumb states
RISC technology
Pipeline technology
Superscalar technology
ARM and Thumb states
Versions of the ARM architecture after V4 (starting with the V4T variant) include:
(1) the 32-bit ARM instruction set;
(2) the 16-bit Thumb instruction set, which is a functional subset of the ARM instruction set.
Starting with the ARM7TDMI core, T-variant ARM microprocessors have two operating states:
(1) ARM state
(2) Thumb state.
When the ARM microprocessor executes the 32-bit ARM instruction set, it operates in ARM state;
when it executes the 16-bit Thumb instruction set, it operates in Thumb state.
Introduction to Thumb technology
When the ARM7 architecture came into wide use, the embedded controller market was still dominated by 8-bit and 16-bit processors. These products could not meet the needs of high-end applications, which require the performance of a 32-bit RISC processor together with code density better than that of 16-bit CISC processors.
To solve the code density problem, ARM added the T variant.
Thumb takes 36 instruction formats from the 32-bit ARM instruction set and re-encodes them as 16-bit opcodes.
At run time, the processor decompresses each 16-bit Thumb instruction into a 32-bit instruction.
A core with Thumb support has two independent instruction sets. This lets the designer obtain the performance of 32-bit ARM instructions while also benefiting from the compact code produced by the Thumb instruction set, balancing performance against code size.
Compared with the ARM instruction set, the Thumb instruction set has the following limitations:
To complete the same operation, Thumb code usually needs more instructions. The ARM instruction set is therefore the better choice when execution time is critical.
The Thumb instruction set lacks some instructions required for exception handling, so ARM instructions are still needed to handle exceptions and interrupts. This restriction means that Thumb instructions must be used together with ARM instructions.
State transitions between ARM and Thumb
During program execution, the microprocessor can switch between the two operating states at any time. The switch does not affect the processor's operating mode or the contents of the registers.
Entering Thumb state: execute a BX instruction when bit [0] of the operand register is 1.
Entering ARM state: execute a BX instruction when bit [0] of the operand register is 0.
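The state-selection rule above can be sketched as a small Python model. This is only an illustration, not processor documentation: the function name `bx` and the returned tuple are assumptions made for the example, and the model ignores everything except bit [0] of the operand register.

```python
def bx(register_value):
    """Model the state selection of a BX branch (illustrative only).

    Bit [0] of the operand register selects the state:
    1 -> Thumb state, 0 -> ARM state.
    """
    state = "Thumb" if (register_value & 1) == 1 else "ARM"
    # Bit [0] only selects the state, so it is cleared in the branch target.
    target = register_value & 0xFFFFFFFE
    return state, target


if __name__ == "__main__":
    for value in (0x8001, 0x8000):
        state, target = bx(value)
        print(f"BX with operand {value:#x}: enter {state} state, branch to {target:#x}")
```

Calling `bx` with an odd value models entering Thumb state, and with an even value models entering ARM state; the register contents themselves are not modified by the switch.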
RISC technology
Embedded microprocessors can be divided into two types: CISC and RISC.
CISC (Complex Instruction Set Computer): a computer with a complex instruction set. As computer technology developed, new complex instructions were continually added, and the computer architecture became more and more complex.
In a CISC instruction set, about 20% of the instructions are used repeatedly and account for 80% of the program code, while the remaining 80% of the instructions are rarely used and account for only 20% of the program code.
RISC (Reduced Instruction Set Computer): a computer with a reduced instruction set. Its main characteristics:
-Fixed-length instruction formats
-Single-cycle instructions
-Extensive use of registers
-Load/store instructions for bulk data transfer
-Automatic increment or decrement of addresses during loop processing
Comparison between the RISC architecture and the CISC architecture
The ARM processor uses a load/store architecture, which is typical of RISC. That is, only the load and store instructions can access memory; no other instructions may operate on memory.
The basic features of the RISC architecture are as follows:
(1) Most instructions perform only simple, basic functions, and their execution completes within one machine cycle.
(2) Only load/store instructions access memory. Operands are fetched from memory by load instructions and placed in registers, where they are operated on.
(3) The on-chip control logic uses hard-wired logic instead of microcode.
(4) The number of instructions and addressing modes is reduced.
(5) Instruction formats are fixed, which simplifies instruction decoding.
(6) Compilation is optimized.
The ARM architecture also uses some special techniques:
All instructions can be executed conditionally, depending on the result of earlier instructions, which improves execution efficiency.
Load/store instructions can transfer data in batches, which improves data transfer efficiency.
A single data processing instruction can perform both a logical (ALU) operation and a shift.
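To make the last point concrete, here is a minimal Python sketch of a data processing instruction whose second operand is shifted before it reaches the ALU, as in the ARM assembly form `ADD r0, r1, r2, LSL #2`. The function names `shift` and `data_op`, and the small set of operations modeled, are assumptions for the example; flags and the other shift types are left out.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit values


def shift(value, kind, amount):
    """Apply a logical shift to a 32-bit value (only LSL and LSR are modeled)."""
    if kind == "LSL":
        return (value << amount) & MASK32
    if kind == "LSR":
        return (value & MASK32) >> amount
    raise ValueError(f"unsupported shift type: {kind}")


def data_op(op, rn, rm, kind="LSL", amount=0):
    """Shift rm, then combine it with rn in one step, like one instruction."""
    operand2 = shift(rm, kind, amount)
    if op == "ADD":
        result = rn + operand2
    elif op == "AND":
        result = rn & operand2
    elif op == "ORR":
        result = rn | operand2
    elif op == "EOR":
        result = rn ^ operand2
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & MASK32


if __name__ == "__main__":
    # Models ADD r0, r1, r2, LSL #2 with r1 = 0x1000 and r2 = 3.
    print(hex(data_op("ADD", 0x1000, 3, "LSL", 2)))
```

The point of the sketch is that the shift and the arithmetic or logical operation are part of the same instruction, so no separate shift instruction is needed.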
RISC and CISC each have their own advantages, and the boundary between them is not that sharp.
Modern CPUs often present a CISC exterior while incorporating RISC characteristics internally. For example, very long instruction word (VLIW) CPUs combine the advantages of RISC and CISC and are likely to be one of the directions of future CPU development.
Pipeline technology
Pipelining is a technique that breaks each instruction into multiple steps and overlaps the steps of different instructions so that several instructions are processed in parallel.
The instructions in the program are still executed in order, but several instructions can be fetched ahead of time. Starting the following instructions before the current one has finished speeds up program execution.
CPU performance is an important factor in the development and design of embedded systems.
Pipelining is a fundamental factor in determining program execution speed.
Because the execution stages of an instruction are relatively independent of one another, most modern CPUs are designed as pipelined machines in which several instructions can be in execution at once. This overlapping greatly improves CPU efficiency.
A CPU pipeline works best when information flows through it smoothly.
In practice, however, the stages of an instruction take different amounts of time, and some instruction sequences interrupt the flow of information through the pipeline. The pipeline then does not run smoothly, which temporarily reduces the CPU's execution speed.
Execution of a single-cycle instruction
ARM's 3-stage pipeline
The ARM7 architecture uses a three-stage pipeline:
(1) Fetch: the instruction is fetched from memory.
(2) Decode: the opcode and operands are decoded to determine the function to be performed, and the control signals the data path needs in the next cycle are prepared. In this stage the instruction "occupies" the decode logic rather than the data path.
(3) Execute: the decoded instruction is carried out. The instruction occupies the data path: the register bank is read, an operand is shifted in the barrel shifter, and the ALU generates the result and writes it back to the destination register. Depending on the instruction, the ALU result also updates the condition bits of the status register.
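The overlap of the three stages can be sketched in Python as a table of which instruction occupies which stage in each cycle. This is an idealized model, assuming every instruction is a single-cycle instruction and the pipeline never stalls; the function name `pipeline_schedule` and the instruction labels are made up for the example.

```python
STAGES = ["fetch", "decode", "execute"]


def pipeline_schedule(instructions, stages=STAGES):
    """Return one dict per cycle mapping each stage to the instruction in it.

    Idealized: each instruction spends exactly one cycle in each stage and
    nothing ever stalls. An empty stage is reported as None.
    """
    n_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["ADD", "SUB", "MOV"]), start=1):
        cells = ", ".join(f"{stage}={row[stage] or '-'}" for stage in STAGES)
        print(f"cycle {cycle}: {cells}")
```

Once the pipeline is full, one instruction completes in every cycle even though each individual instruction still takes three cycles from fetch to execute.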
PC behavior in pipelined operation
Three-stage pipeline operation for multi-cycle ARM instructions
The data path is involved in all of the execute cycles, the address calculation, and the data transfer. The decode logic always generates the control signals that the data path will use in the next cycle. Therefore, in addition to the decode cycle itself, the decode logic is also busy during the address calculation cycle of an STR, generating the control signals required for the data transfer.
Memory access and the data path used during execution are resources that cannot be shared at the same time. A multi-cycle instruction is too complex to complete in a single clock cycle, so it stalls the pipeline.
ARM pipeline design issues
1) Shorten the program execution time, which is given by Tprog = (Ninst × CPI) / fclk, where:
Tprog: the time required to execute a program;
Ninst: the number of instructions executed by the program;
CPI: the average number of clock cycles per instruction;
fclk: the clock frequency of the processor.
Measures:
Increase the clock frequency fclk (which leads to an increase in the number of pipeline stages).
Reduce the average number of clock cycles per instruction, CPI (which requires dealing with pipeline hazards).
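The relationship between these quantities is the standard one, Tprog = Ninst × CPI / fclk, and a short Python sketch shows how the two measures affect the result. The function name `program_time` and all of the numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: Tprog = Ninst * CPI / fclk."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    # Hypothetical workload: one million instructions.
    baseline = program_time(1_000_000, cpi=2.0, f_clk_hz=50e6)
    faster_clock = program_time(1_000_000, cpi=2.0, f_clk_hz=100e6)
    lower_cpi = program_time(1_000_000, cpi=1.5, f_clk_hz=50e6)
    print(f"baseline:     {baseline * 1e3:.1f} ms")
    print(f"faster clock: {faster_clock * 1e3:.1f} ms")
    print(f"lower CPI:    {lower_cpi * 1e3:.1f} ms")
```

Doubling fclk halves the execution time only if CPI stays the same, which is why a deeper pipeline must also deal with the hazards that would otherwise raise CPI.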
2) Pipeline hazards:
Structural hazards: resource conflicts that occur when certain instructions overlap in the pipeline.
Measures: 1) use separate instruction and data caches. 2) Use a separate adder in the ALU to perform address calculation.
Data hazards: when an instruction needs the result of a preceding instruction and the two overlap in the pipeline, a data hazard may occur.
Data hazards are of three kinds: read after write (RAW), write after write (WAW), and write after read (WAR).
Measures: 1) forwarding (bypassing). 2) pipeline interlocks.
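As an illustration of data hazards, the following Python sketch classifies the dependency between two instructions that overlap in the pipeline as read after write (RAW), write after write (WAW), or write after read (WAR). The dictionary representation of an instruction and the function name `classify_hazards` are assumptions made for the example; real hardware performs this check on register numbers inside the pipeline.

```python
def classify_hazards(first, second):
    """List the data hazards between two instructions in program order.

    Each instruction is a dict with:
      "dst": destination register name, or None if nothing is written
      "src": set of source register names
    """
    hazards = []
    if first["dst"] is not None and first["dst"] in second["src"]:
        hazards.append("RAW")  # second reads what first writes
    if first["dst"] is not None and first["dst"] == second["dst"]:
        hazards.append("WAW")  # both write the same register
    if second["dst"] is not None and second["dst"] in first["src"]:
        hazards.append("WAR")  # second overwrites what first still reads
    return hazards


if __name__ == "__main__":
    add = {"dst": "r0", "src": {"r1", "r2"}}  # ADD r0, r1, r2
    sub = {"dst": "r3", "src": {"r0", "r4"}}  # SUB r3, r0, r4
    print(classify_hazards(add, sub))
```

A RAW dependency like the one in this example is the case that forwarding or an interlock has to handle: the second instruction needs r0 before the first has written it back.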
Control hazards: when the pipeline encounters a branch instruction or any other instruction that changes the PC, a control hazard occurs.
Measures: 1) introduce delayed branches. 2) Determine as early as possible whether the branch is taken, and compute the PC value for the taken case (that is, the branch target address).
ARM's 5-stage pipeline
Both the ARM9 and StrongARM architectures use a 5-stage pipeline.
An I-Cache and a D-Cache are added, separating instruction fetches from data accesses;
a dedicated path and register are added for data write-back;
instruction execution is divided into five stages:
Fetch: fetches the instruction from instruction memory and places it in the instruction pipeline.
Decode: decodes the instruction and reads the register operands from the register file.
Execute: shifts an operand and generates the ALU result. If the instruction is a load or store, the memory address is calculated in the ALU.
Data buffer: accesses data memory if needed; otherwise, the ALU result is simply buffered for one clock cycle so that all instructions follow the same pipeline flow.
Write back: writes the result of the instruction to the register file.
Pipeline comparison
Superscalar execution
By providing multiple sets of instruction execution units that process and complete several instructions at the same time, parallel operation is achieved and processing speed is increased.
All ARM cores, including the popular ARM7, ARM9, and ARM11, are machines that process a single instruction per cycle.
ARM's next-generation processor will be a superscalar machine capable of processing multiple instructions per cycle.
Superscalar processor: a processor that executes multiple instructions at the same time within one clock cycle.
Multiple instruction units in a superscalar processor
Superscalar processing is compatible with pipelining. To issue multiple instructions within one clock cycle, a superscalar processor must have two or more instruction pipelines that can work simultaneously. This in turn introduces the problems of scheduling multiple pipelines and of resource conflicts between execution units.
The processor must dynamically check for dependencies between instructions during execution.
If the code contains branch instructions, the case where the branch is taken and the case where it is not taken must be handled separately.
Calculating the execution time in advance becomes almost impossible.