The instruction set is an important part of a CPU architecture. C syntax is a high-level abstraction of the methods used to solve real-world problems, including arithmetic, logical operations, and branch control; the instruction set is the concrete hardware support for those abstractions. Assembly language exists only to give developers mnemonics that are easier to remember than raw encodings, but each assembly instruction corresponds to machine code that the CPU recognizes, so assembly is still a low-level language.
Executing an instruction generally involves fetch, decode, and execute. This is the classic three-stage instruction pipeline. Textbooks usually describe these three stages, and ARM7 uses the same three-stage design. Modern CPU designs, however, more commonly use a five-stage pipeline: instruction fetch, decode, execute, memory access, and write-back. Why five stages? The answer comes from how long each pipeline stage takes. Consider a factory assembly line in real life.
Assume an assembly line has only three steps and three workers, A, B, and C. The throughput of this line is set by its slowest worker. Suppose B needs 10 seconds to complete B's step, while A and C each need 5 seconds, and four products are required. The total time is 5 + 10*4 + 5 = 50 seconds. (The first 5 is the time A takes on the first product, during which B and C have to wait; the last 5 is C finishing the final product after B hands it over.) C spends much of the time waiting, while B is always busy.
In any case, pipelined work still beats non-pipelined work. If a single person did all of the work of A, B, and C, each product would take 20 seconds, and four products would take 20*4 = 80 seconds.
The ideal case is that all workers are equally fast, so nobody waits. How do we fix the problem that B is the slowest? We split B's work into two steps, B1 and B2, each taking 5 seconds and each handled by its own worker. The line now has four 5-second steps, so the first product is finished after 20 seconds and another follows every 5 seconds: 20 + 5*3 = 35 seconds in total.
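The factory arithmetic can be checked with a small simulation. This is only a sketch under stated assumptions: each station handles one product at a time, a product moves on as soon as the next station is free, and B1 and B2 are staffed by separate workers. The function names are made up for this illustration. Under this model the balanced four-station line finishes in 35 seconds: 20 seconds for the first product, then one more every 5 seconds.

```python
def pipeline_finish_time(stage_seconds, n_products):
    """Time at which the last product leaves the last station.

    Each station handles one product at a time, and a product can enter a
    station only after it has left the previous one.
    """
    # done[s] = time at which station s finished its most recent product
    done = [0] * len(stage_seconds)
    for _ in range(n_products):
        ready = 0  # time at which this product can enter the next station
        for s, seconds in enumerate(stage_seconds):
            start = max(ready, done[s])  # wait for the product and the station
            done[s] = start + seconds
            ready = done[s]
    return done[-1]


def single_worker_time(stage_seconds, n_products):
    """One person performs every step for each product in turn."""
    return sum(stage_seconds) * n_products


three_stations = [5, 10, 5]   # A, B, C
four_stations = [5, 5, 5, 5]  # A, B1, B2, C (B split in two)

print(single_worker_time(three_stations, 4))    # 80
print(pipeline_finish_time(three_stations, 4))  # 50
print(pipeline_finish_time(four_stations, 4))   # 35
```

Changing the number of products shows that the gap widens as the line runs longer, because the slowest station sets the pace for every product after the first.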
The three-stage pipeline of a CPU has the same problem: the stages take unequal time. Fetch and decode are usually fast, but execute covers the arithmetic or logic operation itself as well as register access, memory access, and write-back, so it generally takes longer. Fetch and decode can each complete within a single clock cycle, whereas execute needs two to three clock cycles. To get better pipeline efficiency, the execute stage is split into execute (the operation itself), memory access, and write-back (to registers).
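The same reasoning can be written as a formula for instruction streams. The sketch below assumes an idealized pipeline with no stalls or branches, an execute stage that costs 3 cycles in the three-stage design (the upper end of the range given above), and five 1-cycle stages after the split. The helper name total_cycles is made up for this illustration.

```python
def total_cycles(stage_cycles, n_instructions):
    """Cycles needed to finish n_instructions (n_instructions >= 1).

    The first instruction has to pass through every stage; after that, one
    instruction completes each time the slowest stage finishes.
    """
    fill = sum(stage_cycles)   # latency of the first instruction
    pace = max(stage_cycles)   # the slowest stage sets the pace
    return fill + (n_instructions - 1) * pace


three_stage = [1, 1, 3]        # fetch, decode, execute (assumed 3 cycles)
five_stage = [1, 1, 1, 1, 1]   # fetch, decode, execute, memory, write-back

n = 100
slow = total_cycles(three_stage, n)  # 5 + 99 * 3 = 302
fast = total_cycles(five_stage, n)   # 5 + 99 * 1 = 104
print(slow, fast, round(slow / fast, 2))
```

Both designs need 5 cycles for the first instruction, so the benefit of the split comes entirely from the shorter pace once the pipeline is full.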
For software developers, the most important thing is to understand the relationship between the current PC (program counter) value and the instruction currently being executed. The CPU fetches instructions from the memory address held in the PC, so the PC always points to the address of the instruction being fetched. Decode is the stage in which CPU circuitry decodes the fetched machine code and selects the circuit that will carry it out, such as the adder, the subtractor, or the logical AND circuit. Execute is that circuit doing its work.
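The three steps can be illustrated with a toy interpreter. This is not ARM code and not pipelined: it handles one instruction at a time, purely to show the PC selecting what is fetched and decode selecting which "circuit" runs. The instruction format, the register names, and the run function are all hypothetical.

```python
import operator

# Hypothetical encoding: each 4-byte "instruction" is a tuple
# (opcode, destination, source1, source2), keyed by its address.
PROGRAM = {
    0x00: ("add", "r2", "r0", "r1"),
    0x04: ("sub", "r3", "r2", "r1"),
    0x08: ("and", "r4", "r2", "r3"),
}

# Decoding picks the "circuit" that will do the work.
CIRCUITS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
}


def run(program, registers):
    pc = 0x00
    while pc in program:
        instruction = program[pc]              # fetch from the address in PC
        opcode, dst, src1, src2 = instruction  # decode the fields
        circuit = CIRCUITS[opcode]             # select the matching circuit
        registers[dst] = circuit(registers[src1], registers[src2])  # execute
        pc += 4                                # move to the next instruction
    return registers


regs = run(PROGRAM, {"r0": 6, "r1": 3})
print(regs)
```

A dictionary lookup stands in for the decoder here; in hardware the opcode bits enable the corresponding functional unit directly.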
The ARM7 pipeline is:
From the figure, we can see that at T1 the execute circuit is executing the mov instruction while the fetch circuit is fetching the sub instruction. The address of the mov instruction being executed is therefore the current PC value minus 8. If the instruction being executed is a function call (a BL instruction), the return address is the address of the following instruction, add in the figure, which is PC minus 4.
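The offsets at T1 can be written out as a short sketch. It assumes ARM state, where every instruction is 4 bytes, and that add is the instruction in the decode stage (which follows from the return address given above). The addresses and function names are hypothetical and chosen only for illustration.

```python
INSTRUCTION_SIZE = 4  # bytes per instruction in ARM state (assumption)


def stage_addresses(pc):
    """Address of the instruction in each stage of a three-stage pipeline."""
    return {
        "fetch": pc,                           # PC points at the fetch
        "decode": pc - INSTRUCTION_SIZE,       # one instruction behind
        "execute": pc - 2 * INSTRUCTION_SIZE,  # two instructions behind
    }


def bl_return_address(pc):
    """Return address when the instruction in the execute stage is a BL.

    The BL sits at pc - 8, so the instruction after it is at pc - 4.
    """
    return pc - INSTRUCTION_SIZE


# Hypothetical placement of the three instructions from the figure.
program = {0x8000: "mov", 0x8004: "add", 0x8008: "sub"}

pc = 0x8008  # at T1 the sub instruction is being fetched
for stage, address in stage_addresses(pc).items():
    print(f"{stage:8}{address:#06x}  {program[address]}")
```

With the PC at 0x8008, the execute stage maps to 0x8000 (mov) and a BL at that position would return to 0x8004 (add).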