This article is reproduced from the official Intel website: https://software.intel.com/zh-cn/articles/book-Processor-Architecture_CPU_work_process
The working process of the CPU
The basic job of the CPU is to execute a stored sequence of instructions, that is, a program. Executing a program is a process of repeatedly fetching, decoding, and executing instructions.
The CPU fetches an instruction from the main memory that holds the program, decodes it, executes it, saves the result, and then fetches, decodes, and executes the next instruction. It repeats this cycle over and over, which is what allows the computer to work automatically. The cycle continues until a halt instruction is encountered, as shown in Figure 3-3.
Figure 3-3 Program execution process
3.2.1 Instruction Execution Process
In almost all von Neumann CPUs, the execution of an instruction can be divided into 5 stages: instruction fetch, instruction decode, execute, memory access, and result write-back, as shown in Figure 3-4.
Figure 3-4 Execution of instructions
1. Instruction fetch stage
The instruction fetch (Instruction Fetch, IF) stage is the process of fetching an instruction from main memory into the instruction register.
The value in the program counter (PC) indicates the position of the current instruction in main memory. When an instruction is fetched, the value in the PC is automatically incremented according to the instruction's word length: for a single-word instruction, (PC) + 1 → PC; for a double-word instruction, (PC) + 2 → PC; and so on.
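The PC update rule can be sketched in a few lines of Python. This is only an illustration: the function name `next_pc` and the starting address 20 are assumptions made for the example, not part of the original text.

```python
def next_pc(pc: int, instruction_words: int) -> int:
    """Return the PC value after fetching an instruction of the given length.

    instruction_words is the instruction length in memory words
    (1 for a single-word instruction, 2 for a double-word instruction).
    """
    if instruction_words < 1:
        raise ValueError("an instruction occupies at least one word")
    return pc + instruction_words


# Hypothetical starting address, chosen only for illustration.
print(next_pc(20, 1))  # single-word instruction: (PC) + 1 -> PC
print(next_pc(20, 2))  # double-word instruction: (PC) + 2 -> PC
```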
2. Instruction decode stage
Once the instruction has been fetched, the computer immediately enters the instruction decode (Instruction Decode, ID) stage.
In this stage, the instruction decoder splits and interprets the fetched instruction according to the predefined instruction format, identifying the instruction class and the method used to obtain the operands.
In a computer with combinational logic control, the instruction decoder generates different control levels for different opcodes, which form different micro-operation sequences. In a microprogrammed computer, the instruction decoder uses the opcode to find the entry point of the microprogram that implements the instruction and begins execution from that entry point.
In traditional designs, the part of the CPU responsible for instruction decoding is fixed hardware that cannot be changed. In many newer CPUs that use microprogram control, however, the microprogram is sometimes rewritable, so the decoding behavior of a finished CPU can be changed by modifying its microprogram.
3. Execute stage
After the instruction fetch and instruction decode stages comes the execute (Execute, EX) stage.
The task of this stage is to carry out the operations specified by the instruction and thereby implement its function. To do this, different parts of the CPU are connected together to perform the required operation.
For example, if an addition is required, the arithmetic logic unit (ALU) is connected to a set of inputs and a set of outputs: the inputs supply the values to be added, and the outputs carry the result of the operation.
4. Memory access stage
Depending on the instruction, it may be necessary to access main memory to read an operand, in which case the CPU enters the memory access (Memory, MEM) stage.
The task of this stage is to derive the address of the operand in main memory from the address field of the instruction and to read the operand from main memory for use in the operation.
5. Result write-back stage
As the final stage, the result write-back (Write Back, WB) stage writes the result produced by the execute stage back to some form of storage. The result is often written to an internal CPU register so that subsequent instructions can access it quickly. In some cases, the result is instead written to main memory, which is slower but cheaper and larger. Many instructions also change the flag bits in the program status word register; these flags record properties of the result and can be used to influence the flow of the program.
After the instruction has been executed and its result written back, the computer obtains the address of the next instruction from the program counter (PC) and starts a new cycle. If no unexpected event (such as a result overflow) occurs, the next instruction cycle fetches the next instruction in sequence.
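The five stages can be made concrete with a toy simulator. The sketch below is a minimal illustration under stated assumptions, not a model of any real CPU: the `Machine` class, the opcodes `LOAD`, `ADD`, `STORE`, and `HALT`, the tuple instruction format, and the sample program are all hypothetical. Because this toy ADD takes its operand from memory, the sketch reads the operand before the ALU operation.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    memory: dict             # address -> (opcode, address field) or a data word
    pc: int = 0              # program counter
    ir: tuple = ()           # instruction register
    acc: int = 0             # accumulator, an internal CPU register
    zero_flag: bool = False  # one flag of the program status word
    halted: bool = False


def step(m: Machine) -> None:
    """Run one instruction through the five stages."""
    # IF: fetch the instruction addressed by the PC, then increment the PC.
    m.ir = m.memory[m.pc]
    m.pc += 1

    # ID: split the instruction into opcode and address field.
    opcode, address = m.ir
    if opcode == "HALT":
        m.halted = True
        return

    # MEM: read the operand from main memory if the instruction needs one.
    operand = m.memory[address] if opcode in ("LOAD", "ADD") else None

    # EX: perform the operation the opcode specifies.
    if opcode == "LOAD":
        result = operand
    elif opcode == "ADD":
        result = m.acc + operand
    elif opcode == "STORE":
        result = m.acc
    else:
        raise ValueError(f"unknown opcode: {opcode}")

    # WB: write the result to a register or to main memory.
    if opcode == "STORE":
        m.memory[address] = result
    else:
        m.acc = result
        m.zero_flag = result == 0


def run(m: Machine) -> Machine:
    """Repeat the instruction cycle until a HALT instruction is fetched."""
    while not m.halted:
        step(m)
    return m


# Hypothetical program: memory[12] = memory[10] + memory[11].
program = {
    0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
    10: 5, 11: 7, 12: 0,
}
machine = run(Machine(memory=dict(program)))
print(machine.acc, machine.memory[12], machine.pc)  # 12 12 4
```

The loop in `run` mirrors the cycle described above: each pass takes the next instruction address from the PC, and the sequence only changes when an instruction alters the PC or halts the machine.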
Many newer CPUs can fetch, decode, and execute multiple instructions at the same time, which reflects the characteristics of parallel processing.
3.2.2 Instruction Cycle
1. Basic concepts of the instruction cycle
(1) Instruction cycle
The time required for the CPU to fetch an instruction and execute it is called the instruction cycle.
The length of the instruction cycle is related to the complexity of the instruction.
(2) CPU cycle
An instruction cycle is usually expressed as a number of CPU cycles.
Because operations inside the CPU are fast while an access to main memory takes comparatively long, the CPU cycle is usually defined as the shortest time needed to read one instruction from main memory.
A CPU cycle is also known as a machine cycle.
(3) Clock cycle
A CPU cycle consists of several clock cycles.
The clock cycle is the most basic unit of time for processing operations and is determined by the machine's clock frequency.
The duration of a CPU cycle is the sum of the clock cycles it contains.
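The relationship between the three time units can be checked with a little arithmetic. The sketch below assumes a fixed-length CPU cycle; the 100 MHz clock, the 4 clock cycles per CPU cycle, and the function names are hypothetical values chosen only for illustration.

```python
def clock_period_ns(clock_hz: float) -> float:
    """Clock cycle in nanoseconds: the reciprocal of the clock frequency."""
    return 1e9 / clock_hz


def cpu_cycle_ns(clock_hz: float, clocks_per_cpu_cycle: int) -> float:
    """A CPU (machine) cycle is made up of several clock cycles."""
    return clocks_per_cpu_cycle * clock_period_ns(clock_hz)


def instruction_cycle_ns(clock_hz: float, clocks_per_cpu_cycle: int,
                         cpu_cycles: int) -> float:
    """An instruction cycle is made up of several CPU cycles."""
    return cpu_cycles * cpu_cycle_ns(clock_hz, clocks_per_cpu_cycle)


# Hypothetical machine: 100 MHz clock, 4 clock cycles per CPU cycle.
print(clock_period_ns(100e6))             # 10.0 ns per clock cycle
print(cpu_cycle_ns(100e6, 4))             # 40.0 ns per CPU cycle
print(instruction_cycle_ns(100e6, 4, 3))  # 120.0 ns for a 3-CPU-cycle instruction
```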
Figure 3-5 shows an instruction cycle made up of fixed-length CPU cycles.
Figure 3-5 Instruction Cycle
(4) The minimum time required to fetch and execute any instruction is two CPU cycles.
The instruction cycle of any instruction requires at least two CPU cycles, and the instruction cycle of a complex instruction requires more. This is because the fetch phase of an instruction takes one CPU cycle, while the execute phase takes at least one CPU cycle. Because instructions of different complexity need different numbers of CPU cycles in the execute phase, their instruction cycles also differ.
Example 3-1: A program consists of 5 typical instructions (shown in Table 3-1). Analyze the instruction cycle of each instruction.
Table 3-1 A program consisting of 5 typical instructions
Main memory address | Instruction or data content | Operation description
... | ... |
020 | CLA | 0 → AC: clear the accumulator AC
021 | ADD 30 | (AC) + (30) → AC: add the data at main memory address 30 to the value of the accumulator AC and store the result in AC
022 | STA 40 | (AC) → 40: store the value of the accumulator AC at main memory address 40
023 | NOP | No operation
024 | JMP 21 | Unconditional jump: execution continues at main memory address 21
... | ... |
030 | | Operand
... | ... |
040 | | Stores the result of the operation
... | ... |
Solution
① CLA instruction
The CLA instruction clears the accumulator and does not access main memory for an operand. It requires 2 CPU cycles: 1 for the instruction fetch phase and 1 for the execute phase.
Figure 3-6 CLA Instruction Cycle
In the 1st CPU cycle, the instruction fetch phase, the CPU fetches the instruction from main memory, increments the program counter PC by 1, and decodes the opcode to determine which operation to perform.
In the 2nd CPU cycle, the execute phase, the CPU completes the operation required by the instruction.
② ADD 30 instruction
The ADD 30 instruction reads an operand from main memory and performs an addition. Its instruction cycle consists of 3 CPU cycles: 1 for the instruction fetch phase and 2 for the execute phase.
Figure 3-7 ADD 30 Instruction Cycle
In the 1st CPU cycle, the instruction fetch phase, the CPU fetches the instruction from main memory and decodes it to determine which operation to perform.
The execute phase consists of 2 CPU cycles. In the 2nd CPU cycle, the CPU sends the address field (operand address) of the instruction, 30, to the address register and completes address decoding. In the 3rd CPU cycle, the CPU reads the operand from main memory and performs the addition.
③ STA 40 instruction
The STA 40 instruction is a store instruction that accesses main memory. Its instruction cycle consists of 3 CPU cycles: 1 for the instruction fetch phase and 2 for the execute phase.
Figure 3-8 STA 40 Instruction Cycle
In the 1st CPU cycle, the instruction fetch phase, the CPU fetches the instruction from main memory and decodes it to determine which operation to perform.
The execute phase consists of 2 CPU cycles. In the 2nd CPU cycle, the CPU sends the address field (operand address) of the instruction, 40, to the address register and completes address decoding. In the 3rd CPU cycle, the CPU writes the contents of the accumulator to main memory location 40.
④ NOP instruction
The NOP instruction is a no-operation instruction. It has no function and is equivalent to the CPU idling, but it still requires 2 CPU cycles: 1 for the instruction fetch phase and 1 for the execute phase. (Its instruction cycle diagram is the same as Figure 3-6 for the CLA instruction.)
In the 1st CPU cycle, the instruction fetch phase, the CPU fetches the instruction from main memory and decodes it to determine which operation to perform.
In the 2nd CPU cycle, the execute phase, the operation controller issues no control signals and the CPU performs no operation.
⑤ JMP 21 instruction
The JMP 21 instruction is a directly addressed program control (jump) instruction. Its instruction cycle consists of 2 CPU cycles: 1 for the instruction fetch phase and 1 for the execute phase.
Figure 3-9 JMP 21 Instruction Cycle
In the 1st CPU cycle, the instruction fetch phase, the CPU fetches the instruction from main memory and decodes it to determine which operation to perform.
In the 2nd CPU cycle, the execute phase, the CPU sends the address field (jump target) of the instruction, 21, to the program counter PC, which changes the order of execution and implements the unconditional jump.
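The cycle counts derived in this solution can be tallied with a small simulation of the Table 3-1 program. This is a sketch under assumptions: addresses are treated as plain integer labels (the text does not state their number base), the operand value 6 at address 30 is made up, and the function `run` and the tuple instruction encoding are hypothetical. Since the program jumps back to address 21 forever, the sketch stops after a fixed number of instructions.

```python
# CPU cycles per instruction, as derived in the solution above:
# 1 fetch cycle plus 1 or 2 execute cycles.
CPU_CYCLES = {"CLA": 2, "ADD": 3, "STA": 3, "NOP": 2, "JMP": 2}


def run(memory: dict, pc: int, max_instructions: int):
    """Execute up to max_instructions and return (AC, PC, total CPU cycles)."""
    ac = 0
    cycles = 0
    for _ in range(max_instructions):
        opcode, address = memory[pc]  # fetch the instruction
        pc += 1                       # PC plus 1 during the fetch cycle
        if opcode == "CLA":
            ac = 0                    # 0 -> AC
        elif opcode == "ADD":
            ac += memory[address]     # (AC) + (address) -> AC
        elif opcode == "STA":
            memory[address] = ac      # (AC) -> address
        elif opcode == "NOP":
            pass                      # no operation
        elif opcode == "JMP":
            pc = address              # jump target -> PC
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        cycles += CPU_CYCLES[opcode]
    return ac, pc, cycles


memory = {
    20: ("CLA", None), 21: ("ADD", 30), 22: ("STA", 40),
    23: ("NOP", None), 24: ("JMP", 21),
    30: 6,   # operand (hypothetical value)
    40: 0,   # location that receives the result
}
ac, pc, cycles = run(memory, pc=20, max_instructions=5)
print(ac, pc, cycles, memory[40])  # 6 21 12 6
```

One pass through the five instructions costs 2 + 3 + 3 + 2 + 2 = 12 CPU cycles, and the PC ends at 21 because of the final jump.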
2. Representing the instruction cycle with an instruction flowchart
In computer design, the instruction cycle of an instruction can be represented by an instruction flowchart, drawn in the same way as a program flowchart.
In an instruction flowchart:
Rectangle: represents one operation step; its contents describe a data path operation or some kind of control operation.
Diamond: usually represents a decision or test; its action is attached to the rectangle that precedes it.
Common operation symbol "~": indicates that an instruction has finished executing and control passes to the common operation. The common operation is a set of actions the CPU performs after an instruction completes, mainly handling requests from peripherals. If no peripheral has a request pending, the CPU goes on to fetch the next instruction from main memory.
Example 3-2: For the program of 5 typical instructions in Example 3-1, use an instruction flowchart to represent the instruction cycles.
Solution
Figure 3-10 Representing the instruction cycle with an instruction flowchart
As can be seen from Figure 3-10, the instruction fetch phase of all instructions is exactly the same and takes one CPU cycle.
The execute phase, however, takes a different number of CPU cycles because each instruction has a different function: one CPU cycle for the CLA, NOP, and JMP instructions, and two CPU cycles for the ADD and STA instructions.
In general, an instruction flowchart has one common segment and many parallel branches. The common segment is the sequence of instruction fetch operations: fetching is a step shared by every instruction and is performed in the same way for all of them. The operations of the execute phase differ from instruction to instruction, so after the fetch phase the flow splits into branches according to the instruction, usually with one branch per instruction.
Example 3-3: Figure 3-11 shows the data path of a machine with a dual-bus structure. IR is the instruction register, PC is the program counter (with auto-increment), M is main memory (controlled by the R/W signal), AR is the main memory address register, DR is the data register, the ALU performs the operation selected by the + and - control signals, and control signal G controls a gate circuit. Control signals are marked on the lines; for example, Yi is the input control signal of register Y, and R1o is the output control signal of register R1. Unmarked lines are direct connections that are not controlled.
The "ADD r2,r0" instruction performs (R0) + (R2) → R0. Draw its instruction cycle flowchart (assuming the address of the instruction is already in the PC) and list the corresponding micro-operation control signal sequence.
Figure 3-11 Data path for dual-bus structured machines
Solution
The "ADD r2,r0" instruction is an addition instruction whose two operands are held in registers R2 and R0. Its instruction cycle flowchart consists of the instruction fetch phase and the execute phase. Based on the given data path, the detailed instruction cycle flowchart of the "ADD r2,r0" instruction is shown in Figure 3-12; the right-hand part of the figure lists the micro-operation control signals used in each machine cycle.
Figure 3-12 Detailed instruction cycle flowchart for the "ADD r2,r0" instruction
3.2.3 Timing Generator
1. Timing Signal
While a computer is running at high speed, every part of it must strictly follow the timing rules; no error is allowed.
The coordinated action of the parts of a computer requires time markers, and these time markers are provided by timing signals.
The timing signals required by every part of the computer are generated centrally by the timing generator in the CPU.
Example 3-4: Instructions and data are both stored in main memory as binary codes. How does the CPU tell whether a word is an instruction or data?
Solution
In time: fetching an instruction happens in the first CPU cycle of the instruction cycle, that is, in the instruction fetch phase, whereas fetching data happens in the later CPU cycles of the instruction cycle, that is, in the execute phase.
In space: if the word fetched is an instruction, it is sent to the instruction register; if the word fetched is data, it is sent to the arithmetic unit.
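The rule in this solution amounts to routing a word by the CPU cycle in which it is read rather than by its bit pattern. A minimal sketch follows, in which the function name `route_fetched_word` and the sample bit pattern are hypothetical:

```python
def route_fetched_word(cpu_cycle: int) -> str:
    """Return where a word read from main memory is sent.

    cpu_cycle is the 1-based index of the CPU cycle within the instruction cycle.
    """
    if cpu_cycle < 1:
        raise ValueError("CPU cycles are numbered from 1")
    # First CPU cycle = instruction fetch phase: the word is an instruction.
    if cpu_cycle == 1:
        return "instruction register"
    # Later CPU cycles = execute phase: the word is data.
    return "arithmetic unit"


word = 0b0001_0011  # the same bit pattern could be an instruction or data
for cycle in (1, 3):
    print(f"cycle {cycle}: {word:#010b} -> {route_fetched_word(cycle)}")
```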
2. Timing generator
The timing signal generator in the CPU uses logic circuits to produce timing signals and implement timing control, so that the computer can work accurately, quickly, and in an orderly way.
The timing signal generator is the component that produces the timing signals controlling the instruction cycle. When the CPU starts fetching and executing instructions, the operation controller uses the sequence of timing pulses produced by the timing signal generator, and the different intervals between pulses, to supply the micro-operation timing control signals needed by each part of the computer, directing each part to act at its specified time in an organized, rhythmic way.
In terms of operation controller design, the timing circuit of a combinational logic controller is relatively complex, while the timing circuit of a microprogrammed controller is relatively simple.
3.2.4 Control Mode
The process by which the controller directs the execution of an instruction is the process of carrying out a definite sequence of operations in order.
For the machine to execute instructions correctly, the controller must produce the operation control signals with the correct timing.
The method used to time the control signals of different operation sequences is called the control mode of the controller.
Control modes are usually divided into three kinds: synchronous control, asynchronous control, and joint control. In essence, a control mode describes how the timing signals are timed.
1. Synchronous control mode
In synchronous control, the execution of every step in an operation sequence is controlled by predetermined timing signals derived from a common reference. Its defining feature is that the system has one unified clock and all control signals derive from that clock.
In synchronous control mode, the number of CPU cycles and clock cycles needed to execute a given instruction is fixed under all circumstances.
Synchronous control mode is sometimes called fixed-timing control mode or no-response control mode.
Depending on the situation, synchronous control can be implemented using one of the following schemes:
- Use a completely uniform machine cycle to execute all instructions. For simple instructions and simple operations, this obviously wastes time.
- Use machine cycles of variable length. Most operations are scheduled to complete within a shorter machine cycle, and the machine cycle is extended for the few operations that take longer.
- Combine central control with local control. Most instructions are scheduled to complete within fixed machine cycles (central control), while a few complex instructions (multiplication, division, floating-point operations) use separate timing (local control).
Synchronous control is simple to design, and its operation control is easy to implement.
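The time wasted by a completely uniform machine cycle, compared with a variable-length one, can be estimated with a rough calculation. The sketch below is illustrative only: the per-operation clock counts and the function names are hypothetical, and it assumes the uniform machine cycle must be long enough for the slowest operation.

```python
def uniform_cycle_clocks(op_clocks: list) -> int:
    """Every operation gets one machine cycle sized for the slowest operation."""
    return max(op_clocks) * len(op_clocks)


def variable_cycle_clocks(op_clocks: list, short_cycle: int) -> int:
    """Operations get a short machine cycle, extended only when they need more."""
    return sum(max(short_cycle, clocks) for clocks in op_clocks)


# Hypothetical clock cycles needed by four operations; the last one is slow.
needed = [2, 2, 2, 5]
print(uniform_cycle_clocks(needed))      # 20 clock cycles in total
print(variable_cycle_clocks(needed, 2))  # 11 clock cycles in total
```

With these made-up numbers, the uniform scheme spends 9 extra clock cycles waiting on the three simple operations.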
2. Asynchronous control mode
Asynchronous control is a control mode in which each instruction and each operation takes exactly as much time as it actually needs.
In asynchronous control mode, the instruction cycle of an instruction may consist of a varying number of machine cycles, or its end may be determined by the response signal that an executing unit returns to the controller once it has completed the operation the CPU requested. In other words, each operation control signal lasts as long as the operation requires, and each instruction occupies as much time as it needs.
Clearly, an operation control sequence formed in this way has neither a fixed number of CPU cycles nor clock cycles strictly synchronized with it, which is why it is called asynchronous.
Asynchronous control mode is sometimes called variable-timing control mode or response control mode.
Under asynchronous control, instructions run efficiently, but the hardware implementation of the control circuitry is quite complex.
Asynchronous control is widely used in computers. For example, CPU reads and writes to main memory and data exchange between I/O devices and main memory generally use asynchronous control to keep execution fast.
3. Joint control mode
Modern computer systems generally combine synchronous and asynchronous control; this is the joint control mode.
The design idea of joint control is that synchronous control is used inside each functional unit, asynchronous control is used between functional units, and asynchronous control is used as much as the hardware implementation allows.
Joint control usually takes one of the following two forms:
- Most operation sequences are scheduled within fixed machine cycles, while operations whose duration is hard to determine end when the executing unit returns its response signal;
- The number of clock cycles in a machine cycle is fixed, but the number of machine cycles in an instruction cycle is not.
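The first form of joint control can be sketched as a simple timing model: operations with known timing take one fixed machine cycle, while the others last until the executing unit responds. Everything in this example is an assumption made for illustration, including the function name, the operation list, and the clock counts.

```python
from typing import Optional


def joint_control_clocks(operations: list, clocks_per_machine_cycle: int) -> int:
    """Total clock cycles for a sequence of (name, response_after) operations.

    response_after is None for an operation with fixed timing, or the number
    of clock cycles after which the executing unit returns its response signal.
    """
    total = 0
    for _name, response_after in operations:
        if response_after is None:
            # Synchronous part: one fixed machine cycle.
            total += clocks_per_machine_cycle
        else:
            # Asynchronous part: the response signal ends the operation.
            total += response_after
    return total


# Hypothetical sequence: only the memory read waits for a response signal.
sequence = [("decode", None), ("memory read", 7), ("add", None)]
print(joint_control_clocks(sequence, clocks_per_machine_cycle=4))  # 15
```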