The previous chapter described how C is translated into assembly code and how assembly code is used. But how are the assembly instructions themselves implemented? Take an instruction such as add %eax, %edx: we know what it does, but how does the processor execute it to produce the desired result? That is the subject of this chapter.
(i) Y86 instruction set architecture
To simplify the problem, we do not use the real Intel instruction set architecture directly; instead we work with a simplified abstraction of it, Y86. Y86 defines the state elements, the instruction set and its encoding, the programming conventions, and the handling of exceptional events.
(ii) Memory and clock
All storage devices are controlled by the same clock. The clock is a periodic signal that determines when new values are loaded into the devices.
There are two types of storage devices:
1) Clocked registers (hardware registers) store a single bit or word; the clock signal controls when the register loads its input value;
2) Random-access memories store multiple words and use an address to select which word to read or write. Examples include the processor's virtual memory system and the register file that holds the program registers.
(The word "register" means slightly different things in hardware and in machine-level programming. In hardware, a register is directly connected to the rest of the circuit, while in machine-level programming, registers are the addressable words in the register file, and the address is the register ID. To avoid ambiguity, the two kinds are called hardware registers and program registers, respectively.)
(Figure: clocked register operation. Y86 uses clocked registers to hold the program counter (PC), the condition codes (CC), and the program status (Stat).)
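The behavior of a clocked register can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description; the class name ClockedRegister and its methods are made up for this example.

```python
class ClockedRegister:
    """Hypothetical model of a hardware register: the output changes only on a rising clock edge."""

    def __init__(self, initial=0):
        self.output = initial   # value currently visible to the rest of the circuit
        self.input = initial    # value waiting at the register input

    def set_input(self, value):
        # Changing the input has no effect on the output until the clock rises.
        self.input = value

    def rising_edge(self):
        # On the rising clock edge the register loads its input value.
        self.output = self.input


pc = ClockedRegister(initial=0x000)
pc.set_input(0x002)
before = pc.output      # the old value is still visible
pc.rising_edge()
after = pc.output       # the new value appears only after the clock edge
```

The point of the sketch is that between clock edges the register acts as a barrier: the logic feeding it can change freely without disturbing the state the rest of the circuit sees.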
(iii) The six major stages of instruction execution
The following breaks each Y86 instruction down into the six stages (fetch, decode, execute, memory, write back, and PC update); compare it carefully against the instruction set above.
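As a rough illustration of the stage breakdown, the sketch below walks a single addl instruction through the six stages in Python. The function name run_opl_add and the dictionary-based memory and register file are assumptions made for this example, and the overflow flag is left out for brevity.

```python
# Register IDs as used by Y86: %eax = 0, %edx = 2.
EAX, EDX = 0, 2

def run_opl_add(memory, regs, pc):
    """Walk one addl instruction through the six stages (illustrative only)."""
    # Fetch: read the two instruction bytes and compute the next sequential PC.
    icode, ifun = memory[pc] >> 4, memory[pc] & 0xF
    rA, rB = memory[pc + 1] >> 4, memory[pc + 1] & 0xF
    valP = pc + 2
    assert (icode, ifun) == (6, 0), "this sketch only handles addl"
    # Decode: read both source operands from the register file.
    valA, valB = regs[rA], regs[rB]
    # Execute: the ALU adds the operands (32-bit wraparound) and sets condition codes.
    valE = (valB + valA) & 0xFFFFFFFF
    cc = {"ZF": valE == 0, "SF": bool(valE >> 31)}   # OF omitted for brevity
    # Memory: addl does not access memory.
    # Write back: the result goes to register rB.
    regs[rB] = valE
    # PC update: continue with the next sequential instruction.
    return valP, cc

memory = {0x000: 0x60, 0x001: 0x02}    # encoding of addl %eax, %edx
regs = {EAX: 3, EDX: 4}
new_pc, cc = run_opl_add(memory, regs, 0x000)
```

After the call, %edx holds the sum and the PC has advanced by the two bytes of the instruction.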
(iv) SEQ processor
Implementing the six stages above directly in hardware gives a structure called the SEQ processor.
The above shows the stage-by-stage implementation of each instruction; we now summarize across the instructions to give the control logic for each stage:
(For the instruction cases of the corresponding signals need_regids and need_valC; srcB and dstM; aluB; mem_data and mem_write, see the textbook.)
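To give a feel for what such per-stage signals look like, here is a minimal Python sketch of the two fetch-stage signals need_regids and need_valC. The instruction code names and values follow the 32-bit Y86 encoding as I understand it; the helper next_sequential_pc is a name invented for this example, and the textbook's HCL remains the authoritative version.

```python
# Y86 instruction codes (icode) for the 32-bit instruction set.
IHALT, INOP, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 0x0, 0x1, 0x2, 0x3, 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB

def need_regids(icode):
    # True when the instruction has a register-specifier byte.
    return icode in {IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL}

def need_valC(icode):
    # True when the instruction carries a 4-byte constant word.
    return icode in {IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL}

def next_sequential_pc(pc, icode):
    # valP = PC + 1 (opcode byte) + 1 if there is a register byte + 4 if there is a constant word.
    return pc + 1 + int(need_regids(icode)) + 4 * int(need_valC(icode))
```

For example, an addl instruction is 2 bytes long, irmovl is 6 bytes, a jump is 5 bytes, and ret is a single byte, which is exactly what next_sequential_pc computes from the two signals.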
(v) Pipeline
The problem with SEQ is that all six stages must complete within a single clock cycle, so the clock has to run very slowly. We therefore turn to pipelining.
Two important concepts of pipelining:
Throughput: the number of customers served per unit of time;
Latency: the time required to serve a single customer.
A pipelined design requires pipeline registers to be placed between the stages.
The following figure illustrates how the pipeline operates:
The advantages of pipelining are clear, but is pipelining entirely free of problems?
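The two measures can be made concrete with a small calculation. The sketch below assumes an idealized pipeline whose logic divides evenly across the stages; the 300 ps logic delay and 20 ps register delay are illustrative numbers, and pipeline_timing is a name chosen for this example.

```python
def pipeline_timing(logic_delay_ps, register_delay_ps, stages):
    """Return (throughput in GIPS, latency in ps) for an evenly divided pipeline."""
    # Each stage takes its share of the logic delay plus one register load.
    cycle_ps = logic_delay_ps / stages + register_delay_ps
    throughput_gips = 1000.0 / cycle_ps      # one instruction per cycle; 1000 ps = 1 ns
    latency_ps = cycle_ps * stages           # an instruction passes through every stage
    return throughput_gips, latency_ps

unpipelined = pipeline_timing(300, 20, stages=1)
three_stage = pipeline_timing(300, 20, stages=3)
```

With these numbers the three-stage version raises throughput from about 3.12 to about 8.33 GIPS, while latency grows from 320 ps to 360 ps because of the extra pipeline registers.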
1. Dividing the system's computation into a series of stages with equal delays is a daunting challenge;
2. If the pipeline is too deep, the returns diminish. (Modern processors use very deep pipelines of 15 or more stages; Y86 uses a 5-stage pipeline.)
3. Pipelines with feedback. 1) Data dependency: adjacent instructions may depend on each other. In a pipeline, the next instruction may need data that the previous instruction has not yet produced. 2) Control dependency: whether the next instruction executes may depend on a condition that the previous instruction has not yet computed.
When we implement the pipeline structure, we have to solve the third problem.
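To see how easily data dependencies arise, the sketch below scans a short program for read-after-write pairs that are too close together. It assumes a 5-stage pipeline with no special handling, where a result becomes readable only after write back, so a consumer within three instructions of its producer is a hazard. The program representation and the name find_data_hazards are invented for this example, and the check ignores the case where a register is overwritten in between.

```python
def find_data_hazards(program, window=3):
    """Return (producer_index, consumer_index, register) for each read-after-write pair that is too close."""
    hazards = []
    for i, (_, dst, _) in enumerate(program):
        if dst is None:
            continue
        # Look at the next `window` instructions following the producer.
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, _, srcs = program[j]
            if dst in srcs:
                hazards.append((i, j, dst))
    return hazards

# Each entry: (assembly text, destination register or None, source registers).
program = [
    ("irmovl $10, %edx", "%edx", []),
    ("irmovl $3, %eax", "%eax", []),
    ("addl %edx, %eax", "%eax", ["%edx", "%eax"]),
    ("halt", None, []),
]
hazards = find_data_hazards(program)
```

Here the addl depends on both preceding irmovl instructions, so both pairs are reported.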
(vi) Y86 pipelined implementation: PIPE-
First, starting from SEQ, we move the computation of the PC to the fetch stage (so that it is computed at the beginning of the clock cycle rather than at the end), and then add pipeline registers between the stages:
Building on the PIPE- structure, and following the earlier discussion, we need to solve several major problems:
1. How do we resolve data dependencies?
2. How do we resolve control dependencies? The emphasis is on the ret instruction and conditional jump instructions.
3. How do we handle instructions that cause exceptions?
Resolving data dependencies:
This process needs to be understood carefully, especially the point at which state updates take place under clock control.
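One way to resolve a data dependency is forwarding, which section (vii) refers to. As a rough sketch of the selection logic: the decode stage takes the most recent pending value for a source register from the later pipeline stages, and falls back to the register file only if no stage is about to write that register. The function name forward_valA and the list-based interface are assumptions for this example; the textbook's HCL also handles a few cases that are omitted here.

```python
RNONE = 0xF  # register ID meaning "no register"

def forward_valA(src, reg_value, sources):
    """Pick the newest pending value for register `src`, else the register-file value.

    `sources` is an ordered list of (dst_register, value) pairs, youngest
    instruction first (execute stage, then memory stage, then write-back stage).
    """
    if src != RNONE:
        for dst, value in sources:
            if dst == src:
                return value     # the first match is the most recent write to src
    return reg_value

EAX, EDX = 0, 2
pending = [(EAX, 13), (EDX, 10), (EDX, 99)]   # execute, memory, write-back
picked = forward_valA(EDX, reg_value=4, sources=pending)
```

The ordering matters: when two in-flight instructions write the same register, the value from the younger one (here 10 from the memory stage) must win over the older one.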
Resolving control dependencies:
This problem divides into three parts: 1. How do we predict the next PC value? 2. How do we handle the ret instruction? 3. How do we handle mispredicted branches?
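The three parts can be sketched together in Python. This assumes the always-taken prediction strategy (jumps and calls are predicted to go to their target) and signal names in the style of the textbook's HCL (M_ for the memory stage, W_ for the write-back stage); the function names predict_pc and select_pc are made up for this example.

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9

def predict_pc(icode, valC, valP):
    # Always-taken strategy: jumps and calls are predicted to go to their target valC;
    # everything else is predicted to continue at the next sequential address valP.
    return valC if icode in {IJXX, ICALL} else valP

def select_pc(pred_pc, M_icode, M_Cnd, M_valA, W_icode, W_valM):
    if M_icode == IJXX and not M_Cnd:
        return M_valA   # mispredicted branch: resume at the fall-through address
    if W_icode == IRET:
        return W_valM   # ret: the return address has just been read from memory
    return pred_pc      # otherwise trust the prediction
```

The prediction lets the pipeline fetch a new instruction every cycle, while select_pc corrects the fetch address once the real outcome of a jump or ret is known.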
Handling exceptions:
The control issues need one more remark: when several control conditions arise at the same time, how should the combination be handled? This question is left to the end as a supplement.
(vii) Pipe processor implementation
Following the discussion above, once we add the solutions to these problems to PIPE-, we get the PIPE processor we want.
1. Implement forwarding. This requires adding only a small amount of circuitry.
For the detailed HCL description of PIPE, see the textbook.
2. Implement stalling and cancelling of instructions
With this new type of pipeline register, it is easy to stall an instruction or cancel it, which lets us implement the solutions to the problems above.
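A pipeline register of this kind can be modeled as a clocked register with two extra control inputs, stall and bubble. The Python class below is a hypothetical sketch (the name PipelineRegister and the string-valued state are only for illustration): stalling keeps the current contents, and a bubble replaces them with a nop.

```python
class PipelineRegister:
    """Hypothetical pipeline register with stall and bubble control inputs."""

    def __init__(self, bubble_value):
        self.bubble_value = bubble_value   # state that represents a nop
        self.output = bubble_value
        self.input = bubble_value

    def rising_edge(self, stall=False, bubble=False):
        if stall and bubble:
            raise ValueError("stall and bubble must not both be set")
        if stall:
            return                         # keep the current state: the instruction is held
        # Normal operation loads the input; a bubble loads a nop instead.
        self.output = self.bubble_value if bubble else self.input


D = PipelineRegister("nop")
D.input = "addl"
D.rising_edge()                  # normal: load the input
loaded = D.output
D.input = "ret"
D.rising_edge(stall=True)        # stall: addl stays in place
stalled = D.output
D.rising_edge(bubble=True)       # bubble: the instruction is cancelled
cancelled = D.output
```

Stalling holds an instruction in its stage for another cycle, while inserting a bubble cancels whatever would have entered the stage.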
(viii) Combinations of control conditions
CSAPP (3): How the processor executes instructions