Write Your Own Processor, Part 3: A Blueprint for the Teaching Version of the OpenMIPS Processor

Source: Internet
Author: User

I am uploading my new book "Write Your Own Processor" (not yet published) in installments. This is the tenth installment. I try to post one every Thursday.


This chapter gives a blueprint for the teaching version of the OpenMIPS processor. Section 3.1 introduces the design objectives of the system and describes in detail the five-stage pipeline that OpenMIPS implements. Section 3.2 describes the interfaces of the processor and their functions. Section 3.3 briefly explains the role of each source file. Finally, Section 3.4 describes the method used to implement OpenMIPS. Readers will find that the implementation method used in this book is quite different from that of existing books, and that it makes the design easier to understand and to put into practice.

3.1 System design objectives

3.1.1 Design objectives

The second part of this book implements the teaching version of the OpenMIPS processor. It is a 32-bit scalar processor with a Harvard architecture, compatible with the MIPS32 Release 1 instruction set architecture ("Release 1" is omitted from here on). The advantage of this compatibility is that the existing MIPS toolchain, such as the GCC compiler, can be used. The design objectives of OpenMIPS are as follows.

  • Five-stage integer pipeline: instruction fetch, decode, execute, memory access, and write-back
  • Harvard architecture with separate instruction and data interfaces
  • 32 32-bit integer registers
  • Big-endian mode
  • Vectored exception handling, with support for precise exceptions
  • Support for six external interrupts
  • 32-bit data bus and 32-bit address bus
  • Single-cycle multiplication
  • Support for delayed branches
  • Compatible with the MIPS32 instruction set architecture, supporting all integer instructions in the MIPS32 instruction set
  • Most instructions complete within one clock cycle

Most of these design objectives are self-explanatory. The exceptions are delayed branches and precise exceptions: the former are introduced in Chapter 8, "Implementation of branch instructions", and the latter in Chapter 11, "Implementation of exception-related instructions".

3.1.2 Five-stage integer pipeline

This book deals with pipelines in computers. Let us start with the definition of the instruction pipeline given on Wikipedia: pipelining means splitting the processing of a computer instruction into multiple steps and using multiple hardware units to carry out those steps in parallel, in order to speed up instruction execution. There are two keywords here: (1) split and (2) parallel. Instruction processing can be split into at least three steps: fetch the instruction from memory, interpret the instruction, and execute it according to the result of the interpretation. In short: fetch, decode, and execute. If there is only one hardware unit, it has to do all three. Assuming that each of the three steps takes time T, processing one instruction takes 3T and processing n instructions takes 3nT. If instead we design three hardware units, each dedicated to one of the three steps, then while one instruction is being executed the next can be decoded, and while that one is being decoded a third can be fetched. This is the classic three-stage pipeline, shown in Figure 3-1.
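The timing argument above can be checked with a few lines of Python. This is only an illustrative sketch under the same idealized assumption as the text (every step takes exactly T); the function names are made up for this example. For three instructions it gives 9T without pipelining and 5T with a three-stage pipeline.

```python
def sequential_time(n_instructions, n_stages=3):
    """Time in units of T when a single unit performs every step of every instruction."""
    return n_stages * n_instructions


def pipelined_time(n_instructions, n_stages=3):
    """Time in units of T for an ideal pipeline with one hardware unit per step.

    The first instruction needs n_stages steps to pass through the pipeline;
    every later instruction finishes one step after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


for n in (1, 3, 100):
    print(n, sequential_time(n), pipelined_time(n))
```

For large n the pipelined time approaches nT, that is, roughly one instruction completed per step.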


As Figure 3-1 shows, executing three instructions takes 5T on the three-stage pipeline, compared with 9T without pipelining. The ARM7 uses a three-stage pipeline. However, nothing in the world is that simple and perfect. So far we have assumed that fetch, decode, and execute each take time T. In practice this is not the case. Instruction fetch, for example, may take longer. Suppose it takes 2T, as shown in Figure 3-2.


In the periods 3T-4T and 5T-6T the pipeline is waiting for the fetch to finish. During this time the decode and execute stages are stalled, which naturally slows processing down. In the end, executing three instructions takes 8T. Caches were introduced to solve this problem: the processor needs only one clock cycle to read an instruction from the cache.
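The effect of a slow fetch stage can be reproduced with a small timing model. The sketch below is a simplification written for this example, not part of OpenMIPS: each stage handles one instruction at a time, and back-pressure from a slow later stage onto earlier stages is not modeled. With stage latencies of 2T, 1T, and 1T it gives the 8T stated above.

```python
def pipeline_finish_time(n_instructions, stage_latencies):
    """Return the time (in units of T) at which the last instruction finishes.

    stage_latencies lists the time each stage needs per instruction,
    e.g. [2, 1, 1] for a 2T fetch followed by 1T decode and 1T execute.
    """
    # free_at[s] is the time at which stage s finishes its current instruction.
    free_at = [0] * len(stage_latencies)
    for _ in range(n_instructions):
        ready = 0  # time at which this instruction has left the previous stage
        for s, latency in enumerate(stage_latencies):
            # A stage starts once the instruction has arrived and the stage is idle.
            start = max(ready, free_at[s])
            free_at[s] = start + latency
            ready = free_at[s]
    return free_at[-1]


print(pipeline_finish_time(3, [1, 1, 1]))  # ideal three-stage pipeline
print(pipeline_finish_time(3, [2, 1, 1]))  # fetch takes 2T
```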

Another case is an execute stage that takes too long. For a load/store instruction, for example, the execute stage may take longer than T because it has to access memory, and the pipeline stalls again. To avoid stalls in this situation, the five-stage pipeline was introduced: fetch, decode, execute, memory access, and write-back. See Figure 3-3.


The memory access stage loads data from memory into a register or stores register data to memory. If the instruction is not a load/store instruction, this step is not needed, and the result of the execute stage is simply passed on to the write-back stage. The write-back stage writes data to the destination register. OpenMIPS is also designed as a five-stage pipeline. The main work of each stage of the OpenMIPS pipeline is as follows.

  •  Fetch stage: reads the instruction from the instruction memory and determines the address of the next instruction.
  •  Decode stage: decodes the instruction and reads the values of the registers it uses from the general-purpose register file. If the instruction contains an immediate, the immediate is sign-extended or zero-extended. If the instruction is a branch instruction and the branch condition is met, the branch target is supplied as the new instruction address.
  •  Execute stage: performs the operation on the operands, according to the operation type supplied by the decode stage, and produces the result. For a load/store instruction, the target address of the load/store is also calculated here.
  •  Memory access stage: for a load/store instruction, the data memory is accessed in this stage. Otherwise, the result of the execute stage is passed on to the write-back stage. This stage also determines whether an exception needs to be handled. If so, the pipeline is flushed and execution continues at the entry address of the exception handler.
  •  Write-back stage: saves the result of the operation to the destination register.

It does not matter if the work of each pipeline stage is not fully clear at this point. This book does not implement all of the above in one go but improves the design step by step. At first, only the basic work of each pipeline stage is implemented, and this is then gradually enriched and refined.

3.1.3 Instruction execution cycles

As stated in the OpenMIPS design objectives, all integer instructions in the MIPS32 instruction set are implemented, and most instructions execute within one clock cycle. The number of clock cycles required to execute each instruction implemented by OpenMIPS is shown in Table 3-1.


The following notes apply to Table 3-1.

(1) OpenMIPS uses the trial-quotient method for division. For a 32-bit division, the execute stage needs at least 32 clock cycles; with the additional cycles needed for preparation, execution completes only after 36 clock cycles. Chapter 7, "Implementation of arithmetic operation instructions", describes the implementation of the division instructions in detail.
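To show why a 32-bit division needs at least 32 iterations, here is a sketch of the textbook restoring (trial-subtraction) division for unsigned operands, with one quotient bit produced per iteration. It is meant as an illustration of the general technique only; the actual OpenMIPS division module, its handling of signed operands, and its preparation cycles are not reproduced here.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division.

    Returns (quotient, remainder, iterations). Each loop iteration
    corresponds to one trial subtraction, i.e. one quotient bit.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    iterations = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Trial subtraction: keep the difference only if it is non-negative.
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial
            quotient |= 1 << bit
        iterations += 1
    return quotient, remainder, iterations


print(restoring_divide(100, 7))
```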

(2) The multiply-add instructions MADD and MADDU and the multiply-subtract instructions MSUB and MSUBU each need two clock cycles to execute. This is because these four instructions perform two operations: one multiplication and one addition or subtraction. Completing both operations in a single clock cycle of the execute stage would significantly lengthen the time needed by the execute stage and therefore lower the clock frequency at which OpenMIPS can run. OpenMIPS therefore uses two clock cycles in the execute stage for these four instructions: the multiplication is performed in the first clock cycle, and the addition or subtraction in the next. Chapter 7, "Implementation of arithmetic operation instructions", describes the implementation of the multiply-add and multiply-subtract instructions.
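The two steps of MADD can be sketched as below. The sketch follows the MIPS32 definition of MADD, where the signed product of rs and rt is added to the 64-bit value held in {HI, LO}. The function names are hypothetical, and the split into "cycle 1" and "cycle 2" only mirrors the description above; it is not taken from the OpenMIPS source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def madd(hi, lo, rs, rt):
    """Return the new (hi, lo) after a signed multiply-add."""
    # Cycle 1: signed 32x32 multiplication, kept as a 64-bit pattern.
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    # Cycle 2: add the product to the 64-bit accumulator {hi, lo}.
    accumulator = ((hi & MASK32) << 32) | (lo & MASK32)
    total = (accumulator + product) & MASK64
    return (total >> 32) & MASK32, total & MASK32


print(madd(0, 5, 3, 4))
```

MSUB would differ only in the second step, subtracting the product instead of adding it.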

3.2 Interfaces of the teaching version of the OpenMIPS processor

The external interfaces of the teaching version of the OpenMIPS processor are shown in Figure 3-4. Inputs are on the left and outputs on the right. The interfaces are described in Table 3-2 and fall into three groups: the system control interface (reset, clock, and interrupts), the instruction memory interface, and the data memory interface.



3.3 File description

OpenMIPS is a five-stage pipelined processor. The modules in each pipeline stage and their corresponding files are shown in Figure 3-5. In the figure, the module name is given at the top of each module and the corresponding file name at the bottom. The connections between modules are not drawn, because they are too complex to show in the book. For the detailed connections, see the "OpenMIPS module connection diagram" on the CD-ROM that accompanies this book.


The details are as follows.

(1) Fetch stage

  • PC module: provides the instruction address. It implements the program counter register PC, whose value is the instruction address. Corresponds to the file pc_reg.v.
  • IF/ID module: implements the registers between the fetch and decode stages. It passes the results of the fetch stage (the fetched instruction, the instruction address, and other information) to the decode stage in the next clock cycle. Corresponds to the file if_id.v.

(2) Decode stage

  • ID module: decodes the instruction. The decoding result includes the operation type, the source operands required by the operation, and the address of the destination register to be written. Corresponds to the file id.v.
  • Regfile module: implements 32 32-bit general-purpose integer registers. Two registers can be read and one register written at the same time. Corresponds to the file regfile.v.
  • ID/EX module: implements the registers between the decode and execute stages. It passes the results of the decode stage to the execute stage in the next clock cycle. Corresponds to the file id_ex.v.
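The behavior of a register file with two read ports and one write port can be modeled in a few lines. This is a behavioral sketch with made-up names, not a transcription of regfile.v. It assumes the MIPS convention that register $0 always reads as zero, and it leaves out what happens when the same register is read and written in the same clock cycle.

```python
class RegFile:
    """32 general-purpose 32-bit registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, addr1, addr2):
        """Read two registers at once (the two read ports)."""
        return self.regs[addr1 & 31], self.regs[addr2 & 31]

    def write(self, addr, data, enable=True):
        """Write one register (the single write port)."""
        addr &= 31
        # Writes to register 0 are ignored so that it always reads as zero.
        if enable and addr != 0:
            self.regs[addr] = data & 0xFFFFFFFF


rf = RegFile()
rf.write(1, 0x1100)
rf.write(0, 0xDEADBEEF)  # has no effect
print(rf.read(1, 0))
```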

(3) Execute stage

  • EX module: performs the specified operation based on the results of the decode stage and produces the result. Corresponds to the file ex.v.
  • DIV module: the division module. Corresponds to the file div.v.
  • EX/MEM module: implements the registers between the execute and memory access stages. It passes the results of the execute stage to the memory access stage in the next clock cycle. Corresponds to the file ex_mem.v.

(4) Memory access stage

  • MEM module: for load and store instructions, accesses the data memory. Exception detection is also performed in this module. Corresponds to the file mem.v.
  • MEM/WB module: implements the registers between the memory access and write-back stages. It passes the results of the memory access stage to the write-back stage in the next clock cycle. Corresponds to the file mem_wb.v.

(5) Write-back stage

  • CP0 module: corresponds to coprocessor CP0 in the MIPS architecture.
  • LLbit module: implements the register LLbit. This register is used by the load-linked instruction LL and the store-conditional instruction SC, which are described in detail in Chapter 9.
  • HILO module: implements the registers HI and LO. These two registers are used by the multiplication and division instructions, which are described in detail in Chapter 7.

In addition, there is a CTRL module, which controls stalling and flushing of the entire pipeline. Because it does not belong to any single pipeline stage, it is drawn separately at the top of Figure 3-5. Corresponds to the file ctrl.v.

Appendix A of this book lists the interfaces of each module and describes their functions. Readers may feel that there are too many modules, each with too many interfaces, and that all this looks hard to understand. There is no need to worry. As mentioned above, this book does not implement all of these modules at once. Instead, we first implement only some of the modules, each with only the few interfaces needed at that point. Then, as OpenMIPS implements more and more instructions, we add modules and interfaces.

3.4 Implementation method

Before this book was written, there were already some books introducing the implementation of soft-core processors. They share one trait in how they present the implementation: they consider all instructions and all situations at once and then give the code. I do not think this is an approachable method, and it is probably not how those authors actually built their processors. This book borrows the "incremental model" from software development and takes a completely different approach: first consider the simplest case and give the code; then consider a few more cases and modify and extend the code. As more and more cases are considered, the code is modified and extended again and again until the requirements are finally met.

We first consider the simplest case: implementing a single instruction, the logical OR instruction ori. With this one instruction, the structure of the OpenMIPS pipeline can be built, as shown in Figure 3-6. Readers do not need to understand the details of the figure for now. It is enough to compare it with Figure 3-7 to get a sense of their relative complexity. The details are explained in Chapter 4.
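For readers who want a preview of what ori does, the following sketch decodes and executes one ori instruction in software. It relies on the MIPS32 I-type encoding (opcode in bits 31..26, rs in bits 25..21, rt in bits 20..16, a 16-bit immediate in bits 15..0, and opcode 001101 for ori). The function name and the list used as a register file are assumptions made for this illustration and have nothing to do with the OpenMIPS Verilog code.

```python
ORI_OPCODE = 0b001101


def execute_ori(instruction, regs):
    """Decode a 32-bit ori instruction word and apply it to regs (a list of 32 ints)."""
    opcode = (instruction >> 26) & 0x3F
    if opcode != ORI_OPCODE:
        raise ValueError("not an ori instruction")
    rs = (instruction >> 21) & 0x1F   # source register
    rt = (instruction >> 16) & 0x1F   # destination register
    imm = instruction & 0xFFFF        # immediate, zero-extended for ori
    result = (regs[rs] | imm) & 0xFFFFFFFF
    if rt != 0:                       # register 0 always stays zero
        regs[rt] = result
    return rt, result


regs = [0] * 32
execute_ori(0x34011100, regs)  # ori $1, $0, 0x1100
execute_ori(0x34210020, regs)  # ori $1, $1, 0x0020
print(hex(regs[1]))
```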


In subsequent chapters, we implement, in order, the logical instructions, shift instructions, nop instructions, move instructions, arithmetic instructions, branch instructions, load/store instructions, coprocessor access instructions, and exception-related instructions. In the end, all integer instructions defined in the MIPS32 instruction set architecture are implemented, as shown in Figure 3-7. Again, readers do not need to understand the details of the figure at this point. It is enough to compare it with Figure 3-6 to get a sense of their relative complexity.


Comparing Figures 3-6 and 3-7, we see that the complexity increases greatly. If we set out to implement the data flow diagram of Figure 3-7 from the very beginning, readers would need to know all the instructions defined in the MIPS32 instruction set and understand what each of them does, which would obviously make the material harder to follow. A better way is to start with only the data flow diagram of Figure 3-6, for which the reader only needs to understand what the ori instruction does, and then add more instructions step by step, enriching and refining the data flow diagram until it becomes the one shown in Figure 3-7. For example, when branch instructions are added, a "branch decision" module is added to the decode stage of Figure 3-6. When exception-related instructions are added, an "exception decision" module is added to the memory access stage of the data flow diagram in Figure 3-6.

This is the method used in this book to implement the OpenMIPS processor. The next chapter implements the minimal structure, which executes only one instruction: ori.


To be continued.

The detailed module connection diagram of OpenMIPS can be downloaded at http://download.csdn.net/detail/leishangwen/7667697.

