Design of FPGA-based eight-bit RISC CPU

Source: Internet
Author: User

From: http://www.icembed.com/info-19346.htm

With the rapid development of digital communication and industrial control, application-specific integrated circuits (ASICs) are expected to offer more functionality, lower power consumption, and shorter production cycles. All of these pose a major challenge to chip design, and traditional chip design methods can no longer meet the requirements of complex applications.

1 Introduction

SoC (system on a chip) is becoming increasingly popular thanks to its high integration and low power consumption. Developers no longer need to design an ASIC starting from individual logic gates. Instead, they can quickly build an application from existing functional modules, called cores or intellectual property (IP) macro cells, which greatly improves efficiency. The CPU IP core is at the heart of SoC technology. Developing and verifying a CPU IP core with proprietary intellectual property rights is of great significance for China in keeping up with advanced electronic technology worldwide and in improving the core competitiveness of its information industry.

The Reduced Instruction Set Computer (RISC) was proposed as an alternative to the Complex Instruction Set Computer (CISC). It has the following features: 1) a limited and simple instruction set; 2) an emphasis on register use, with the CPU providing a large number of usable registers; 3) an emphasis on optimized use of the instruction pipeline.

2 CPU IP core composition

Although CPUs differ in performance figures and structural details, the basic functions they must perform are the same. The CPU can be divided into eight basic components: clock generator, instruction register, accumulator, arithmetic logic unit (ALU), data controller, status controller, program counter, and address multiplexer. The status controller is responsible for coordinating the operation of the other components. The structure and logical relationships are shown in Figure 1.

The clock generator uses the external clock signal to generate a series of clock signals for each part of the CPU. To ensure clean signal transitions after frequency division, the design uses a synchronous state machine.

The instruction register latches the instruction sent over the data bus on the rising edge of the trigger clock clk1. The data bus carries data and instructions in a time-shared manner; whether the bus content is loaded as an instruction is determined by the load_ir signal of the status controller, which is fed to the instruction register through its enable port ENA. After reset, the instruction register is cleared to zero. Each instruction consists of two bytes, 16 bits in total. The upper 3 bits are the operation code and the lower 13 bits are the address. The address bus of the CPU is 13 bits wide, giving an address space of 8 K bytes. Because the data bus in this design is 8 bits wide, each instruction is fetched in two reads, and which byte is being fetched is controlled by the variable state.
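The instruction format described above can be illustrated with a short Python sketch. It is a behavioral model only, not the HDL of the design; the function name `decode` and the assumption that the high byte is fetched first are illustrative and are not stated in the text.

```python
ADDR_BITS = 13  # width of the address field and of the address bus

def decode(high_byte, low_byte):
    """Split a 16-bit instruction, fetched as two bytes, into (opcode, address).

    Assumption: the byte holding the opcode (high byte) is fetched first.
    """
    word = ((high_byte & 0xFF) << 8) | (low_byte & 0xFF)
    opcode = word >> ADDR_BITS                  # upper 3 bits
    address = word & ((1 << ADDR_BITS) - 1)     # lower 13 bits
    return opcode, address

# A 13-bit address reaches 2**13 = 8192 bytes, i.e. the 8 K byte address space.
ADDRESS_SPACE = 1 << ADDR_BITS
```

For example, the byte pair `0b10111000`, `0b00000001` decodes to opcode `0b101` and address `0x1801`.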

The accumulator stores the current computation result and also serves as one of the data sources in two-operand operations. After reset, the value of the accumulator is zero. When the accumulator receives the load_acc signal from the CPU status controller through its enable port ENA, it latches the data from the data bus on the rising edge of clk1.

 

 

Figure 1 CPU Structure

 

The arithmetic logic unit (ALU) performs basic operations such as addition, AND, XOR, and jump, selected by the input operation code.
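A minimal behavioral sketch of such an opcode-selected ALU is given below in Python. The opcode values for AND, XOR, and LDA follow the machine-code listing in the assembly section of this article; the value used for ADD and the pass-through behavior for all other opcodes are assumptions made for illustration.

```python
# AND, XOR and LDA opcodes are taken from the article's machine-code listing.
# ADD = 0b010 is an assumed value, not given in the article.
ADD, AND, XOR, LDA = 0b010, 0b011, 0b100, 0b101

def alu(opcode, accum, data):
    """Return the 8-bit ALU result for the given opcode and operands."""
    if opcode == ADD:
        result = accum + data      # carry out of bit 7 is discarded below
    elif opcode == AND:
        result = accum & data
    elif opcode == XOR:
        result = accum ^ data
    elif opcode == LDA:
        result = data              # load: pass the bus value through
    else:
        result = accum             # assumed: other opcodes keep the accumulator value
    return result & 0xFF           # all operands are 8 bits wide
```

Masking with `0xFF` models the 8-bit data path: an addition that overflows simply wraps around.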

The data controller controls the data output of the accumulator. The data bus is a shared channel for all operations and is used in a time-shared manner: sometimes it carries instructions and sometimes data. At other times, the data bus must be in a high-impedance state so that other components can use it. Therefore, a control signal is required whenever a component drives data onto the bus, and the start and end of that control signal are determined by the outputs of the CPU status controller. The control signal datactl_ena determines when the accumulator data is output.

The address multiplexer outputs either the PC (program counter) address or the data/port address. The first four clock cycles of each instruction cycle are used to read the instruction from the ROM, so the output must be the PC address. The last four clock cycles are used to read or write the RAM or a port, and that address is supplied by the instruction. The address select signal is the fetch signal, which is the clock divided by eight.
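The selection described above can be sketched as follows. This is a Python behavioral model; the polarity of `fetch` (high during the first four clock cycles) and the names `fetch_level`, `pc_addr`, and `ir_addr` are assumptions made for illustration.

```python
ADDR_MASK = 0x1FFF  # 13-bit address bus

def fetch_level(cycle):
    """Level of the divide-by-eight fetch signal for a clock index 0..7.

    Assumption: fetch is high for the first four clocks of the instruction cycle.
    """
    return (cycle % 8) < 4

def address_mux(fetch, pc_addr, ir_addr):
    """Drive the PC address while fetch is high, otherwise the instruction's address."""
    selected = pc_addr if fetch else ir_addr
    return selected & ADDR_MASK
```

With `pc_addr = 0x0002` and `ir_addr = 0x1801`, the bus carries `0x0002` during clocks 0 to 3 and `0x1801` during clocks 4 to 7.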

The program counter provides the address from which the next instruction is read. Instructions are stored in memory in address order. There are two ways to form the instruction address: one is sequential execution of the program, and the other is executing a JMP instruction to obtain a new instruction address.
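The two ways of forming the instruction address can be modeled with a small Python class. This is a sketch under stated assumptions: the class and method names are hypothetical, and the counter is assumed to advance by one per byte read, so a two-byte instruction advances it twice.

```python
class ProgramCounter:
    """Behavioral model of a 13-bit program counter."""

    MASK = 0x1FFF  # 13-bit address

    def __init__(self):
        self.value = 0

    def reset(self):
        """After reset, execution starts at ROM address 0."""
        self.value = 0

    def step(self, jump_target=None):
        """Advance sequentially, or load a new address when a JMP supplies one."""
        if jump_target is None:
            self.value = (self.value + 1) & self.MASK   # sequential execution
        else:
            self.value = jump_target & self.MASK        # JMP: load the new address
        return self.value
```

For instance, two sequential steps after reset give address 2, and a subsequent `step(jump_target=0x0010)` moves execution to `0x0010`.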

The state machine controller receives the reset signal RST. When RST is asserted, the enable signal ENA fed to the state machine is set to 0, which stops the state machine. The state machine is the core of CPU control. It generates a series of control signals that start or stop individual components, and all CPU reads and writes of I/O ports and RAM are controlled by it. The current state of the state machine is recorded in the variable state, whose value is the number of clocks that have elapsed in the current instruction cycle. An instruction cycle consists of eight clocks, each of which performs a fixed operation.
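The role of the variable state as a clock counter within the instruction cycle can be sketched as below. The Python model covers only the counting and the stop behavior; the control signals issued in each state are not modeled, and the choice to return to state 0 while ENA is 0 is an assumption.

```python
CLOCKS_PER_INSTRUCTION = 8  # an instruction cycle consists of eight clocks

class CycleCounter:
    """Tracks how many clocks have elapsed in the current instruction cycle."""

    def __init__(self):
        self.state = 0

    def tick(self, ena=True):
        """Advance by one clock; hold at state 0 while ena is low (assumed)."""
        if not ena:
            self.state = 0
        else:
            self.state = (self.state + 1) % CLOCKS_PER_INSTRUCTION
        return self.state
```

Nine consecutive ticks with ENA high yield the states 1, 2, ..., 7, 0, 1, showing the wrap-around at the end of each instruction cycle.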

3 System timing

Reset and start of the RISC CPU are triggered by the signal on the RST pin. When RST goes high, the CPU terminates the current operation. As long as RST stays high, the CPU remains in the reset state and all CPU status registers are set to their inactive values. After RST returns to the low level, the first rising edge of fetch starts the RISC CPU, which reads instructions starting from ROM address 000 and executes the corresponding operations.

Read command sequence. The first three clock cycles of each instruction are used to read the instruction. The read signal RD is active in cycles 4 to 6 and inactive in cycle 7, and in cycle 8 the address bus outputs the PC address in preparation for the next instruction.

Write command sequence. The write address is set up within the first 3.5 clock cycles of each instruction. The data is output in the fourth cycle, the write signal is output in the second clock cycle and ends with the 5th clock, and at clock cycle 7.5 the output address is the PC address, in preparation for the next instruction.

Figure 2 shows the waveform simulation result obtained with ModelSim SE 6.0.

4 Microprocessor instructions

Data processing instructions: data processing instructions perform arithmetic and logical operations on the data in registers. Other instructions only transfer data and control the execution order of the program, so data processing instructions are the only instructions that can modify data values. A data processing instruction generally takes two source operands and produces a single result. All operands are 8 bits wide and come either from registers or from an immediate value defined in the instruction. Each source operand register and the result register are specified independently in the instruction.

 

 

Figure 2 read/write command sequence

Data transfer and control transfer instructions: there are 17 instructions in total, not counting those that transfer control based on Boolean variables. They include long calls and long jumps across the full memory space, absolute calls and absolute jumps within a 2 KB program block, long relative jumps across the full space and short relative jumps within one page, and conditional jump instructions. The mnemonics are ACALL, AJMP, LCALL, LJMP, SJMP, JMP, JZ, JNZ, CJNE, and DJNZ. Control transfer instructions mainly modify the PC pointer to control program flow. The registers involved are mainly SP, PC, and IR.

Instructions are composed of operation codes and operands. The purpose of the instruction fetch circuit is to separate the operation code from the operands. The circuit is shown in Figure 3. It consists of a program pointer, a program pointer parsing module, a ROM, an IR (instruction register), and the controller status register. Instructions are fetched as follows: the PC pointer value is assigned by the pc_mux module, and the instruction in the ROM is read out and sent to the data input port of the instruction register. The instruction register is controlled by the status register. When the instruction load signal is valid, the instruction code from the ROM is stored in the instruction register and decoded by the controller to generate the control signals, increment the PC pointer, and fetch the next instruction.

 

 

Figure 3 Instruction fetch circuit

 

5 Assembly

An assembler was developed to debug the soft core, because manual coding of machine code is error-prone and laborious. The instruction set also has to be modified during debugging, so the assembler must be simple and reliable in structure. Code may simply be stacked up wherever necessary in the program; programming finesse and assembler efficiency are not required. The assembly program below is used to test the basic instruction set of the CPU. If every instruction is executed correctly, the CPU stops at the HLT instruction. If the program stops at any other address, an instruction error has occurred. In the program, the hexadecimal number after the @ symbol is a memory address, and the text after // on each line is a comment. The following is a small piece of program code. After the assembled machine code is loaded into the virtual ROM and the data involved in the calculation is loaded into the virtual RAM, the simulation can be started.

 

Machine code // Address  Assembly mnemonic  Comments

@00 // address declaration
101_11000 // 00 begin: LDA data_2
0000_0001
011_11000 // 02 AND data_3
0000_0010
100_11000 // 04 XOR data_2
0000_0001
001_00000 // 06 SKZ
0000_0000
000_00000 // 08 HLT // AND doesn't work
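To illustrate how an assembler of this kind maps a mnemonic and an address to two machine-code bytes, here is a minimal Python sketch. It is not the author's assembler. The opcode values are read from the listing above, while the function names `assemble` and `to_listing` and the example data addresses `0x1801` and `0x1802` are assumptions made for illustration.

```python
# Opcodes as they appear in the listing (3 bits each).
OPCODES = {"HLT": 0b000, "AND": 0b011, "XOR": 0b100, "LDA": 0b101}

def assemble(mnemonic, address=0):
    """Return (high_byte, low_byte) for one instruction: 3-bit opcode, 13-bit address."""
    word = (OPCODES[mnemonic.upper()] << 13) | (address & 0x1FFF)
    return word >> 8, word & 0xFF

def to_listing(program):
    """Format (mnemonic, address) pairs in the listing's bit-string style."""
    lines = []
    for mnemonic, address in program:
        high, low = assemble(mnemonic, address)
        lines.append(f"{high >> 5:03b}_{high & 0x1F:05b}")  # opcode _ upper address bits
        lines.append(f"{low >> 4:04b}_{low & 0xF:04b}")     # lower address byte
    return lines
```

Under the assumed address, `assemble("LDA", 0x1801)` gives the bytes `0b10111000` and `0b00000001`, which `to_listing` prints as `101_11000` and `0000_0001`.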

 

6 Debugging

The most basic debugging method relies on the development and simulation environment provided by the FPGA vendor. A testbench is written in a hardware description language to form a minimal operating environment. The testbench generates stimuli for the target soft core, records the soft core outputs, and compares them with the expected values to locate design errors in the core. The advantage of this method is that it is easy to implement and gives accurate results, but it requires a large amount of HDL coding.

 

For accurate results, in both functional and timing simulation, the simulation step size must not be so small that the simulation of the entire system takes too long. In this design, we first synthesize each sub-module of the RISC CPU separately to check that it is correct. If an error is found, it can be located and verified within a small scope. After the sub-modules have been synthesized, the RISC CPU is synthesized as one large module, separately from the peripheral devices and the test module. Figure 4 shows the technology schematic generated by Xilinx ISE 7.1.
 

The synthesis result is only a generic gate-level netlist. It describes logical relationships among AND, OR, and NOT gates, which still differ from the actual configuration of the chip. At this point, the implementation and place-and-route tools provided by the FPGA/CPLD vendor should be used to establish the actual connections and mapping of the chip's internal functional units for the selected device model. The implementation and place-and-route tool is generally the one supplied by the manufacturer of the selected device, because only the manufacturer knows the internal structure of the device in detail. For example, in the ISE integrated environment the tool for implementation and place and route is Flow Engine.

 

 

Figure 4 CPU Technology schematic

Static timing analysis (STA) is a required step in FPGA design. After the design has been constrained, synthesized, and placed and routed, Timing Analyzer in ISE can be run to generate a detailed timing report. For this design the report gives: minimum period 12.032 ns (maximum frequency 83.112 MHz), minimum input arrival time before clock 6.479 ns, and maximum output required time after clock 9.767 ns. The designer then checks the timing report, finds the paths that violate setup/hold times or fail to meet the constraints according to the tool's messages, and modifies those paths to ensure that data is sampled correctly. In post-layout simulation, the place-and-route delays are back-annotated into the design, so that the simulation includes both gate delay and wire delay information. This post-layout simulation is the most accurate simulation and reflects the actual operation of the chip.
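The relation between the reported minimum period and maximum frequency can be checked with a few lines of Python. The second function is an illustrative estimate only: it assumes the analyzed clock is the one whose eight cycles make up an instruction cycle, which the timing report itself does not state.

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns

def instruction_rate_mips(clock_mhz, clocks_per_instruction=8):
    """Upper bound on instructions per microsecond (assumes 8 clocks per instruction)."""
    return clock_mhz / clocks_per_instruction

f_max = max_frequency_mhz(12.032)          # matches the reported 83.112 MHz after rounding
rate = instruction_rate_mips(83.112)       # about 10.39 million instructions per second
```

Under that assumption, a clock at the reported maximum frequency would complete roughly 10.4 million instruction cycles per second.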

7 Conclusion

Designing a CPU, a complex task with a high degree of abstraction, is a process of moving from the abstract to the concrete. Based on the structural characteristics of FPGAs, this paper discusses a method for designing and implementing an eight-bit microprocessor soft core on an FPGA, and studies SoC design methods and design reuse techniques. The instruction set and the debugging method are given, and an FPGA-based IP core design method is proposed. The author's contribution is as follows: according to the internal structure of the Spartan II, address and data are optimized in the coding phase, and the internal placement and routing are reconfigured in the implementation phase. The resulting microprocessor occupies only 78 slices and 1 block RAM. Implemented on a 100,000-gate device, it occupies 6% of the resources.

