RISC (Reduced Instruction Set Computer)
CISC (Complex Instruction Set Computer)
The so-called "architecture" refers to the processor resources that programmers can use when writing programs for a CPU; the most important of these are the instruction set and the register set the processor provides. Note the difference between architecture and structure: the former is the logical abstraction of the processor and is what programmers care about, while the latter is the concrete implementation, which is generally the concern of computer system designers. Architecture and structure are concepts at different levels, but they are related.
Take instruction-set design as an example: the same instruction set can be implemented with "hardwired" logic or with "microprogramming". The former implements instructions directly in the CPU's hardware circuits; the latter implements them through microprograms. If the instruction set is hardwired, designing the circuits for complex instructions is very difficult; with microprogramming, complex instructions become feasible. Modern CISC processors generally use microcode.
A microprocessor that uses microcode has two instruction sets at different levels: one is the instruction set visible to programmers, the other is the underlying microcode executed by the hardware. An interpreter sits between them and translates each instruction into a corresponding sequence of microcode. The relationship between instructions and microcode is essentially an extension of the "subroutine call" idea.
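To make the instruction/microcode relationship concrete, here is a minimal C sketch of a microcoded machine. Everything in it is hypothetical: the single programmer-visible instruction `OP_ADD_MEM_REG` (add a register into a memory word), the three micro-operations, and the `microcode_rom` table standing in for the control store. The `execute` function plays the interpreter: it looks up the instruction and runs its micro-op sequence like a subroutine.

```c
#include <stdint.h>

/* Hypothetical micro-operations: each is a trivial step the datapath can do directly. */
typedef enum { UOP_LOAD_TMP, UOP_ADD_TMP_REG, UOP_STORE_TMP, UOP_END } uop_t;

/* Hypothetical programmer-visible instruction: "add register r into memory word addr". */
typedef enum { OP_ADD_MEM_REG, OP_COUNT } opcode_t;

/* The "microcode ROM": for each opcode, the micro-op sequence that implements it. */
static const uop_t microcode_rom[OP_COUNT][4] = {
    [OP_ADD_MEM_REG] = { UOP_LOAD_TMP, UOP_ADD_TMP_REG, UOP_STORE_TMP, UOP_END },
};

typedef struct {
    uint32_t mem[16];   /* main memory (word-addressed) */
    uint32_t reg[4];    /* programmer-visible registers */
    uint32_t tmp;       /* internal latch, visible only to the microcode */
} machine_t;

/* The interpreter: runs the micro-op "subroutine" for one instruction.
   Returns the number of micro-ops executed. */
int execute(machine_t *m, opcode_t op, unsigned addr, unsigned r) {
    int steps = 0;
    for (const uop_t *u = microcode_rom[op]; *u != UOP_END; ++u, ++steps) {
        switch (*u) {
        case UOP_LOAD_TMP:    m->tmp = m->mem[addr];  break;
        case UOP_ADD_TMP_REG: m->tmp += m->reg[r];    break;
        case UOP_STORE_TMP:   m->mem[addr] = m->tmp;  break;
        case UOP_END:         break; /* not reached: the loop stops at UOP_END */
        }
    }
    return steps;
}
```

For example, with `mem[3] = 40` and `reg[1] = 2`, `execute(&m, OP_ADD_MEM_REG, 3, 1)` runs three micro-ops and leaves 42 in `mem[3]`. A real control store holds wide microinstruction words that drive datapath control lines directly; the enum-and-switch form here only mirrors the lookup-then-sequence structure.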
CISC and RISC put their complexity in different places: a CISC processor's implementation is more complex, while in RISC more of the complexity falls on the compiler.
Compared with instructions, microcode has the following features:
1. Microcode represents very simple basic operations, while instructions may be very complex.
2. Fetching microcode is very fast: all microcode resides in ROM, while instructions reside in main memory. [Note_1]
3. The microcode format is very regular and simple, so decoding is easy.
4. Microcode executes very fast, while instructions execute relatively slowly.
From the point of view of processor architecture, the basic execution units of a modern microcoded CISC processor can be regarded as a fast RISC-like core. This raises a question: what if we dropped the interpreter and used the microcode directly as the instruction set? That question leads to RISC.
Let's look at the advantages and disadvantages of a CISC instruction set implemented in microcode:
CISC instruction sets tend toward complexity in order to move closer to high-level languages. Processor vendors provided powerful, complex instructions; for example, Intel's x86 provides a string move instruction (MOVS), which with a repeat prefix copies a block of memory byte by byte, equivalent to:
while (n--) *dest++ = *src++;
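The loop above is what one repeated string move does. The sketch below spells it out in C over a flat byte array, assuming the usual x86 register roles (ECX holds the count, ESI the source, EDI the destination) and ignoring segment registers and operand sizes; the function name `rep_movsb` is ours, not a library call. It also models the direction flag (DF), which decides whether the addresses count up or down.

```c
#include <stddef.h>

/* Models one x86 "REP MOVSB" on a flat byte array `mem`.
   di, si and cx stand in for the EDI, ESI and ECX registers;
   df is the direction flag (0: addresses count up, 1: addresses count down). */
void rep_movsb(unsigned char *mem, size_t di, size_t si, size_t cx, int df) {
    while (cx--) {                /* REP: repeat until the count reaches zero */
        mem[di] = mem[si];        /* MOVSB: copy one byte from [ESI] to [EDI] */
        if (df) { di--; si--; }   /* DF = 1: step downward */
        else    { di++; si++; }   /* DF = 0: step upward (the common case) */
    }
}
```

On real hardware all of this is a single instruction; a RISC machine expresses the same work as an explicit loop of loads, stores, pointer increments and a branch.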
An instruction like this makes copying data structures convenient, and other complex operations can likewise be done with a single instruction. CISC instructions also offer a wide variety of addressing modes, and operands can come directly from memory.
However, complex instructions cause problems for pipelining, which is widely used in modern processors. In a microprocessor, executing an instruction is generally divided into steps such as instruction fetch, operand fetch, execution and write-back. CISC complex instructions take different amounts of time to execute [Note_2]: some complete within 4 or 5 clock cycles, others need dozens, and even a simple instruction may take different times depending on its addressing mode. Worse, instruction lengths are inconsistent too: the same instruction has a different length depending on its addressing mode. How should a pipeline be designed for such instructions? If it is designed around the shortest instruction, it stalls whenever a complex instruction arrives; if it is designed around the longest, short instructions skip some stages and the pipeline is never fully filled.
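The cost of mixing instruction lengths in one pipeline can be put in numbers with a deliberately simplified model, which is an assumption of this sketch rather than a description of any real CPU: a five-stage in-order pipeline (fetch, decode, execute, memory, write-back) in which every stage takes one cycle except execute, which takes `lat[i]` cycles for instruction `i` and holds up everything behind it.

```c
#include <stddef.h>

/* Simplified in-order 5-stage pipeline (IF, ID, EX, MEM, WB).
   Every stage takes one cycle except EX, which takes lat[i] cycles for
   instruction i; the next instruction cannot enter EX until it is free.
   Returns the total cycles until the last instruction finishes write-back. */
unsigned long pipeline_cycles(const unsigned *lat, size_t n) {
    if (n == 0) return 0;
    unsigned long ex_free = 0;   /* first cycle in which EX is free again */
    unsigned long finish = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long earliest = i + 2;  /* fetched in cycle i, decoded in i + 1 */
        unsigned long start = earliest > ex_free ? earliest : ex_free;
        ex_free = start + lat[i];        /* EX is busy for lat[i] cycles */
        finish = ex_free + 2;            /* then one cycle each for MEM and WB */
    }
    return finish;
}
```

With five one-cycle instructions, `pipeline_cycles` returns 9 (n + 4). Making only the middle instruction take 10 execute cycles raises the total to 18: nine bubbles caused by a single long instruction.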
Given the situation above and the 80/20 rule (80% of the time, the processor executes the commonly used 20% of the instruction set), most complex instructions are rarely used. When programs are written in high-level languages, compilers generally do not generate the special complex instructions, partly to stay compatible with earlier CPUs. Dropping these rarely used complex instructions simplifies CPU design. This is the starting point of RISC.
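The 80/20 observation comes from counting how often each opcode is actually executed. A sketch of that measurement, assuming you already have per-opcode dynamic execution counts (for example from an instruction trace); the function name `opcodes_for_percent` is hypothetical. It sorts the counts and finds how many of the most frequent opcodes cover a given percentage of all executions.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *a, const void *b) {
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x < y) - (x > y);
}

/* Returns how many of the most frequently executed opcodes are needed to
   cover at least `percent` % of all executed instructions.
   counts[i] is the dynamic execution count of opcode i; sorted in place. */
size_t opcodes_for_percent(unsigned long *counts, size_t n, unsigned percent) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    qsort(counts, n, sizeof counts[0], by_count_desc);
    unsigned long covered = 0;
    for (size_t i = 0; i < n; i++) {
        covered += counts[i];
        /* integer comparison avoids floating-point rounding at the threshold */
        if (covered * 100 >= (unsigned long)percent * total)
            return i + 1;
    }
    return n;
}
```

With hypothetical counts {4, 60, 25, 5, 3, 3}, the two most frequent opcodes already cover 85% of executions, so `opcodes_for_percent(counts, 6, 80)` returns 2.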
RISC features
1. A small instruction set: there are few instruction types, and only simple instructions are provided. Most of them complete within 4 or 5 clock cycles.
2. Instruction operands must first be placed in registers, so the time needed to fetch operands is uniform.
3. Instruction length, addressing modes and formats are all uniform. This makes full use of the pipeline and essentially achieves the goal of executing one instruction per clock cycle.
4. Subroutine calls differ between RISC and CISC. In CISC, the context must be saved on the stack on call and restored on return, which requires memory accesses. In RISC, part of the context is kept in registers, and parameters are also passed in registers. (With nested subroutine calls, the context of the intermediate calls still has to be spilled from registers onto the stack, but a "leaf" subroutine does not need to do this.)
5. An interrupt can be regarded as a special kind of subroutine call. When an interrupt occurs in CISC, all register contents are pushed onto the stack. In RISC, interrupts are divided into lightweight and heavyweight ones: a lightweight interrupt saves only the registers that need saving, while a heavyweight interrupt is handled like a regular interrupt.
6. They generally use pipelines and caches, and do not use microcode.
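Uniform formats are what make RISC decoding cheap. As an illustration (MIPS is our choice of example; the text does not mention it), a MIPS32 R-type instruction keeps every field at a fixed bit position, so decoding reduces to shifts and masks:

```c
#include <stdint.h>

/* Fields of a MIPS32 R-type instruction; every field sits at a fixed bit position. */
typedef struct {
    unsigned opcode, rs, rt, rd, shamt, funct;
} rtype_t;

/* Decoding is only shifts and masks: no length prefix and no variable-size
   operand specifiers, so in hardware it is little more than wiring. */
rtype_t decode_rtype(uint32_t insn) {
    rtype_t f;
    f.opcode = (insn >> 26) & 0x3F;  /* bits 31..26 */
    f.rs     = (insn >> 21) & 0x1F;  /* bits 25..21: first source register */
    f.rt     = (insn >> 16) & 0x1F;  /* bits 20..16: second source register */
    f.rd     = (insn >> 11) & 0x1F;  /* bits 15..11: destination register */
    f.shamt  = (insn >>  6) & 0x1F;  /* bits 10..6: shift amount */
    f.funct  =  insn        & 0x3F;  /* bits 5..0: selects the ALU operation */
    return f;
}
```

For example, `add $t0, $t1, $t2` encodes as `0x012A4020`; `decode_rtype` yields opcode 0, rs 9 ($t1), rt 10 ($t2), rd 8 ($t0), shamt 0 and funct 0x20. Compare x86, where the decoder must first work out how long the instruction is before it can locate the operands.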
Of course, RISC also has disadvantages: code density is low, executables are large, and the assembly code is less readable. Low code density deserves attention: without a cache, more instruction memory is needed and instruction fetch consumes more memory bandwidth; with a cache, the cache hit rate drops.
RISC vs ARM
As a newcomer to the RISC world, ARM has its own characteristics. This section compares ARM with RISC in general and clears up some common misconceptions about ARM:
ARM's distinctive features
1. ARM provides a compressed instruction set, Thumb, which encodes a subset of the ARM instruction set as 16-bit instructions. The processor can switch into Thumb state during execution.
2. An operand can be shifted before it is used: a data-processing instruction's second operand, or a load/store's register offset (for example, LDREQ r0, [r1, r2, LSR #16]!, where r2 is shifted right by 16 before being added to r1). Note: the shift is performed by a combinational circuit (the barrel shifter) rather than in a separate clock cycle, so a shift by an immediate amount, as here, does not lengthen the instruction's execution time.
3. ARM supports conditional execution of instructions, whereas most processors only support conditional branches. A taken conditional branch invalidates the instructions behind it in the pipeline and breaks the flow, while conditional execution avoids this. (When the conditionally executed part exceeds 3 instructions, a conditional branch is the better choice.)
4. The programmer decides whether an instruction's result updates the condition flags in the Program Status Register: adding S to the opcode (for example, ADD becomes ADDS) makes the instruction set the flags from its result.
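Features 2 to 4 can be sketched together in C. This is a behavioral model only, assuming a status register reduced to the N, Z, C and V flags and a word-indexed memory array; the helper names `cond_passed`, `add` and `ldreq_lsr16_wb` are hypothetical. `cond_passed` follows the standard ARM condition-code table, `add` shows that only the S form writes the flags, and `ldreq_lsr16_wb` walks through `LDREQ r0, [r1, r2, LSR #16]!`.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags held in the ARM status register. */
typedef struct { bool n, z, c, v; } flags_t;

/* Condition field values (bits 31..28 of an ARM instruction). */
enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

/* Does an instruction with condition `cond` execute, given the current flags? */
bool cond_passed(unsigned cond, flags_t f) {
    switch (cond) {
    case EQ: return f.z;             case NE: return !f.z;
    case CS: return f.c;             case CC: return !f.c;
    case MI: return f.n;             case PL: return !f.n;
    case VS: return f.v;             case VC: return !f.v;
    case HI: return f.c && !f.z;     case LS: return !f.c || f.z;
    case GE: return f.n == f.v;      case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    default: return true;            /* AL: always */
    }
}

/* ADD vs ADDS: the result is the same, but only the S form (s = true) writes the flags. */
uint32_t add(uint32_t a, uint32_t b, bool s, flags_t *f) {
    uint32_t r = a + b;
    if (s) {
        f->n = (r >> 31) & 1;                      /* negative */
        f->z = (r == 0);                           /* zero */
        f->c = r < a;                              /* unsigned carry out */
        f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;   /* signed overflow */
    }
    return r;
}

/* LDREQ r0, [r1, r2, LSR #16]! : if EQ holds, shift the offset r2, add it to the
   base r1, write the new address back to r1 (the "!"), and load the word there.
   mem is indexed by word; the computed address is assumed to be word-aligned. */
void ldreq_lsr16_wb(uint32_t *r, const uint32_t *mem, flags_t f) {
    if (!cond_passed(EQ, f)) return;       /* condition fails: behaves like a NOP */
    uint32_t addr = r[1] + (r[2] >> 16);   /* barrel-shifted register offset */
    r[1] = addr;                           /* base write-back */
    r[0] = mem[addr / 4];                  /* word load */
}
```

With Z set, r1 = 8 and r2 = 0x40000, the load reads byte address 8 + (0x40000 >> 16) = 12 and writes 12 back into r1; with Z clear, nothing changes, which is how conditional execution stands in for a short branch.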
Misconceptions about RISC and CISC
1. All RISC instructions are simple.
Look at the earlier LDREQ r0, [r1, r2, LSR #16]! instruction: a conditional load with a shifted register offset and base write-back, which a typical CISC processor cannot match in a single instruction. The "simplicity" of RISC lies in the uniformity of instruction execution time, instruction length and instruction format.
2. CISC's complex instructions are slow and inefficient.
Modern CISC processors such as the Pentium III have long pipelines and heavily optimize instruction execution. RISC's advantage, by contrast, is that instruction execution time is the same on old and new CPUs, so there is no need to optimize it. For the same complex instruction, the Pentium III may need far fewer cycles than a 386 running at the same clock frequency; in other words, CISC processors speed up instructions by continually optimizing the processor. When considering the execution speed of a given instruction, you therefore have to take into account which CPU it runs on, whereas for RISC this matters much less.
3. RISC needs more registers than CISC.
This is not a requirement but an implementation matter: a RISC CPU is simpler to design than a CISC one and takes up less die area, which frees space for registers. A CISC processor can also have many registers (the 68000 has as many registers as ARM). So the view is largely mistaken, although some RISC designs do call for a large number of registers.
4. All RISC processors have deep pipelines and caches.
Not necessarily: ARM2, for example, has only a simple three-stage pipeline and no cache.
In short, the simplicity of RISC processor design gives it advantages in die size, power consumption, heat dissipation and cost.
[Note_1] Introduction to the Harvard and von Neumann architectures:
John von Neumann pointed out that a program is just a (special) kind of data that can be processed like data, so programs and data can be stored together in the same memory. This is the famous von Neumann principle.
Modern computers are based on the von Neumann architecture: the executable program image resides on disk, and the OS loads it into memory at run time.
However, when I/O is frequent and I/O data volumes are large (as in network processors), the von Neumann architecture introduces a bottleneck. Imagine a peripheral raising a DMA request and being granted the bus: while the peripheral occupies the bus, the CPU has temporarily given up control of it and cannot access memory to fetch instructions. Early CPUs without pipelines used "cycle stealing", performing DMA transfers during clock cycles in which the CPU was not using memory, which resolved the contention between the CPU and DMA for the bus. With pipelining, however, and especially in RISC processors, the CPU fetches instructions while executing others, so there are no idle clock cycles left for memory access. Because RISC processors have many registers, data usually resides in registers and the CPU can keep executing during DMA; but as soon as it needs to fetch an instruction from memory, the pipeline breaks because the bus is held by the peripheral.
The Harvard architecture may be a solution to these problems: in embedded systems we tend to adopt the "Harvard architecture", which has separate program and data memories and two buses. In a Harvard design the CPU can still fetch instructions while the data bus is occupied (it has to stop only when it needs to access data memory).
In fact, modern processors make wide use of caches, and both instruction fetch and data access go through the cache. Caches can also follow either the Harvard or the von Neumann structure: a single unified cache, or separate data and instruction caches (the latter is called the modified Harvard architecture). In a pipelined CPU, the ideal is to fetch an instruction in every clock cycle. If an instruction in the pipeline needs to access memory in the same cycle, the two accesses conflict. Either the pipeline pauses for a cycle and skips the fetch, or a Harvard-style split cache is used, so that instruction fetch and data access stay out of each other's way.
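The fetch/data conflict can be estimated with the textbook simplification below, which is an assumption of this sketch and not a model of any specific CPU: a k-stage pipeline that ideally fetches one instruction per cycle, where each load or store occupies a single shared memory port for one cycle. The function name `fetch_conflict_cycles` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rough cycle count for n instructions on a k-stage pipeline (k >= 1).
   is_mem[i] marks loads and stores. With one unified memory port
   (von Neumann / unified cache), each data access takes the port for a
   cycle and delays the fetch that wanted it: one bubble per access.
   With split instruction and data paths (Harvard / split caches),
   fetch and data access proceed in parallel. */
unsigned long fetch_conflict_cycles(const bool *is_mem, size_t n,
                                    unsigned k, bool split_paths) {
    if (n == 0) return 0;
    unsigned long total = n + (k - 1);   /* ideal: fill the pipe, then 1 per cycle */
    if (!split_paths)
        for (size_t i = 0; i < n; i++)
            if (is_mem[i])
                total += 1;              /* structural hazard: fetch waits a cycle */
    return total;
}
```

For five instructions on a five-stage pipeline, two of them memory operations, the split case takes 9 cycles and the unified case 11.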
[Note_2] Concepts about cycles:
The time required to execute one instruction is called the instruction cycle. It is usually expressed as a number of clock cycles.
The period of the clock pulse is called the clock cycle. It is the CPU's basic unit of time and is determined by the clock frequency.
The time required for the CPU to exchange information with external devices or main memory is called the bus cycle.
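The clock cycle and instruction cycle combine in simple arithmetic. A minimal sketch, assuming a fixed clock frequency given in MHz: the clock period is 1000 / f nanoseconds, an instruction's time is its clock count times that period, and a program's time is instruction count × average clocks per instruction (CPI) × period. The function names are illustrative only.

```c
/* Clock cycle: period in nanoseconds of a clock running at freq_mhz MHz. */
double clock_period_ns(double freq_mhz) {
    return 1000.0 / freq_mhz;
}

/* Instruction cycle: time for an instruction that needs `clocks` clock cycles. */
double instruction_time_ns(unsigned clocks, double freq_mhz) {
    return clocks * clock_period_ns(freq_mhz);
}

/* Whole program: instruction count x average clocks per instruction (CPI) x period. */
double program_time_ns(unsigned long n_instr, double cpi, double freq_mhz) {
    return (double)n_instr * cpi * clock_period_ns(freq_mhz);
}
```

At 100 MHz the clock cycle is 10 ns, so an instruction whose instruction cycle is 4 clock cycles takes 40 ns.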