The development of pipelined technology for Intel series CPUs

Source: Internet
Author: User
Tags prefetch

The CPU (central processing unit), also known as a microprocessor, is the core component of a modern computer. For PCs, the CPU's specifications and clock frequency are often used as an important indicator of a computer's performance.

As CPU computing power has grown, pipelining has had a significant effect on CPU efficiency. Just as the assembly line transformed the automobile industry, pipelining has had a far-reaching impact on the development of processors.

Intel was founded in the United States in 1968. Across the history of IT, few companies have stayed as vigorous for as long as Intel. As the largest CPU developer and producer in the world today, Intel is like a pillar of the information age and has made an outstanding contribution to the development of information technology.

Below I give my own view of how Intel's CPU pipeline technology developed and of the related techniques, in four parts.

Note: The first machine with instruction pipelining was the IBM 7030, also known as Stretch.

First, a review of the development of the pipelining technology used in Intel series CPU chips

Intel first used pipelining in the 486 chip. A pipeline works like an assembly line in industrial production. Inside the CPU, 5 to 6 circuit units with different functions form an instruction processing line. Each x86 instruction is split into 5 to 6 steps, and each step is executed by one of these units. Once the pipeline is full, one instruction can complete in every CPU clock cycle, which increases the CPU's operating speed.
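The gain from overlapping instructions can be estimated with a little arithmetic. The sketch below assumes an ideal pipeline with no stalls, in which every stage takes exactly one clock cycle; the function names are illustrative only and do not come from any Intel documentation.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: n_stages cycles to complete the first instruction,
    then one more instruction completes in every following cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, cycles_unpipelined(n, 5), cycles_pipelined(n, 5),
              round(speedup(n, 5), 2))
```

For a long instruction stream the speedup approaches the number of stages, which is why the ideal throughput is described as one instruction per clock cycle. Real programs fall short of this because of the stalls discussed later.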

In the classic Pentium, each integer pipeline is divided into four stages: instruction prefetch, decode, execute, and write-back of results. The floating-point pipeline is divided into eight stages.

Intel's first pipeline: the five-stage pipeline of the i486

The i486 processor, launched in 1989, introduced a five-stage pipeline. From this point on, the CPU no longer ran just one instruction at a time; each pipeline stage worked on a different instruction at the same time. This design allowed the i486 to deliver more than twice the performance of a 386 processor at the same frequency. In the first stage of the five-stage pipeline, the instruction is fetched from the on-chip cache (8 KB in the i486). The second stage is the decode stage, in which the instruction is translated into a specific operation. The third stage is the translation stage, which computes memory addresses and offsets. The fourth stage is the execute stage, in which the instruction performs its operation. The fifth stage is the retire stage, in which the result is written back to a register or to memory. Because the processor works on multiple instructions at the same time, program performance improves greatly.

The 80486 uses data bypassing to resolve data dependencies between instructions.

The 80486 prefetches the branch target to speed up branch instructions.

Even so, CPUs of this generation still suffer pipeline stalls when executing some data-dependent instructions.

Pentium processor

In 1993 Intel launched the Pentium processor. The Pentium architecture adds a second, independent pipeline, which makes the design superscalar. The main pipeline works like that of the i486, while the second pipeline runs some simpler instructions, such as fixed-point (integer) arithmetic, in parallel and can complete them faster.

Pentium Pro processor

In 1995 Intel launched the Pentium Pro processor. The Pentium Pro uses a completely different design from the earlier processors. It adds a number of new features to improve performance, including an out-of-order execution engine and speculative execution. The pipeline grows to 12 stages and introduces the concept of "superpipelining", so that many instructions can be processed simultaneously.

Pentium 4 processor

In 2002 the Pentium 4 processor introduced Hyper-Threading Technology. The out-of-order execution engine was designed so that it could execute instructions faster than the rest of the processor could supply them. As a result, for most applications the out-of-order execution engine sat idle much of the time, even under heavy load. To keep instructions flowing into the out-of-order engine, Intel added a second front end. (Note: in the processor structure, the front end refers to modules such as instruction fetch, decode, and register renaming; after the front end has processed an instruction, the instruction waits to be issued to the out-of-order execution engine.) Although there is really only one out-of-order execution engine, the operating system sees two processors. The front end contains two sets of x86 registers with identical functions, and two instruction decoders that work separately on the addresses held in the two instruction pointers. All instructions are executed by the single shared out-of-order engine, without the application being aware of it. When out-of-order execution completes, the results are retired from the pipeline as before and returned to the correct one of the two virtual processors.

Second, a summary of the methods and related technologies used to improve pipeline and CPU performance during Intel CPU development

80486

The integer unit of the Intel 80486 implements instruction pipelining. It is a representative example of early pipelining technology.

Integer instructions pass through a 5-stage instruction pipeline, and each stage typically takes one clock cycle:

① PF stage: instruction prefetch (Prefetch)

② D1 stage: instruction decode 1 (Decode Stage 1)

③ D2 stage: instruction decode 2 (Decode Stage 2)

④ EX stage: instruction execution (Execute)

⑤ WB stage: write-back (Write Back)
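The way these five stages overlap can be drawn as a small timing chart. The following is a minimal sketch that assumes one instruction enters the pipeline per cycle and nothing stalls; the instruction names are placeholders, not real 80486 code.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_chart(instructions):
    """Return one row per instruction; row[c] is the stage that instruction
    occupies in cycle c (0-based), or '' if it is not in the pipeline."""
    total_cycles = len(STAGES) + len(instructions) - 1
    chart = []
    for i, _ in enumerate(instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters PF in cycle i
        chart.append(row)
    return chart


def print_chart(instructions):
    """Print the chart with one line per instruction."""
    for name, row in zip(instructions, pipeline_chart(instructions)):
        print(f"{name:<8}" + " ".join(f"{cell:<2}" for cell in row))


if __name__ == "__main__":
    print_chart(["instr1", "instr2", "instr3"])
```

Reading the chart column by column shows that in any one cycle each stage holds a different instruction, which is the overlap the five-stage list describes.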

The 80486 uses "data bypassing" to resolve data dependencies. A dedicated path is provided so that, while the previous instruction is still writing its result back to the register file, the next instruction does not read the register file but instead takes the previous instruction's ALU result directly as its input and starts computing. An operation that would otherwise have to pause can therefore continue. Textbooks call this technique "forwarding".
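The effect of bypassing can be illustrated by counting stall cycles in a simplified model. The sketch below is an assumption-laden toy, not a description of the real 80486 timing: it assumes in-order issue of one instruction per cycle, operands read at the start of EX, and WB occurring one cycle after EX. The register names and the `count_stalls` helper are hypothetical.

```python
def count_stalls(program, forwarding):
    """program: list of (dest, sources) pairs executed in order.
    Returns the number of stall cycles caused by read-after-write dependencies.

    Assumed timing: with forwarding a result can be consumed in the cycle
    after the producer's EX; without it the consumer must wait until the
    cycle after the producer's WB (EX + 2)."""
    delay = 1 if forwarding else 2
    ready = {}      # register -> first cycle in which a consumer may be in EX
    prev_ex = 0     # EX cycle of the previous instruction
    stalls = 0
    for dest, sources in program:
        earliest = prev_ex + 1  # in-order: one EX slot per cycle
        ex = max([earliest] + [ready.get(reg, 0) for reg in sources])
        stalls += ex - earliest
        if dest is not None:
            ready[dest] = ex + delay
        prev_ex = ex
    return stalls


if __name__ == "__main__":
    dependent = [
        ("r1", ["r2", "r3"]),   # r1 = r2 + r3
        ("r4", ["r1", "r5"]),   # needs r1 from the previous instruction
        ("r6", ["r4", "r1"]),   # needs r4 from the previous instruction
    ]
    print("stalls without forwarding:", count_stalls(dependent, forwarding=False))
    print("stalls with forwarding:   ", count_stalls(dependent, forwarding=True))
```

In this model a chain of dependent instructions pays one stall per dependency without the bypass path and none with it. The real processor still stalls in some cases that this toy does not capture.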

The 80486 uses a simple form of branch prediction, prefetching the branch target, to speed up branch instructions.

Even so, CPUs of this generation still suffer pipeline stalls when executing some data-dependent instructions, for example a write followed by a read of the same location (read after write).

Pentium processor

The Pentium architecture adds a second, independent pipeline, making the design superscalar. The two pipelines run in parallel, and each can hold multiple instructions at different stages of execution, so the Pentium can execute more instructions at the same time than the i486.
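How a second pipeline raises throughput can be sketched with a toy issue model. The pairing rule used here is a deliberate simplification and an assumption of this example, not Intel's actual pairing rules: the second instruction of a pair must be "simple" and must not read the register that the first one writes. The `Instr` type and the sample program are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]
    sources: Tuple[str, ...]
    simple: bool  # True if the (assumed) second pipeline can run it


def issue_cycles(program: List[Instr], dual_issue: bool) -> int:
    """Count issue cycles; pipeline fill and drain time is ignored."""
    cycles = 0
    i = 0
    while i < len(program):
        first = program[i]
        i += 1
        if dual_issue and i < len(program):
            second = program[i]
            independent = first.dest is None or first.dest not in second.sources
            if second.simple and independent:
                i += 1  # second instruction issues in the same cycle
        cycles += 1
    return cycles


PROGRAM = [
    Instr("add r1", "r1", ("r2", "r3"), True),
    Instr("add r4", "r4", ("r5", "r6"), True),   # pairs with the add above
    Instr("mul r7", "r7", ("r1", "r4"), False),
    Instr("add r8", "r8", ("r7", "r1"), True),   # reads r7, cannot pair with mul
    Instr("add r9", "r9", ("r2", "r3"), True),   # pairs with add r8
]

if __name__ == "__main__":
    print("single pipeline:", issue_cycles(PROGRAM, dual_issue=False))
    print("two pipelines:  ", issue_cycles(PROGRAM, dual_issue=True))
```

The example shows why the benefit depends on the instruction mix: independent simple instructions pair up, while dependent or complex ones fall back to one per cycle.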

Pentium Pro (Pentium Pro) processor

Out-of-order execution: the processor executes instructions in an order determined by the availability of their input data, rather than in the original order of the program. In this way the processor avoids sitting idle while the next program instruction waits for its data, and instead handles the next instruction that can be executed immediately.
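The difference between in-order and out-of-order issue can be seen in a small scheduling sketch. It rests on several assumptions that are not taken from the Pentium Pro: at most one instruction issues per cycle, every register is written at most once (as if renaming had already been applied), producers appear before their consumers, and the latencies are made up for illustration.

```python
import math


def run(program, out_of_order):
    """program: list of (name, dest, sources, latency) tuples.
    Returns (cycle in which the last result becomes available, issue order)."""
    # Registers produced by the program are unavailable until their producer
    # issues; any other register is treated as ready from the start.
    ready_at = {dest: math.inf for _, dest, _, _ in program}
    pending = list(program)
    order = []
    cycle = 0
    finish = 0
    while pending:
        cycle += 1
        # In-order: only the oldest instruction may issue.
        # Out-of-order: the oldest instruction whose inputs are ready issues.
        candidates = list(pending) if out_of_order else pending[:1]
        for instr in candidates:
            name, dest, sources, latency = instr
            if all(ready_at.get(reg, 0) <= cycle for reg in sources):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                order.append(name)
                pending.remove(instr)
                break
    return finish, order


PROGRAM = [
    ("load r1", "r1", ("r0",), 3),  # slow load
    ("add r2", "r2", ("r1",), 1),   # must wait for the load
    ("add r3", "r3", ("r4",), 1),   # independent of the load
    ("add r5", "r5", ("r3",), 1),   # depends only on add r3
]

if __name__ == "__main__":
    print("in-order:    ", run(PROGRAM, out_of_order=False))
    print("out-of-order:", run(PROGRAM, out_of_order=True))
```

With in-order issue the two independent additions wait behind the instruction that needs the load result; with out-of-order issue they fill the cycles the load would otherwise waste.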

The Pentium Pro's out-of-order execution engine has 6 execution units: two fixed-point (integer) units, one floating-point unit, a load unit, a store address unit, and a store data unit. The two fixed-point units are not identical: one can handle complex fixed-point operations, and the other can handle two simple operations at the same time. In an ideal situation, the Pentium Pro's out-of-order execution engine can execute 7 micro-instructions in a single clock cycle.

Speculative execution: improves execution speed by decoding and executing, ahead of time, program instructions that may be needed.

12-stage pipeline and superpipelining: within one clock cycle the pipeline works on more than one instruction. Each instruction is divided into more than 10 small steps, which are completed by different circuit units.

Pentium 4 processor

Introduction of Hyper-Threading Technology: special hardware instructions are used to make two logical cores appear as two physical chips, so that a single processor can exploit thread-level parallelism. This makes the processor compatible with multithreaded operating systems and software, reduces the CPU's idle time, and improves its computing efficiency.
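The reduction in idle time can be illustrated with a toy model of two logical processors sharing one execution engine. Everything in this sketch is an assumption made for illustration: each thread is a string in which 'X' means an instruction is ready and '.' means the thread is waiting (for example on memory), the shared engine executes at most one instruction per cycle, and priority alternates between threads each cycle.

```python
def run_shared_engine(threads):
    """threads: list of strings of 'X' (instruction ready) and '.' (waiting).
    Returns (total_cycles, busy_cycles) for one shared execution engine."""
    positions = [0] * len(threads)
    cycles = 0
    busy = 0
    turn = 0  # index of the thread that has priority in the current cycle
    while any(pos < len(t) for pos, t in zip(positions, threads)):
        cycles += 1
        issued = False
        for k in range(len(threads)):
            idx = (turn + k) % len(threads)
            pos = positions[idx]
            if pos >= len(threads[idx]):
                continue  # this thread has finished
            if threads[idx][pos] == ".":
                positions[idx] += 1  # waiting time elapses regardless
            elif not issued:
                positions[idx] += 1  # the engine executes this instruction
                issued = True
                busy += 1
            # otherwise the engine is taken; the thread retries next cycle
        turn = (turn + 1) % len(threads)
    return cycles, busy


if __name__ == "__main__":
    a = "X..X..X"
    b = ".X..X.."
    print("thread a alone:  ", run_shared_engine([a]))
    print("threads a and b: ", run_shared_engine([a, b]))
```

In this example the second thread's instructions fit into cycles in which the first thread is waiting, so the engine does more work in the same number of cycles.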

Third, technical directions for improving pipeline performance, as seen in Intel CPUs

Improve instruction-level parallelism by replicating functional units and pipelines.
Increase device utilization by planning the order in which instructions are executed.
Increase the execution rate of the devices themselves.

This mirrors the Tick-Tock development strategy that Intel currently follows, which alternates between advancing the architecture and advancing the manufacturing process.

Fourth, what I learned from this course report

What I want to say is:

Before studying this material, I had never thought about what happens inside a CPU. Now the CPU feels like a wonderful little world. Its fast computation relies not on magic but on the unremitting efforts of generations of researchers and engineers. The CPU really is a wonderful man-made world that condenses the wisdom and imagination of mankind.

I use a computer every day and enjoy the wealth of knowledge and the convenience that information technology brings. Compared with the past, today's computers are much simpler to operate while offering far richer functions, which makes it easy to feel that using a computer is a simple thing. I once thought so myself.

Now I have had the chance to gain a deeper understanding of the little silicon chip hidden inside the PC in front of me.

Moore's Law has driven progress across society as a whole. To improve computing speed, people have created many wonderful and effective methods, in both architecture and process, and in both hardware and software optimization.

From a single pipeline to superpipelines, and from single-core to multi-core, I have seen how many different solutions a single problem can have, which gives me much to reflect on: when facing a problem, there may be more solutions than you can imagine, so you must open your mind and not let your thinking be limited.

Although "Computer System Structure" is a hardware-oriented specialized course, the philosophy and ideas it contains will inspire me in my future life and work!

Resources:

1. The instruction pipeline of the 80486, Chanxiaojie

Http://www5.zzu.edu.cn/qwfw/wjyl/9-xntgjs/4a44c05f3f5a870e013f70e4d3f4207c.html

2.A Journey Through the CPU Pipeline

http://www.gamedev.net/page/resources/_/technical/general-programming/a-journey-through-the-cpu-pipeline-r3115
