Why does an AMD 2500+ processor, whose actual clock frequency is lower, run faster than a P4-2.4B? Why could the Tualatin-core processor, built on a 0.13 micron process, reach only 1.4 GHz at most, while the Willamette-core processor, built on a 0.18 micron process, easily reached 2 GHz? Let's analyze why these two apparent paradoxes exist.
Every CPU has an "execution pipeline" (hereinafter simply the "pipeline"). The relationship between a CPU and its pipeline is similar to that between a car and the automobile assembly line. The CPU pipeline is not a physical pipe or channel for data input and output. It is the sequence of "what to do next" steps the CPU follows to execute an instruction.
Every instruction must go through the same steps to be executed. We call each such step a "stage" (or "level"). The work done by the stages of a pipeline includes determining which instruction to execute next, computing on the data, deciding where the result is stored, and performing the operation itself. The most basic CPU pipeline can be divided into 5 stages:
1. Fetch the instruction
2. Decode the instruction
3. Calculate the operands
4. Execute the instruction
5. Store the result to cache
You may notice that the descriptions of the 5 stages above are very general. If some more specialized stages are added, the pipeline becomes longer:
1. Instruction fetch 1
2. Instruction fetch 2
3. Instruction decode 1
4. Instruction decode 2
5. Calculate the operands
6. Dispatch the operation
7. Confirm
8. Execute the instruction
9. Store to cache 1
10. Store to cache 2
Both the basic pipeline and the extended pipeline must complete the same task: take in instructions and output computed results.
The difference between the two is that the former has only 5 stages, and each of its stages does more work than each of the latter's 10 stages. If all other details were the same, you would choose the 5-stage pipeline. The reason is simple: keeping 5 stages filled with data is much easier than keeping 10 stages filled. If the processor's pipeline is not always full, it loses valuable execution efficiency, which means the CPU's overall performance is compromised to some extent.
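As a rough illustration of the fill cost described above, the sketch below counts clock cycles for an idealized pipeline with no stalls, where one instruction enters per cycle. The function name `cycles_to_complete` and the instruction count are illustrative assumptions, not figures from any real processor.

```python
def cycles_to_complete(num_instructions: int, num_stages: int) -> int:
    """Clock cycles an ideal, stall-free pipeline needs to finish all instructions.

    The first instruction needs num_stages cycles to pass through every stage;
    each following instruction completes one cycle after the previous one.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    n = 100  # hypothetical instruction count
    for stages in (5, 10):
        total = cycles_to_complete(n, stages)
        fill = total - n  # cycles spent only on filling the pipeline
        print(f"{stages}-stage pipeline: {total} cycles, {fill} of them fill overhead")
```

In this simplified model the fill overhead is paid again every time the pipeline has to be emptied and refilled, so the longer pipeline loses more cycles each time that happens.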
So what difference does the length of the CPU pipeline make?
The key is that a longer pipeline is not simple repetition. It subdivides the work of each stage so that every stage has a simpler job. As a result, each stage of the 10-stage design finishes its work in significantly less time than a stage of the 5-stage design. The slowest (and most complex) stage determines the speed of every stage in the entire pipeline. Please remember this!
Assume that in the first pipeline design every stage takes one clock cycle and the slowest stage can complete within 1 ns. A processor based on this pipeline can then be clocked at 1 GHz (1 / 1 ns = 1 GHz). The current trend is for the number of pipeline stages in CPUs to keep increasing, so the clock cycle must be shortened significantly for such a CPU to deliver performance equal to or higher than that of a processor with a shorter pipeline.
Fortunately, less work is done per clock cycle in a long pipeline. So even though raising the processor frequency shortens each clock cycle, the time each stage needs shrinks accordingly, and the CPU can run at a higher frequency.
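The relation between the slowest stage and the clock speed can be sketched as follows. The 1 ns figure comes from the text above; the individual stage latencies in both lists, and the function name `max_clock_ghz`, are made-up values chosen only for illustration.

```python
def max_clock_ghz(stage_latencies_ns):
    """Highest clock (in GHz) at which every stage still fits in one cycle.

    The cycle time cannot be shorter than the slowest stage, and a cycle
    time of 1 ns corresponds to 1 GHz.
    """
    slowest_ns = max(stage_latencies_ns)
    return 1.0 / slowest_ns


# Hypothetical latencies: the slowest stage of the 5-stage design takes 1 ns.
five_stage = [0.8, 1.0, 0.9, 1.0, 0.7]
# Hypothetical 10-stage design: the same work split into simpler stages.
ten_stage = [0.4, 0.5, 0.45, 0.5, 0.4, 0.5, 0.35, 0.5, 0.4, 0.35]

if __name__ == "__main__":
    print("5-stage max clock (GHz):", max_clock_ghz(five_stage))
    print("10-stage max clock (GHz):", max_clock_ghz(ten_stage))
```

Note that only the slowest stage matters here: speeding up any other stage does not raise the achievable clock.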
If the second pipeline design above lets the processor clock rise to 2 GHz, we should get twice the performance of the original processor, provided the pipeline stays fully loaded.
In practice, this is not the case. Any CPU pipeline will sometimes guess wrong when prefetching instructions. Once that happens, the instruction must be re-executed from the first stage of the pipeline. If a CPU with a 5-stage pipeline discovers the error at the 4th stage, restarting the instruction from the first stage costs far fewer cycles than on a CPU with a 10-stage pipeline that discovers the error at the 8th stage. In other words, we cannot make full use of all the CPU's resources. So why do we still need CPUs with higher clock speeds?
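A minimal model of this trade-off, under stated assumptions: each instruction normally completes in one cycle, a fraction of instructions hit a prefetch error, and each error wastes as many cycles as the stage at which it was detected (4 and 8, as in the example above). The 5% error rate and the function name `effective_rate` are assumptions for illustration only.

```python
def effective_rate(clock_ghz, error_rate, wasted_cycles_per_error):
    """Instructions completed per nanosecond in a simplified pipeline model.

    Average cycles per instruction = 1 (ideal) + error_rate * wasted cycles.
    """
    cycles_per_instruction = 1.0 + error_rate * wasted_cycles_per_error
    return clock_ghz / cycles_per_instruction


ERROR_RATE = 0.05  # assumed: 5% of instructions trigger a restart

short_pipe = effective_rate(1.0, ERROR_RATE, 4)  # 5-stage, 1 GHz, error found at stage 4
long_pipe = effective_rate(2.0, ERROR_RATE, 8)   # 10-stage, 2 GHz, error found at stage 8

if __name__ == "__main__":
    print(f"5-stage at 1 GHz:  {short_pipe:.3f} instructions/ns")
    print(f"10-stage at 2 GHz: {long_pipe:.3f} instructions/ns")
    print(f"speedup: {long_pipe / short_pipe:.2f}x for a 2x clock")
```

Under these assumptions the longer pipeline is still faster in absolute terms, but doubling the clock yields clearly less than double the throughput, and the shortfall grows as the error rate rises.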
Looking back a few years, consider the first Pentium 4 processors to be released. At that time Intel went from the 10-stage pipeline of the Pentium III to 20 stages in the Pentium 4, a 100% increase in pipeline length. Those first Pentium 4 processors struggled at launch: the downside of the very long pipeline was seriously low execution efficiency caused by instruction prefetch errors, to the point that they could not even match a Pentium III clocked at 1 GHz. The obvious upside was a large gain in clock speed, because each stage of a 20-stage pipeline takes less time than each stage of a 10-stage pipeline. Although execution efficiency dropped, a processor's frequency is determined by the time each pipeline stage takes and has nothing to do with execution efficiency. That is why the Willamette-core Pentium 4, built on a 0.18 micron process, could easily reach a clock speed of 2 GHz!
Of course, a more advanced manufacturing process also helps raise the processor's clock speed. When the Pentium 4 moved to the Northwood core on a 0.13 micron process, its clock speed advantage showed clearly, and it climbed all the way to 3.4 GHz. A CPU with a long pipeline can only exploit its strengths at a high frequency: it relies on a very high frequency and a very short clock cycle to make up for the time wasted re-executing instructions after a prefetch error.
However, the theoretical frequency limit of the Pentium 4 with a 20-stage pipeline and the 0.13 micron Northwood core is 3.5 GHz. What then?
Intel once again turned to its time-tested way of raising clock speed, lengthening the pipeline: the new Prescott-core Pentium 4 (commonly known as the P4-E) uses a 31-stage pipeline. From the discussion above we can clearly conclude that the Prescott-core Pentium 4 gets much less done per clock cycle than the Northwood-core Pentium 4. In other words, the initial P4-E is not faster than the P4-C. Although the P4-E has a larger level 2 cache, at the same frequency it is no match for the P4-C; only when the P4-E reaches 5 GHz or above can it compete with the P4-3.4C. The well-known CPU benchmark Super PI reflects this gap: a P4-3.4E takes 47 seconds to compute pi to 1 million decimal places, which is only on par with the result of a P4-2.4C, while a P4-3.4C needs only 31 seconds. At the same frequency, the P4-3.4E is far behind!
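The Super PI figures quoted above (47 seconds and 31 seconds at 3.4 GHz) can be turned into a quick back-of-the-envelope check. The sketch assumes that run time scales inversely with clock speed for a given core, which is a simplification; the function names are illustrative.

```python
def speed_ratio(slow_seconds, fast_seconds):
    """How many times faster the quicker run is than the slower one."""
    return slow_seconds / fast_seconds


def clock_to_match_ghz(current_ghz, own_seconds, target_seconds):
    """Clock needed to reach target_seconds, assuming time scales as 1/clock."""
    return current_ghz * own_seconds / target_seconds


P4_E_SECONDS = 47.0  # P4-3.4E, Super PI 1M digits (from the text)
P4_C_SECONDS = 31.0  # P4-3.4C, Super PI 1M digits (from the text)

if __name__ == "__main__":
    ratio = speed_ratio(P4_E_SECONDS, P4_C_SECONDS)
    needed = clock_to_match_ghz(3.4, P4_E_SECONDS, P4_C_SECONDS)
    print(f"P4-3.4C is about {ratio:.2f}x as fast as P4-3.4E")
    print(f"P4-E clock needed to match P4-3.4C: about {needed:.2f} GHz")
```

Under that linear-scaling assumption, the P4-3.4C is roughly one and a half times as fast in this test, and the P4-E would need a clock a little above 5 GHz to catch up, which is consistent with the 5 GHz figure given in the text.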
The AMD 2500+ processor uses a 10-stage pipeline, so it can match a P4 despite its lower clock speed. Apple's G4 processor uses only a 7-stage pipeline and can likewise match a P4-2.8C at a lower clock speed. Both owe this to the higher execution efficiency that a shorter pipeline brings. In terms of execution efficiency, Intel loses because of its pipeline length; in terms of raising clock speed, Intel wins because of its pipeline length. Most consumers are unfamiliar with a specialist topic like pipelines, and know only the one-sided and mistaken idea that "the higher the processor's frequency, the faster it is." That is where Intel was clever! (I personally think this passage is the core of the entire article; once you understand it, you understand the essence of the article.)
That concludes this brief introduction to CPU execution efficiency and pipelines.