The clock frequency, measured in MHz (or GHz), indicates the CPU's operating speed. CPU frequency = FSB (external clock) × multiplier. Many people assume that the clock frequency alone determines how fast a CPU runs; this view is not just one-sided — for servers it is positively misleading. To date, no formula has been established that ties clock frequency numerically to actual computing speed. Even Intel and AMD, the two largest processor manufacturers, dispute the point. From Intel's product development we can see that Intel has concentrated on raising clock frequency; other manufacturers have taken different paths — someone once compared a 1 GHz Transmeta processor and found it ran about as fast as a 2 GHz Intel processor.
Therefore, the CPU's clock frequency has no direct relationship with its actual computing power; the clock frequency merely expresses how fast the digital pulse signal inside the CPU oscillates. Intel's own product line offers examples: the 1 GHz Itanium chip performs almost as fast as the 2.66 GHz Xeon/Opteron, and the 1.5 GHz Itanium 2 is about as fast as a 4 GHz Xeon/Opteron. A CPU's real operating speed also depends on the performance of its pipeline and other indicators.
Of course, clock frequency and actual speed are related; it is just that the clock reflects only one aspect of CPU performance and does not represent the CPU's overall performance.
2. FSB (external clock)
The FSB here is the CPU's external base clock, also measured in MHz, and it determines the speed at which the whole motherboard runs. Frankly, what we call overclocking on the desktop means raising this external clock (the CPU multiplier is generally locked) — easy enough to understand. For server CPUs, however, overclocking is absolutely out of the question. As just said, the external clock determines the motherboard's speed and the two run synchronously; if a server CPU is overclocked and the external clock changed, the result is asynchronous operation (which many desktop motherboards do support), and the whole server system becomes unstable.
In most current computer systems the external clock is also the speed at which memory and motherboard stay synchronized, so it can be understood as connecting the CPU directly to memory and keeping the two running in step. The external clock is very easy to confuse with the front-side bus (FSB) frequency; the next section covers the difference between the two.
3. Front-side bus (FSB) frequency
The front-side bus (FSB) frequency (i.e. the bus frequency) directly affects the speed of data exchange between the CPU and memory. One formula gives: data bandwidth = (bus frequency × data width) / 8, so the maximum data-transfer bandwidth depends on the width and the frequency of all simultaneous transfers. For example, the 64-bit Xeon Nocona has an 800 MHz front-side bus; by the formula, its maximum data-transfer bandwidth is 6.4 GB/s.
Difference between the external clock and the front-side bus frequency: the front-side bus speed refers to the speed of data transmission, while the external clock is the synchronization speed between the CPU and the motherboard. In other words, a 100 MHz external clock means the digital pulse signal oscillates 100 million times per second, while a 100 MHz front-side bus means the CPU can accept data at 100 MHz × 64 bit ÷ 8 = 800 MB/s.
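The bandwidth figures above follow directly from the formula; a small sketch (the function name is my own, not from the text) makes the arithmetic explicit:

```python
def bus_bandwidth_mb_s(freq_mhz: float, width_bits: int = 64) -> float:
    """Peak bandwidth in MB/s = bus frequency (MHz) x data width (bits) / 8."""
    return freq_mhz * width_bits / 8

# A 100 MHz front-side bus, 64 bits wide -> 800 MB/s
print(bus_bandwidth_mb_s(100))   # 800.0
# The 800 MHz front-side bus of the 64-bit Xeon Nocona -> 6400 MB/s = 6.4 GB/s
print(bus_bandwidth_mb_s(800))   # 6400.0
```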
In fact, the emergence of the "HyperTransport" architecture has changed what front-side bus (FSB) frequency really means. We knew that the IA-32 architecture must have three major components: the memory controller hub (MCH), the I/O controller hub, and the PCI hub — for example Intel's typical chipsets Intel 7501 and Intel 7505, tailored for dual Xeon processors. Their MCH provides the CPU with a front-side bus running at 533 MHz; together with DDR memory, the front-side bus bandwidth can reach 4.3 GB/s. But as processor performance keeps improving, this poses many problems for the system architecture. The HyperTransport architecture not only solves those problems but also raises bus bandwidth more effectively. In the AMD Opteron, for example, the flexible HyperTransport I/O bus architecture lets the processor integrate the memory controller, so the processor exchanges data with memory directly rather than through the system bus to the chipset. In that case, the notion of a front-side bus frequency in the AMD Opteron no longer really applies.
4. CPU bits and word length
Bit: digital circuits and computer technology use binary, whose only codes are "0" and "1"; each "0" or "1" is one "bit" in the CPU.
Word length: the number of binary digits the CPU can process at one time (simultaneously) is called the word length. So a CPU that can handle 8 bits of data at once is usually called an 8-bit CPU; likewise, a 32-bit CPU can process binary data 32 bits wide per unit time. Difference between byte and word length: since common English characters are represented with 8 binary bits, 8 bits are usually called a byte. Word length is not fixed; it differs from CPU to CPU. An 8-bit CPU can process only one byte at a time, while a 32-bit CPU handles 4 bytes at a time, and by the same token a 64-bit CPU handles 8 bytes at a time.
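The byte/word-length relationship above reduces to one division; a minimal sketch (the function name is my own):

```python
def bytes_per_access(word_length_bits: int) -> int:
    """An N-bit CPU processes N/8 bytes at a time, since a byte is 8 bits."""
    return word_length_bits // 8

for bits in (8, 32, 64):
    print(f"{bits}-bit CPU handles {bytes_per_access(bits)} byte(s) at a time")
```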
5. Frequency multiplier
The frequency multiplier is the ratio between the CPU's core frequency and the external clock. At the same external clock, a higher multiplier means a higher CPU frequency. In practice, though, at the same external clock, a high multiplier by itself means little: the speed of data transfer between the CPU and the system is limited, so blindly chasing a high multiplier produces a pronounced "bottleneck" effect — the rate at which data reaches the CPU from the system cannot keep up with the CPU's computation speed. In general, Intel's CPUs have locked multipliers (engineering samples excepted), while AMD's were previously unlocked.
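The frequency formula from section 1 (CPU frequency = external clock × multiplier) can be sketched as follows; the 133 MHz × 23 values are my own illustration:

```python
def core_frequency_mhz(external_clock_mhz: float, multiplier: float) -> float:
    """Core clock = external (base) clock x frequency multiplier."""
    return external_clock_mhz * multiplier

# Assumed illustrative values: a 133 MHz external clock with a 23x multiplier
print(core_frequency_mhz(133, 23))  # 3059.0 MHz, i.e. roughly a 3.06 GHz Pentium 4
```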
6. Cache
Cache size is also one of the CPU's important indicators, and the cache's structure and size have a large impact on CPU speed. The cache inside the CPU runs at a very high frequency — generally the same frequency as the processor — and its working efficiency far exceeds that of system memory and the hard disk. In actual work the CPU often needs to read the same block of data repeatedly, and enlarging the cache greatly raises the hit rate of data reads inside the CPU, avoiding lookups in memory or on disk and thereby improving system performance. But because of CPU die area and cost, the cache is kept very small.
The L1 cache (level-1 cache) is the CPU's first-level cache, divided into a data cache and an instruction cache. The capacity and structure of the built-in L1 cache affect CPU performance; but because the cache is built from static RAM with a comparatively complex structure, and the CPU die area cannot be made too large, the L1 cache's capacity cannot be made very big. The L1 cache of a typical server CPU is usually 32-256 KB.
The L2 cache (level-2 cache) is the CPU's second-level cache, divided into on-die and external varieties. An on-die L2 cache runs at the same speed as the core clock, while an external L2 cache runs at only half the core frequency. L2 capacity also affects CPU performance, and in principle the bigger the better. The largest consumer CPUs now have 512 KB, while server and workstation CPUs carry L2 caches of 256 KB-1 MB, some as large as 2 MB or 3 MB.
The L3 cache (level-3 cache) comes in two forms: early versions were external, while today's are built in. Its actual effect is that the L3 cache further reduces memory latency and improves processor performance on large-volume data computations — which is also very helpful for games. In the server domain, adding an L3 cache still brings a significant performance improvement. For example, a configuration with a larger L3 cache uses physical memory more efficiently, so a slower disk I/O subsystem can handle more data requests. Processors with larger L3 caches also provide more efficient file-system caching behavior and shorter message and processor queue lengths.
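How a bigger cache translates into lower latency can be sketched with the standard hit-rate formula for average access time; the 2 ns / 100 ns timings below are assumed for illustration, not taken from the text:

```python
def avg_access_ns(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Average access time: hits served by the cache, misses by main memory."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

# Assumed illustrative timings: 2 ns cache, 100 ns main memory.
print(avg_access_ns(0.90, 2, 100))  # ~11.8 ns
print(avg_access_ns(0.99, 2, 100))  # ~2.98 ns -- a bigger cache pays off via hit rate
```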
In fact, the earliest L3 cache appeared on AMD's K6-III processor, where, limited by the manufacturing process of the time, the L3 cache was not integrated into the chip but mounted on the motherboard, and could only run in sync with the system bus frequency — not much faster than main memory. Later, Intel applied the L3 cache to the Itanium processor for the server market, followed by the P4EE and the Xeon MP. Intel also planned to launch an Itanium 2 processor with a 9 MB L3 cache and, later, a dual-core Itanium 2 with a 24 MB L3 cache.
But basically the L3 cache is not all that important to processor performance: for example, the Xeon MP with a 1 MB L3 cache is still no match for the Opteron — which shows that raising the front-side bus boosts performance more effectively than adding cache.
7. CPU extended instruction sets
The CPU relies on instructions to compute and to control the system, and every CPU is designed with a series of instruction systems matched to its hardware circuits. Instruction capability is also an important indicator of the CPU; the instruction set is one of the most effective tools for raising a microprocessor's efficiency. In terms of mainstream architecture today, instruction sets divide into complex instruction sets and reduced instruction sets; in terms of specific applications, Intel's MMX (MultiMedia eXtensions), SSE, SSE2 (Streaming SIMD Extensions 2) and SSE3, and AMD's 3DNow! are all CPU extended instruction sets, enhancing the CPU's multimedia, graphics and Internet processing capabilities. We usually refer to these extensions as the "CPU's instruction sets". SSE3 is also currently the smallest instruction set: MMX contained 57 instructions, SSE 50, SSE2 144, and SSE3 just 13. SSE3 is currently the most advanced set; Intel's Prescott processors already support it, AMD will add SSE3 support in its future dual-core processors, and Transmeta's processors will support the instruction set as well.
8. CPU core and I/O operating voltage
Starting with the 586 CPUs, the CPU operating voltage is divided into core voltage and I/O voltage, and usually the core voltage is less than or equal to the I/O voltage. The core voltage depends on the CPU's production process: generally, the smaller the process, the lower the core operating voltage. The I/O voltage is generally 1.6-5 V. Low voltages solve the problems of excessive power consumption and heat.
9. Manufacturing process
The micron (now nanometer) figure of a manufacturing process refers to the distance between circuits inside the IC. The trend in manufacturing technology is toward ever-higher density. Higher-density IC design means that an IC of the same size can hold circuits of higher density and more complex function. The mainstream processes are now 180 nm, 130 nm and 90 nm, and a 65 nm manufacturing process has recently been announced as achievable.
10. Instruction Set
(1) CISC instruction set
The CISC instruction set, also known as the complex instruction set (CISC, short for Complex Instruction Set Computer). In a CISC microprocessor, a program's instructions execute serially in order, and the operations within each instruction also execute serially in order. The advantage of sequential execution is simple control, but utilization of the computer's various parts is low and execution is slow. CISC is in fact the x86 series (i.e. the IA-32 architecture) of CPUs produced by Intel and the compatible CPUs from AMD, VIA and others. Even the new x86-64 (also called AMD64) belongs to the CISC category.
To know what an instruction set is, start with today's x86-architecture CPUs. The x86 instruction set was developed by Intel specifically for its first 16-bit CPU (the i8086); the CPU of the world's first PC, launched by IBM in 1981 — the i8088 (a simplified i8086) — also used x86 instructions. At the same time, the X87 chip was added to improve floating-point processing; the x86 instruction set and the X87 instruction set are thereafter collectively called the x86 instruction set.
Although CPU technology kept evolving and Intel went on to develop the newer i80386 and i80486, then the PII Xeon, PIII Xeon and Pentium III, and finally today's Pentium 4 series and Xeon (excluding the Xeon Nocona), in order to let computers keep running the many applications developed earlier — protecting and inheriting rich software resources — all CPUs produced by Intel continue to use the x86 instruction set, so they still belong to the x86 series. Because Intel's x86 series and its compatible CPUs (such as the AMD Athlon MP) all use the x86 instruction set, they form today's vast lineup of x86-series and compatible CPUs. x86 CPUs currently include both Intel's server CPUs and AMD's server CPUs.
(2) RISC instruction set
RISC is the abbreviation of "Reduced Instruction Set Computing", meaning a reduced instruction set. It developed on the basis of the CISC instruction system: tests on CISC machines showed that the usage frequencies of different instructions vary enormously — the most commonly used are simple instructions that account for only 20% of the instruction count but 80% of the occurrences in programs. A complex instruction system inevitably increases the complexity of the microprocessor, making processor development long and costly; and complex instructions need complex operations, which inevitably lowers the computer's speed. For these reasons the RISC CPU was born in the 1980s. Compared with CISC CPUs, RISC CPUs not only streamlined the instruction system but also adopted so-called "superscalar and superpipeline structures", greatly increasing parallel processing capacity. The RISC instruction set is the direction of development for high-performance CPUs. Compared with traditional CISC, RISC's instruction format is uniform, its instruction types fewer, and its addressing modes fewer — and of course processing speed is much improved. CPUs with this instruction system are common in high-end servers; in particular, high-end servers all use RISC CPUs. The RISC instruction system is better suited to UNIX, the operating system of high-end servers, and Linux is likewise a Unix-like operating system. RISC CPUs are incompatible with Intel and AMD CPUs in both software and hardware.
At present, the RISC-instruction CPUs used in high-end servers mainly fall into the following families: PowerPC processors, SPARC processors, PA-RISC processors, MIPS processors, and Alpha processors.
(3) IA-64
Whether EPIC (Explicitly Parallel Instruction Computing) is the successor to the RISC and CISC systems has long been controversial; judged on the EPIC system alone, it looks more like an important step by Intel's processors toward the RISC system. In theory, a CPU designed on the EPIC system, under the same host configuration, processes Windows application software much better than it does UNIX-based application software.
Intel's server CPU using EPIC technology is the Itanium (development code name Merced). It is a 64-bit processor, the first in the IA-64 family. Microsoft has also developed an operating system, code-named Win64, to support it on the software side. After the x86 instruction set, Intel turned to the more advanced 64-bit microprocessor; Intel did this because it wanted to shed the massive x86 architecture and introduce an energetic, powerful instruction set, and thus the IA-64 architecture with its EPIC instruction set was born. IA-64 represents great strides over x86 in many respects: it breaks through many limitations of the traditional IA-32 architecture and achieves breakthrough improvements in data-processing capability, system stability, security, usability and rationality.
The biggest drawback of IA-64 microprocessors is their lack of x86 compatibility. To let IA-64 processors (Itanium, Itanium 2 ...) run software of the previous two generations better, Intel introduced an x86-to-IA-64 decoder that translates x86 instructions into IA-64 instructions. This decoder is not the most efficient decoder, nor the best way to run x86 code (the best way is to run x86 code directly on an x86 processor), so Itanium and Itanium 2 perform very poorly when running x86 applications. This became the root cause of the emergence of x86-64.
(4) x86-64 (AMD64/EM64T)
x86-64 was designed by AMD to handle 64-bit integer operations while remaining compatible with the x86-32 architecture. It supports 64-bit logical addressing, with an option to convert to 32-bit addressing; data-manipulation instructions default to 32-bit and 8-bit widths, with options to convert to 64-bit and 16-bit; it supports general-purpose registers and, for 32-bit operations, extends the result to a full 64 bits. Thus instructions are distinguished between "direct execution" and "conversion execution"; the instruction field is 8 or 32 bits, which prevents the field from growing too long.
The birth of x86-64 (also called AMD64) was not groundless: the 32-bit addressing space of x86 processors limits them to 4 GB of memory, and the IA-64 processor is not x86-compatible. AMD, fully considering customers' needs, strengthened the x86 instruction set so that it supports a 64-bit mode of operation — hence AMD calls the architecture x86-64. For 64-bit operation in the x86-64 architecture, AMD introduced the new R8-R15 general-purpose registers as extensions of the original x86 processor registers, though these are not fully used in 32-bit environments. Original registers such as EAX and EBX are also widened from 32 to 64 bits, and 8 new registers are added to the SSE unit to provide SSE2 support. The increased register count leads to a performance improvement. At the same time, to support both 32-bit and 64-bit code and registers, the x86-64 architecture lets the processor work in two modes: long mode and legacy mode, with long mode further divided into two sub-modes (64-bit mode and compatibility mode). The standard has been introduced in AMD's Opteron server processors.
This year Intel also launched 64-bit EM64T technology; before being officially named EM64T it was known as IA-32e, the name Intel gave its 64-bit extension technology to distinguish it from the x86 instruction set. Intel's EM64T supports a 64-bit sub-mode similar to AMD's x86-64 technology: it employs 64-bit linear flat addressing, adds 8 new general-purpose registers (GPRs), and adds 8 registers to support SSE instructions. Like AMD's, Intel's 64-bit technology is compatible with IA-32 and IA-32e, and IA-32e is used only when running a 64-bit operating system. IA-32e consists of two sub-modes — a 64-bit sub-mode and a 32-bit sub-mode — backward compatible in the same way as AMD64. Intel's EM64T will be fully compatible with AMD's x86-64 technology. The Nocona processor has already added some 64-bit technology, and Intel's Pentium 4E processor also supports 64-bit technology.
It should be said that both are 64-bit microprocessor architectures compatible with the x86 instruction set, but there are still some differences between EM64T and AMD64: the NX bit found in AMD64 processors is not provided in Intel's processors.
11. Superpipelining and superscalar
Before explaining superpipelining and superscalar, first understand the pipeline. The pipeline was first used by Intel in its 486 chip. It works like an assembly line in industrial production: inside the CPU, an instruction-processing pipeline is composed of 5-6 circuit units with different functions, and an x86 instruction is split into 5-6 steps executed by these units in turn, so that one instruction can complete per CPU clock cycle, raising the CPU's speed. Each integer pipeline of the classic Pentium has four stages — instruction prefetch, decode, execute, write back — while the floating-point pipeline has eight stages.
Superscalar means executing several instructions at once through multiple built-in pipelines; in essence it trades space for time. Superpipelining refines the pipeline stages and raises the clock frequency so that one or more operations complete per machine cycle; in essence it trades time for space. The Pentium 4's pipeline, for example, is 20 stages long. The more finely the pipeline is staged, the less work each stage does and the higher the clock frequency the CPU can reach. But an over-long pipeline brings side effects: a CPU with a higher clock may well be slower in actual operation. Intel's Pentium 4 showed exactly this — although its clock could reach 1.4 GHz, its performance fell far short of the 1.2 GHz AMD Athlon and even the Pentium III.
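The trade-off above can be made concrete with an idealized pipeline model — a sketch that assumes no stalls or branch mispredictions, which is exactly the assumption deep pipelines like the Pentium 4's fail to meet:

```python
def pipeline_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: n_stages cycles to fill, then one instruction retires per cycle."""
    return n_stages + (n_instructions - 1)

# Without pipelining, 100 instructions x 5 steps each = 500 cycles;
# a 5-stage pipeline retires them in 104 cycles, a 20-stage one in 119 --
# but only if nothing (e.g. a mispredicted branch) forces a refill.
print(pipeline_cycles(100, 5))   # 104
print(pipeline_cycles(100, 20))  # 119
```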
12. Packaging
CPU packaging is a protective measure that encloses the CPU die or module in specific materials to prevent damage; generally a CPU must be packaged before it can be delivered to users. How a CPU is packaged depends on its installation form and device-integration design. Broadly classified, CPUs installed in sockets usually use PGA (pin grid array) packaging, while CPUs installed in Slot x slots all use SEC (single edge contact cartridge) packaging. There are now also packaging technologies such as PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array). With market competition growing ever fiercer, the main direction of CPU packaging technology today is cost saving.
13. Multithreading
Simultaneous multithreading, abbreviated SMT. By duplicating the architectural state on the processor, SMT lets multiple threads on the same processor execute synchronously and share the processor's execution resources, which maximizes wide-issue, out-of-order superscalar processing, improves the utilization of the processor's computing units, and mitigates memory-access latency caused by data dependencies or cache misses. When multiple threads are not available, the SMT processor is almost the same as a conventional wide-issue superscalar processor. What is most attractive about SMT is that only small-scale changes to the processor-core design are needed to dramatically improve performance, at virtually no extra cost. Multithreading technology can prepare more data for the high-speed computing core to process, reducing the core's idle time. This is certainly very attractive for desktop and low-end systems. Starting from the 3.06 GHz Pentium 4, all Intel processors support SMT technology.
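SMT itself is a hardware feature, but the latency-hiding idea behind it can be illustrated in software: while one thread stalls, others make progress. A minimal Python sketch (the 0.1 s delay stands in for a stall and is my own assumption):

```python
import threading
import time

def worker(delay: float, results: list, i: int) -> None:
    time.sleep(delay)      # stands in for a long-latency stall (e.g. a cache miss)
    results[i] = i * i

results = [None] * 4
threads = [threading.Thread(target=worker, args=(0.1, results, i)) for i in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(results)            # [0, 1, 4, 9]
# The four 0.1 s stalls overlap instead of summing to 0.4 s:
print(elapsed < 0.4)      # True on any reasonably idle machine
```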
14. Multi-core
Multi-core also means chip multiprocessing (Chip Multiprocessors, abbreviated CMP). CMP was proposed by Stanford University; the idea is to integrate the SMP (symmetric multiprocessing) of large-scale parallel processors onto a single chip, with each processor executing different processes in parallel. Compared with CMP, the SMT processor structure is more flexible. However, once semiconductor processes reached 0.18 micron, wire delay exceeded gate delay, requiring microprocessor designs to be divided into many smaller, locally better-behaved basic unit structures. Since a CMP structure is already divided into several processor cores, each core is comparatively simple and conducive to optimized design, and therefore more promising. At present, both IBM's POWER4 chip and Sun's MAJC5200 chip use the CMP structure. Multi-core processors can share cache within the processor, improving cache utilization while simplifying the design complexity of multiprocessor systems.
In the second half of 2005, new processors from Intel and AMD will also adopt the CMP structure. The new Itanium processor, development code name Montecito, uses a dual-core design with at least 18 MB of on-chip cache, manufactured on a 90 nm process; its design can definitely be called a challenge to today's chip industry. Each of its individual cores has separate L1, L2 and L3 caches, and it contains approximately 1.7 billion transistors.
15. SMP
SMP (Symmetric Multi-Processing) refers to a group of processors (multiple CPUs) assembled in one computer, sharing the memory subsystem and the bus structure. With the support of this technology, a server system can run multiple processors simultaneously and share memory and other host resources. The dual Xeon — what we call two-way — is the most common form of symmetric multiprocessing (Xeon MP can support up to four-way, and AMD Opteron supports 1- to 8-way); a few 16-way systems exist as well. In general, however, SMP machines scale poorly: it is hard to exceed 100 processors, and 8 to 16 is typical, which is enough for most users. SMP is most common in high-performance server and workstation motherboard architectures; some UNIX servers support up to 256 CPUs.
The prerequisites for building an SMP system are: hardware that supports SMP, including the motherboard and CPUs; a system platform that supports SMP; and application software that supports SMP.
For an SMP system to run efficiently, the operating system must support SMP — for example 32-bit operating systems such as WinNT, Linux and UNIX — that is, it must be capable of multitasking and multithreading. Multitasking means the operating system can have different CPUs perform different tasks at the same time; multithreading means the operating system can have different CPUs complete the same task in parallel.
Building an SMP system places high demands on the chosen CPUs. First, an APIC (Advanced Programmable Interrupt Controller) unit must be built into each CPU — the core of the Intel multiprocessing specification is precisely the use of APICs. Second, the CPUs must be the same product model, with the same type of core and exactly the same operating frequency. Finally, keep the product serial numbers as close as possible: when CPUs from two production batches run as a dual-processor pair, one CPU may end up overloaded while the other is underused, preventing peak performance — and worse, it may cause a crash.
16. NUMA Technology
NUMA (non-uniform memory access) is a distributed shared-memory technology. A NUMA system consists of several independent nodes connected by a high-speed private network, where each node may be a single CPU or an SMP system. In NUMA, cache coherence has a variety of solutions, requiring support from the operating system and special software. Figure 2 is an example of a Sequent NUMA system: three SMP modules connected by a high-speed private network form one node, and each node can have 12 CPUs. A system like Sequent's can reach 64 or even 256 CPUs. Clearly this builds on SMP and extends it with NUMA technology; it is a combination of the two.
17. Out-of-order execution technology
Out-of-order execution refers to technology by which the CPU allows multiple instructions to be dispatched to the corresponding circuit units out of the order specified by the program. Based on the state of each circuit unit and an analysis of whether individual instructions can execute ahead of time, instructions that are able to execute are sent immediately to the appropriate circuit units; instructions are not executed in the prescribed order during this period, and a reordering unit afterwards rearranges the results back into program order. The purpose of out-of-order execution is to keep the CPU's internal circuits running at full load and correspondingly raise the speed at which the CPU runs programs. Branch technology: branch instructions must wait for results before proceeding; unconditional branches simply need to execute in order, while conditional branches must be processed according to the result before deciding whether to keep the original order.
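The issue rule described above — dispatch an instruction as soon as its operands are ready, then put results back in program order — can be sketched with a toy single-issue scheduler; the four-instruction "program" and its latencies are my own illustration:

```python
# Toy out-of-order issue: each instruction has operand dependencies and a latency;
# it may issue as soon as its operands are complete, regardless of program order.
# (Retirement back into program order is implied, not modeled here.)
program = [
    ("a", [], 3),          # a = slow load, 3 cycles
    ("b", ["a"], 1),       # b = a + 1, must wait for a
    ("c", [], 1),          # c = independent add -- can overtake b
    ("d", ["b", "c"], 1),  # d = b + c
]

done_at = {}       # instruction name -> cycle its result becomes available
issue_order = []
cycle = 0
pending = list(program)
while pending:
    issued = None
    for inst in pending:
        name, deps, lat = inst
        if all(done_at.get(d, 10**9) <= cycle for d in deps):  # operands ready?
            issue_order.append(name)
            done_at[name] = cycle + lat
            issued = inst
            break
    if issued:
        pending.remove(issued)   # one issue per cycle in this single-issue toy
    cycle += 1

print(issue_order)   # ['a', 'c', 'b', 'd'] -- 'c' overtakes 'b' while 'b' waits on 'a'
```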
18. The CPU's integrated memory controller
Many applications have more complex read patterns (almost random, especially when cache hits are unpredictable) and do not use bandwidth efficiently. Typical of such applications is business-transaction software, which is limited by memory latency even with CPU features such as out-of-order execution. The CPU must wait until the data an operation needs has been loaded before it can execute the instruction (whether the data comes from the CPU cache or from the main memory system). Current low-end memory latency is about 120-150 ns, while CPU speeds are above 3 GHz, so a single memory request can waste 200-300 CPU cycles. Even at a 99% cache hit rate, the CPU may spend 50% of its time waiting for memory requests to finish — because of memory latency, for example.
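The cycle arithmetic in this paragraph is worth making explicit (a sketch; that the text quotes 200-300 cycles for 120-150 ns suggests part of the latency is assumed to overlap with useful work):

```python
def stalled_cycles(latency_ns: float, clock_ghz: float) -> float:
    """CPU cycles lost waiting on one memory request: latency (ns) x clock (GHz)."""
    return latency_ns * clock_ghz

# At 3 GHz, every nanosecond of memory latency costs 3 cycles:
print(stalled_cycles(100, 3))  # 300.0 -- the top of the 200-300 cycle range
print(stalled_cycles(150, 3))  # 450.0 if none of the 150 ns is overlapped
```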
It can be seen that the Opteron's integrated memory controller has much lower latency than a chipset-supported dual-channel DDR memory controller. Intel also plans to integrate the memory controller into the processor, which makes the northbridge chip less important. Changing the way the processor accesses main memory helps increase bandwidth, reduce memory latency, and improve processor performance.