Computer composition: the north and south bridges, frequency multipliers, and communication — devices can communicate only when their frequencies match

Source: Internet
Author: User
Tags: intel, pentium

How does a computer run?

Recommended reading:

"How Programs Work"

"Computer Assembly and Hardware Repair from Beginner to Mastery"

"The Self-Cultivation of Programmers"

Three basic principles of computers:

1. A computer is a machine that performs input, arithmetic, and output.
2. A program is a collection of instructions and data.
3. Computer processing sometimes differs from human habits of thinking.

An instruction is a command that controls the computer's input, arithmetic, or output.

Listing out the instructions to be sent to the computer produces a program.

In programming, a set of instructions is given a name; depending on the language it may be called a "function", a "statement", a "method", a "subroutine", and so on.

The data in a program falls into two categories: input data that instructions operate on, and output data produced as the result of executing instructions.

During programming, programmers give data names, called "variables". Seeing variables and functions, you may be reminded of mathematics. As with function notation in mathematics, syntax similar to the following is used in many programming languages.

y = f (x)

This line means: pass the variable x into the function f, and after some operation inside the function, the result is output to the variable y. It is not surprising that the syntax of a programming language resembles a mathematical formula, since a computer first represents all information as numbers before computing on it. But programs differ from mathematics in one respect: the names of variables and functions can consist of more than one character, as in the example below.

Output = operate (input)
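The idea can be sketched in Python. The names `operate`, `input_value`, and `output` are purely illustrative, not from any real library:

```python
# A minimal sketch of the y = f(x) idea: a function takes input data,
# performs some operation, and produces output data.
def operate(input_value):
    # The "operation inside the function": here, simply doubling the input.
    return input_value * 2

output = operate(21)
print(output)  # 42
```

Just as in the text, both the function name and the variable names are multi-character words rather than single letters.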

Everything is digital to the computer

Representing all information as numbers is the hallmark of the computer's approach, and it is where computers differ most from human habits of thinking. For example, people use words like "blue" and "red" to describe color information, but a computer must use numbers to represent colors: for instance, "0, 0, 255" for blue, "255, 0, 0" for red, and "255, 0, 255" for the purple obtained by mixing blue and red. The same goes for text: the computer first converts each character to a corresponding number before processing it, and such numbers are called "character encodings". In short, the computer represents everything as numbers.
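A minimal Python sketch of this point — colors and characters both reduce to numbers:

```python
# Colors are just numbers to the computer: RGB triples.
blue = (0, 0, 255)
red = (255, 0, 0)
# "Mixing" blue and red channel by channel gives purple.
purple = tuple(min(b + r, 255) for b, r in zip(blue, red))
print(purple)  # (255, 0, 255)

# Text is handled the same way: each character maps to a code number.
print(ord("A"))  # 65 -- the character encoding of "A"
print("A".encode("utf-8"))  # b'A' -- one byte whose value is 65
```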

People familiar with computers often say confusing things, such as "open a file here and get a file handle" or "decrypt a file encrypted with the public key using the private key". So what is this "file handle" they speak of? A number. What is a "public key"? A number. And the "private key"? Of course, a number. No matter what kind of information the computer is dealing with, just think of it as a number. While this runs somewhat against human habits of thinking, handling numbers is very simple for a computer.

Computer hardware structure

One key point: devices can communicate only when their frequencies match. The CPU communicates with other devices running at various frequencies by way of frequency multipliers, the north bridge, the south bridge, and the I/O controllers.

The evolution of the north-south bridge

https://my.oschina.net/u/914655/blog/338903

Master these 18 points and no one can fool you about CPUs!

https://groups.google.com/forum/?hl=iw#!topic/cnpro/RJz28ppqTNo

1. Frequency

The main frequency, also called the clock frequency, is measured in MHz and indicates the CPU's operating speed. CPU clock frequency = base clock (external frequency) × multiplier. Many people think the clock frequency alone determines a CPU's operating speed; this is not merely one-sided but, for servers, outright mistaken. To date there is no definitive formula relating clock frequency numerically to actual operating speed — even the two major processor manufacturers, Intel and AMD, dispute the point. From Intel's product development trend, you can see that Intel attaches great importance to raising its clock frequencies, while other processor manufacturers have made comparisons with, say, a 1 GHz Athlon whose operating efficiency is equivalent to that of a 2 GHz Intel processor.

  

Therefore, CPU clock frequency is not directly tied to actual computing power; the clock frequency only expresses how fast the digital pulse signal oscillates inside the CPU. Intel's own processor lineup offers examples: a 1 GHz Itanium chip can behave almost as fast as a 2.66 GHz Xeon/Opteron, and a 1.5 GHz Itanium 2 about as fast as a 4 GHz Xeon/Opteron. A CPU's operating speed also depends on performance in all aspects of its pipeline.

Of course, clock frequency is related to actual operating speed; it is just that clock frequency is only one aspect of CPU performance and does not represent the CPU's overall performance. (bbz888.mcublog.com)

  

2. Base clock (external frequency)

The base clock (external frequency) is the CPU's reference frequency, measured in MHz. The CPU's base clock determines the operating speed of the whole motherboard. Frankly, what we call overclocking on the desktop means raising the CPU's base clock (the multiplier being locked, of course) — this should be easy to understand. For server CPUs, however, overclocking is absolutely not allowed. As just noted, the CPU's base clock determines the motherboard's operating speed, and the two run synchronously; if you overclock a server CPU by changing the base clock, they run asynchronously (many desktop motherboards do support asynchronous operation), and the entire server system becomes unstable.

  

In most current computer systems, the base clock is also the speed at which the memory and the motherboard run synchronously; in this sense the CPU's base clock can be understood as connecting directly to the memory, keeping the two in a synchronized running state. The base clock is easily confused with the front-side bus (FSB) frequency; the next section on the front-side bus explains the difference between the two.

  

3. Front-side bus (FSB) frequency

The front-side bus (FSB) frequency, or bus frequency, directly affects the speed of data exchange between the CPU and memory. It enters a simple formula: data bandwidth = (bus frequency × data bus width) / 8, so the maximum data-transfer bandwidth depends on the width and the frequency of the bus carrying the simultaneous transfers. For example, the 64-bit Xeon "Nocona" has an 800 MHz front-side bus; by the formula, its maximum data-transfer bandwidth is 6.4 GB/s.
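The formula can be checked with a short Python sketch (the function name is made up for illustration):

```python
def data_bandwidth_gb_per_s(bus_frequency_mhz, data_bus_width_bits):
    """Data bandwidth = (bus frequency x data bus width) / 8 bits-per-byte."""
    bytes_per_second = bus_frequency_mhz * 1_000_000 * data_bus_width_bits / 8
    return bytes_per_second / 1_000_000_000

# The Xeon "Nocona" example from the text: 800 MHz bus, 64 bits wide.
print(data_bandwidth_gb_per_s(800, 64))  # 6.4
```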

  

The difference between the base clock and the front-side bus (FSB) frequency: the front-side bus speed refers to the speed of data transmission, while the base clock is the speed at which the CPU runs synchronously with the motherboard. That is, a 100 MHz base clock means the digital pulse signal oscillates 100 million times per second, while a 100 MHz front-side bus means the CPU can accept data at 100 MHz × 64 bit ÷ 8 bits-per-byte = 800 MB/s.

  

Actually, the emergence of the HyperTransport architecture has changed the meaning of the front-side bus (FSB) frequency. We knew earlier that the IA-32 architecture requires three important components: the memory controller hub (MCH), the I/O controller hub, and the PCI hub — as in Intel's typical chipsets such as the Intel 7501 and Intel 7505, tailored for dual Xeon processors. The MCH they contain provides the CPU with a 533 MHz front-side bus; paired with DDR memory, the front-side bus bandwidth can reach 4.3 GB/s. But as processor performance keeps improving, this poses many problems for the system architecture. The HyperTransport architecture not only solves those problems but also raises bus bandwidth more effectively. For example, the AMD Opteron's flexible HyperTransport I/O bus architecture lets it integrate the memory controller, so the processor exchanges data with memory directly rather than going through the system bus to the chipset. In that case, the front-side bus (FSB) frequency of an AMD Opteron processor is no longer a meaningful notion.

  

4. CPU bits and word length

Bit: in digital circuits and computer technology, binary code has only "0" and "1", and each "0" or "1" is one "bit" in the CPU.

Word length: the number of bits of binary data a CPU can process at one time is called its word length. So a CPU that can handle data with an 8-bit word length is usually called an 8-bit CPU; likewise, a 32-bit CPU can process binary data 32 bits long in a unit of time. The difference between byte and word length: since commonly used English characters can be represented in 8 binary bits, 8 bits are usually called one byte. Word length is not fixed; different CPUs have different word lengths. An 8-bit CPU can handle only one byte at a time, a 32-bit CPU can handle 4 bytes at a time, and a CPU with a 64-bit word length can handle 8 bytes at a time.
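The byte/word-length relationship can be sketched in one line of Python (the helper name is illustrative):

```python
def bytes_per_word(word_length_bits):
    # One byte is 8 bits, so word length in bytes is bits divided by 8.
    return word_length_bits // 8

for bits in (8, 32, 64):
    print(f"{bits}-bit CPU handles {bytes_per_word(bits)} byte(s) at a time")
```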

  

5. Multiplier

The multiplier is the ratio between the CPU's clock frequency and its base clock. At the same base clock, the higher the multiplier, the higher the CPU's clock frequency. In practice, however, a high-clock-frequency CPU at the same base clock means little on its own. Because the speed of data transfer between the CPU and the system is limited, a CPU that blindly pursues a high clock frequency hits an obvious "bottleneck" effect — the maximum rate at which the CPU obtains data from the system cannot keep up with the CPU's computing speed. In general, Intel's CPUs are multiplier-locked except for engineering samples, while AMD's are not locked.
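The relationship from section 1 — clock frequency = base clock × multiplier — can be sketched in Python; the 200 MHz base clock and 17× multiplier below are hypothetical figures, not from the text:

```python
def cpu_frequency_mhz(base_clock_mhz, multiplier):
    """CPU clock frequency = base clock (external frequency) x multiplier."""
    return base_clock_mhz * multiplier

# A hypothetical part: 200 MHz base clock with a 17x multiplier.
print(cpu_frequency_mhz(200, 17))  # 3400
```

Overclocking by raising the base clock, as described above, raises the product directly — which is why it also speeds up (and can destabilize) everything else synchronized to the base clock.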


6. Cache


Cache size is also an important CPU indicator; the structure and size of the cache have a great impact on CPU speed. The CPU's cache runs at a very high frequency — generally at the same frequency as the processor — and is far more efficient than system memory or the hard disk. In actual work, the CPU often needs to read the same block of data repeatedly; enlarging the cache raises the hit rate of reads served from inside the CPU, without going out to memory or disk, and can thus greatly improve system performance. Because of CPU die area and cost considerations, however, caches are small.
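The effect of hit rate can be illustrated with the standard average-access-time formula; the latency numbers below are hypothetical, chosen only to show the shape of the curve:

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    # Weighted average: hits are served by the cache, misses go to memory.
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns

# Hypothetical latencies: 1 ns cache, 100 ns memory.
print(round(average_access_time(0.90, 1.0, 100.0), 2))  # 10.9
print(round(average_access_time(0.99, 1.0, 100.0), 2))  # 1.99
```

Raising the hit rate from 90% to 99% cuts the average access time by more than 5x, which is why even a small cache enlargement can pay off out of proportion to its size.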

  

The L1 cache (level-one cache) is the CPU's first-level cache, divided into a data cache and an instruction cache. The capacity and structure of the built-in L1 cache have a large impact on CPU performance, but the cache is built from static RAM, whose structure is complex, so with a constrained die area the L1 capacity cannot be made very large. The L1 cache capacity of a typical server CPU is usually 32–256 KB.

  
The L2 cache (level-two cache) is the CPU's second-level cache, and it comes in internal and external forms. An internal on-chip L2 cache runs at the same speed as the clock frequency, while an external L2 cache runs at only half of it. L2 capacity also affects CPU performance, and the principle is the bigger the better. The largest capacity on home CPUs is now 512 KB, while the L2 cache of server and workstation CPUs reaches 256 KB–1 MB, and some reach 2 MB or 3 MB.

  

The L3 cache (level-three cache) comes in two forms: early versions were external, while it is now built in. Its effect is that an L3 cache can further reduce memory latency and improve processor performance when computing over large data volumes. Reducing memory latency and improving large-data computing capability are both useful for games. In the server domain, adding an L3 cache still brings a significant performance boost. For example, a configuration with a larger L3 cache uses physical memory more efficiently, so the slower disk I/O subsystem can service more data requests. Processors with larger L3 caches also provide more efficient file-system cache behavior and shorter message and processor queue lengths.

  

In fact, the earliest L3 cache appeared on AMD's K6-III processor. At that time the L3 cache was limited by the manufacturing process and was not integrated into the chip but placed on the motherboard, where it could only run synchronously with the system bus frequency — not much faster than main memory. Later, L3 caches appeared in the Itanium processors Intel launched for the server market, and then in the Pentium 4 EE and Xeon MP. Intel also planned an Itanium 2 processor with a 9 MB L3 cache and, later, a dual-core Itanium 2 with a 24 MB L3 cache.

  

Basically, though, the L3 cache does not matter much for improving processor performance: for example, a Xeon MP with a 1 MB L3 cache is still no match for the Opteron. Evidently, increasing the front-side bus brings a more effective performance improvement than increasing the cache.

  

7. CPU extended instruction sets

The CPU relies on instructions to compute and to control the system, and each CPU is designed with a series of instructions matched to its hardware circuitry. Instruction strength is an important CPU indicator, and the instruction set is one of the most effective tools for improving microprocessor efficiency. In today's mainstream architectures, instruction sets divide into the complex instruction set and the reduced instruction set. In terms of specific applications, Intel's MMX (MultiMedia eXtensions), SSE, SSE2 (Streaming SIMD Extensions 2), and SSE3, as well as AMD's 3DNow!, are all CPU extended instruction sets, which respectively enhance the CPU's ability to process multimedia, graphics, the Internet, and so on. We usually call the CPU's extended instruction sets "the CPU's instruction sets". SSE3 is also the smallest of these, with only 13 instructions, while MMX contains 57, SSE contains 50, and SSE2 contains 144. SSE3 is currently also the most advanced instruction set: Intel's Prescott processors already support it, AMD will add SSE3 support in its future dual-core processors, and Transmeta's processors will support this instruction set as well.
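The single-instruction-multiple-data idea behind extensions like MMX and SSE can be sketched in pure Python — one "vector operation" processes several element pairs at once. The chunk width of 4 is illustrative; real SIMD happens in hardware registers, not Python loops:

```python
def simd_add(a, b, width=4):
    """Sketch of SIMD: one 'instruction' adds `width` pairs at a time."""
    result = []
    for i in range(0, len(a), width):
        # One vector operation handles a whole chunk of elements in parallel.
        chunk = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
        result.extend(chunk)
    return result

print(simd_add([1, 2, 3, 4, 5, 6, 7, 8], [10, 20, 30, 40, 50, 60, 70, 80]))
# [11, 22, 33, 44, 55, 66, 77, 88]
```

With a 4-wide vector unit, eight additions take two vector instructions instead of eight scalar ones — the source of the multimedia speedups these extensions advertise.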

  

8. CPU core and I/O operating voltage

Starting with the 586 CPUs, the CPU's operating voltage is divided into a core voltage and an I/O voltage; the core voltage is usually less than or equal to the I/O voltage. The core voltage depends on the CPU's production process: generally, the smaller the process, the lower the core operating voltage. The I/O voltage is generally 1.6–5 V. Low voltage solves the problems of excessive power consumption and overheating.

  

9. Manufacturing process

The microns (now nanometers) of a manufacturing process refer to the distance between circuits within the IC. The trend in manufacturing processes is toward ever-higher density: a higher-density IC design means that an IC of the same size can hold circuits of higher density and greater complexity. The mainstream processes are now 180 nm, 130 nm, and 90 nm, and a 65 nm process has recently been announced officially.

10. Instruction Set


  

(1) CISC instruction set

The CISC instruction set, also known as the complex instruction set (CISC is short for Complex Instruction Set Computer). In a CISC microprocessor, the program's instructions execute serially in order, and the operations within each instruction also execute serially in order. The advantage of sequential execution is simple control, but the utilization of the computer's components is low and execution is slow. In fact, this category covers the x86-series (i.e., IA-32 architecture) CPUs produced by Intel and their compatible CPUs, such as those from AMD and VIA. Even the new x86-64 (also called AMD64) belongs to the CISC category.

  

Explaining what an instruction set is must start from today's x86-architecture CPUs. The x86 instruction set was developed by Intel specifically for its first 16-bit CPU, the i8086. The CPU in the world's first PC, launched by IBM in 1981 — the i8088 (a simplified i8086) — also used x86 instructions. Meanwhile, an X87 chip was added to computers to improve floating-point processing capability, and the x86 and X87 instruction sets are collectively referred to as the x86 instruction set.

Although with the development of CPU technology Intel has successively developed the newer i80386 and i80486, and onward through the PII Xeon, PIII Xeon, and Pentium III to today's Pentium 4 series and Xeon (not including the Xeon Nocona), in order to ensure that computers can keep running the various applications developed in the past — protecting and inheriting those rich software resources — all CPUs produced by Intel still continue to use the x86 instruction set, so its CPUs still belong to the x86 series. Because the Intel x86 series and its compatible CPUs (e.g., AMD Athlon MP) all use the x86 instruction set, they form today's huge lineup of x86-series and compatible CPUs. x86 CPUs currently comprise two main types: Intel's server CPUs and AMD's server CPUs.

  

(2) RISC instruction set

RISC is short for "Reduced Instruction Set Computing", meaning "reduced instruction set". It was developed on the basis of the CISC instruction system: tests on CISC machines showed that the usage frequencies of different instructions vary widely — the most frequently used are a few relatively simple instructions, which account for only 20% of the instruction count but 80% of program execution. A complex instruction system inevitably increases the complexity of the microprocessor, making processor development long and costly, and complex instructions require complex operations that inevitably slow the computer down. For these reasons, RISC CPUs were born in the 1980s. Compared with a CISC CPU, a RISC CPU not only streamlines the instruction system but also adopts a "superscalar and super-pipelined structure", greatly increasing parallel processing capability. The RISC instruction set is the development direction of high-performance CPUs, standing opposite the traditional CISC (complex instruction set). By comparison, RISC's instruction format is uniform, its instruction types fewer, and its addressing modes fewer than the complex instruction set's — so of course its processing speed is much higher. CPUs with this instruction system are generally used in mid-range servers today, and high-end servers all use RISC-instruction-system CPUs. The RISC instruction system is better suited to UNIX, the high-end server operating system, and Linux is likewise a UNIX-like operating system. RISC-based CPUs are incompatible with Intel's and AMD's CPUs in both software and hardware.

  

At present, the RISC-instruction CPUs used in mid-range servers mainly fall into the following categories: PowerPC processors, SPARC processors, PA-RISC processors, MIPS processors, and Alpha processors.

  

(3) IA-64

  

Whether EPIC (Explicitly Parallel Instruction Computing) is the successor of the RISC or the CISC system is much disputed; on its own, EPIC looks more like an important step by Intel's processors toward the RISC system. In theory, under the same host configuration, a CPU designed on the EPIC system processes Windows application software much better than UNIX-based application software.

  

Intel's server CPU using EPIC technology is the Itanium (development code name Merced). It is a 64-bit processor and the first in the IA-64 family. Microsoft has also developed an operating system code-named Win64 to support it in software. After the x86 instruction set, Intel turned to the more advanced 64-bit microprocessor; Intel did so because it wanted to get rid of the massive x86 architecture and introduce a powerful and robust instruction set — and thus the IA-64 architecture with the EPIC instruction set was born. IA-64 is a great stride beyond x86 in many respects: it breaks through many limitations of the traditional IA-32 architecture and achieves breakthroughs in data processing capability, stability, security, usability, and rationality of the system.

  

The biggest drawback of IA-64 microprocessors is their lack of compatibility with x86. So that IA-64 processors (Itanium, Itanium 2, ......) could better run software from both eras, Intel introduced an x86-to-IA-64 decoder to translate x86 instructions into IA-64 instructions. This decoder is not the most efficient decoder, nor is it the best way to run x86 code (the best way is to run x86 code directly on an x86 processor), so the performance of Itanium and Itanium 2 when running x86 applications is very poor. This became the root cause of x86-64's creation.

  

(4) x86-64 (AMD64/EM64T)

  

Designed by AMD, it can handle 64-bit integer operations while remaining compatible with the x86-32 architecture. It supports 64-bit logical addressing while providing an option to convert to 32-bit addressing; data-operation instructions default to 32-bit and 8-bit, with options to convert to 64-bit and 16-bit; and it supports the general-purpose registers, extending the results of 32-bit operations to a full 64 bits. In this way the instructions distinguish "direct execution" from "converted execution", and the instruction field stays 8-bit or 32-bit, preventing the field from growing too long.

  

x86-64 (also called AMD64) did not arise without reason: the 32-bit addressing space of x86 processors is limited to 4 GB of memory, and IA-64 processors are not compatible with x86. AMD, taking customers' needs into account, strengthened the x86 instruction set so that it simultaneously supports 64-bit operating modes, and therefore AMD calls this structure x86-64. To perform 64-bit operations, AMD introduced the new R8–R15 general-purpose registers in the x86-64 architecture as an extension of the original x86 processor registers, though these registers are not fully used in 32-bit environments. Original registers such as EAX and EBX were also expanded from 32 to 64 bits, and 8 new registers were added to the SSE unit to support SSE2. The increase in register count brings a performance boost. Meanwhile, to support both 32-bit and 64-bit code and registers, the x86-64 architecture lets the processor operate in two modes: Long Mode and Legacy Mode, with Long Mode further divided into two sub-modes (64-bit mode and compatibility mode). The standard was introduced in AMD's server processors with the Opteron.

  

Intel's 64-bit extension technology, EM64T, was called IA-32e before it was formally named EM64T — the name Intel gave its 64-bit extension of the x86 instruction set. Intel's EM64T supports a 64-bit sub-mode similar to AMD's x86-64 technology, with 64-bit linear flat addressing, 8 new general-purpose registers (GPRs), and 8 additional registers supporting SSE instructions. Like AMD's, Intel's 64-bit technology is compatible with IA-32 and IA-32e; IA-32e is used only when running a 64-bit operating system. IA-32e consists of two sub-modes, a 64-bit sub-mode and a 32-bit sub-mode, which like AMD64 are backward compatible. Intel's EM64T is fully compatible with AMD's x86-64 technology. The Nocona processor has already incorporated some 64-bit technology, and Intel's Pentium 4E processor also supports 64-bit technology.

  

It should be said that both are 64-bit microprocessor architectures compatible with the x86 instruction set, but there are still a few differences between EM64T and AMD64: the NX bit in AMD64 processors, for instance, is not available in Intel's processors.

  

11. Super-pipelining and superscalar

Before explaining super-pipelining and superscalar, first understand the pipeline. Intel first used pipelining in its 486 chip. A pipeline works like an assembly line in industrial production: inside the CPU, 5–6 circuit units with different functions form an instruction-processing pipeline, an x86 instruction is split into 5–6 steps, and those circuit units each execute one step, so that one instruction completes per CPU clock cycle, raising the CPU's operating speed. On the classic Pentium, each integer pipeline is divided into four stages — instruction prefetch, decode, execute, and write-back — while the floating-point pipeline is divided into eight stages.

  

Superscalar executes multiple instructions at the same time through multiple built-in pipelines; its essence is trading space for time. Super-pipelining refines the pipeline stages and raises the clock frequency so that one or more operations complete within each machine cycle; its essence is trading time for space. The Pentium 4's pipeline, for example, reaches 20 stages. The more stages a pipeline is designed with, the faster each instruction step completes, so the design can accommodate CPUs with higher operating frequencies. But an overlong pipeline also has side effects: a CPU with a high clock frequency may actually run slowly. Intel's Pentium 4 showed exactly this: although its clock frequency could exceed 1.4 GHz, its computing performance was far inferior to a 1.2 GHz AMD Athlon, or even a Pentium III.
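The pipelining idea above can be sketched with the standard timing formula: an ideal k-stage pipeline completes n instructions in k + (n − 1) cycles, versus k × n cycles without pipelining (this idealized model ignores stalls and branch mispredictions, which is exactly where long pipelines like the Pentium 4's lose their advantage):

```python
def pipeline_cycles(num_instructions, num_stages):
    """An ideal k-stage pipeline finishes n instructions in k + (n - 1)
    cycles: k cycles to fill, then one instruction completes per cycle."""
    return num_stages + (num_instructions - 1)

# 100 instructions, 5 stages: 500 cycles unpipelined vs 104 pipelined.
print(100 * 5)                  # 500
print(pipeline_cycles(100, 5))  # 104
```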

  

12. Packaging

CPU packaging is a protective measure that encases the CPU die or CPU module in a specific material to prevent damage; a CPU generally must be packaged before delivery to users. How a CPU is packaged depends on its installation form and device-integration design. Broadly, CPUs installed in socket mounts typically use PGA (pin grid array) packaging, while CPUs installed in Slot x slots all use SEC (single-edge contact cartridge) packaging. There are now also packaging technologies such as PLGA (Plastic Land Grid Array) and OLGA (Organic Land Grid Array). With market competition growing ever fiercer, the current direction of CPU packaging technology is chiefly cost savings.
13. Multithreading


  

Simultaneous multithreading (SMT) lets multiple threads on the same processor execute synchronously and share the processor's execution resources by duplicating the architectural state on the processor. It maximizes wide-issue, out-of-order processing; raises the utilization of the processor's execution units; and mitigates memory-access latency caused by data dependencies or cache misses. When multiple threads are not available, an SMT processor is almost the same as a conventional wide-issue superscalar processor. What is most compelling about SMT is that only small-scale changes to the processor-core design are needed, yielding a significant performance improvement at almost no extra cost. Multithreading technology can prepare more data for the high-speed computing core to process, reducing the core's idle time — undoubtedly attractive even for low-end desktop systems. Starting with the 3.06 GHz Pentium 4, all of Intel's processors support SMT technology.

  

14. Multi-core

  

Multi-core also refers to chip multiprocessing (chip multiprocessors, CMP). CMP was proposed by Stanford University; the idea is to integrate the SMP (symmetric multiprocessing) of massively parallel processors onto a single chip, with each processor executing different processes in parallel. Compared with CMP, the flexibility of the SMT processor architecture is more prominent. However, once semiconductor processes reached 0.18 micron, wire delay exceeded gate delay, requiring microprocessor designs to be carried out by partitioning into many smaller, more local basic unit structures. By contrast, because the CMP structure is already divided into multiple processor cores for design, each core is relatively simple and conducive to design optimization, so CMP is the more promising. Currently IBM's POWER4 chip and Sun's MAJC-5200 chip use the CMP structure. A multi-core processor can share caches within the processor, raising cache utilization while simplifying the complexity of multiprocessor system design.

  

In the second half of 2005, new processors from Intel and AMD will also adopt the CMP structure. The new Itanium processor, development code name Montecito, is a dual-core design with a minimum of 18 MB of on-chip cache, manufactured on a 90 nm process; its design is an outright challenge to today's chip industry. Each individual core has its own L1, L2, and L3 caches, and the chip contains approximately 1 billion transistors.

  

15. SMP

  

SMP (symmetric multi-processing), the symmetric multiprocessing structure, refers to a group of processors (multiple CPUs) assembled in one computer, sharing the memory subsystem and the bus structure among the CPUs. Supported by this technology, a server system can run multiple processors at once, sharing memory and other host resources. Dual Xeon — what we call two-way — is the most common form of symmetric-processor system (Xeon MP can support up to four-way, AMD Opteron 1- to 8-way), and there are also a few 16-way systems. In general, however, an SMP machine's scalability is poor: it is difficult to go beyond 100 processors, and the usual configuration is 8 to 16 ways — enough for most users. SMP is most common in high-performance server and workstation-class motherboard architectures; some UNIX servers can support systems of up to 256 CPUs.

  

The prerequisites for building an SMP system are: hardware that supports SMP, including the motherboard and CPUs; a system platform that supports SMP; and application software that supports SMP.

  

For an SMP system to perform efficiently, the operating system must support SMP — for example Windows NT, Linux, and UNIX, among other 32-bit operating systems — and must be capable of multitasking and multithreading. Multitasking means the operating system can let different CPUs perform different tasks at the same time; multithreading means the operating system can let different CPUs complete the same task in parallel.
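The multithreading notion — several workers cooperating on one task — can be sketched with Python's standard library. Note that CPython threads share one interpreter (the GIL), so this illustrates the programming model the text describes rather than true parallel CPU execution:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # A trivial stand-in for one unit of work within a larger task.
    return n * n

# Several threads cooperate on the same task; on an SMP system the OS
# could schedule these workers onto different CPUs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

`pool.map` preserves input order, so the result is the same as a sequential loop — the division of labor is invisible to the caller, just as SMP is meant to be.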

  

Building an SMP system places high demands on the chosen CPUs. First, an APIC (Advanced Programmable Interrupt Controller) unit must be built into each CPU — the core of Intel's multiprocessing specification is the Advanced Programmable Interrupt Controllers (APICs). Second, the CPUs must be the same product model, with the same type of core and exactly the same operating frequency. Finally, keep the production serial numbers as close as possible: if CPUs from two different production batches run as a dual-processor system, one CPU may end up overburdened while the other carries little load, preventing maximum performance — and worse, it can cause a crash.

  

16. NUMA Technology

  

NUMA (Non-Uniform Memory Access) is a distributed shared-memory technology in which a number of independent nodes are connected by a high-speed private network; each node can be a single CPU or an SMP system. In NUMA, cache consistency has several solutions, all of which require support from the operating system and specialized software. Figure 2 shows an example of a Sequent NUMA system: three SMP modules are combined over a high-speed private network to form a node, with up to 12 CPUs per node. Systems like Sequent's can reach 64 or even 256 CPUs. Clearly, this builds on an SMP foundation and then applies NUMA on top of it, combining the two technologies.
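The "non-uniform" in NUMA refers to latency: a CPU reaches its own node's memory faster than memory on a remote node, so average latency depends on how often accesses cross the interconnect. A quick sketch of that effect, using made-up illustrative latencies rather than figures from any real machine:

```python
# Sketch: average memory latency in a NUMA system depends on the fraction
# of accesses that go to a remote node. The latencies below are illustrative
# assumptions, not measurements from any specific machine.
LOCAL_NS = 100    # assumed latency to this node's own memory
REMOTE_NS = 300   # assumed latency across the node interconnect

def avg_latency_ns(remote_fraction):
    # Expected latency given the fraction of accesses that are remote.
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

print(avg_latency_ns(0.0))   # all accesses local
print(avg_latency_ns(0.5))   # half the accesses cross nodes
```

This is why NUMA-aware operating systems try to keep a process's memory on the node where its threads run.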

  

17. Out-of-Order Execution Technology

  

Out-of-order execution (out-of-order execution) refers to a technique in which the CPU is allowed to dispatch instructions to their corresponding circuit units in an order other than the one specified by the program. Based on the state of each circuit unit and on an analysis of which instructions can be executed early, instructions whose operands are ready are sent immediately to their circuit units; during this phase instructions do not execute in the program's specified sequence. A reorder unit then rearranges the results of the execution units back into program order. The purpose of out-of-order execution is to keep the CPU's internal circuits running at full load, and thereby to increase the speed at which the CPU runs the program. Branch technique: a branch instruction may need to wait on the result of an earlier operation. An unconditional branch can simply be executed in sequence, but a conditional branch must wait for the result it depends on before deciding whether to continue in the original order.
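The scheduling idea can be sketched as a toy model: an instruction issues as soon as its source operands are ready, executes regardless of program order, and results are then committed back in program order. This is an illustrative simplification (no pipeline, no register renaming, no branches), not any real CPU's scheduler:

```python
# Toy out-of-order scheduler. Each instruction is (dest, op, src1, src2).
# An instruction may execute as soon as its source registers hold values,
# regardless of program order; results are committed in program order.

def run_out_of_order(program, regs):
    done = [None] * len(program)      # results, indexed by program order
    pending = set(range(len(program)))
    ready_regs = dict(regs)           # registers with known values
    while pending:
        issued = []
        for i in sorted(pending):
            dest, op, a, b = program[i]
            if a in ready_regs and b in ready_regs:   # operands ready?
                x, y = ready_regs[a], ready_regs[b]
                done[i] = x + y if op == "add" else x * y
                issued.append((i, dest))
        if not issued:
            raise RuntimeError("deadlock: unresolved operands")
        for i, dest in issued:        # broadcast results after this "cycle"
            ready_regs[dest] = done[i]
            pending.discard(i)
    # "Retire" the results in program order.
    for (dest, *_), val in zip(program, done):
        regs[dest] = val
    return regs

prog = [
    ("r2", "add", "r0", "r1"),   # depends only on inputs
    ("r3", "mul", "r2", "r1"),   # depends on r2, so it must wait
    ("r4", "add", "r0", "r0"),   # independent: issues early, out of order
]
print(run_out_of_order(prog, {"r0": 2, "r1": 3}))
```

In the example, the third instruction executes in the same round as the first even though it comes later in the program, which is exactly the full-load effect the technique aims for.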

  

18. CPU-Integrated Memory Controller

  

Many applications have complex, almost random read patterns (especially when cache hits are unpredictable) and do not use bandwidth effectively. Business-processing software is typical of this class: even with out-of-order execution it is still limited by memory latency, because the CPU must wait until the data an operation needs has been loaded before the instruction can execute (regardless of whether the data comes from the CPU cache or from main memory). Memory latency in current low-end systems is roughly 120-150 ns, while CPU clocks run above 3 GHz, so a single memory request can waste 200-300 CPU cycles. Even with a cache hit rate of 99%, the CPU may still spend 50% of its time waiting for memory requests to finish, precisely because of memory latency.
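The cycle figure follows from multiplying latency by clock frequency, and the 99%-hit-rate claim can be checked with a simple stall-time model. The per-access costs below (3-cycle hit, 300-cycle miss) are assumptions for illustration; note that 120-150 ns at 3 GHz strictly works out to 360-450 cycles, so treat the quoted 200-300 as an order-of-magnitude figure:

```python
# Cycles lost to one memory request = latency (ns) x clock rate (GHz),
# since at 1 GHz one cycle takes exactly 1 ns.
def stall_cycles(latency_ns, freq_ghz):
    return latency_ns * freq_ghz

# Fraction of time spent waiting on memory, given a cache hit rate and
# assumed per-access costs (3-cycle hit, 300-cycle miss are assumptions).
def frac_time_waiting(hit_rate, hit_cycles, miss_cycles):
    miss = (1 - hit_rate) * miss_cycles
    return miss / (hit_rate * hit_cycles + miss)

print(stall_cycles(100, 3.0))                     # 100 ns at 3 GHz -> 300 cycles
print(round(frac_time_waiting(0.99, 3, 300), 3))  # ~0.5 even at 99% hits
```

Under these assumptions, the 1% of accesses that miss cost as many total cycles as the 99% that hit, which is how a 99% hit rate can still leave the CPU waiting half the time.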

  

The Opteron's integrated memory controller has much lower latency than a chipset-based dual-channel DDR memory controller. Intel also plans to integrate the memory controller into the processor, which will make the north bridge chip less important. Either way, changing the way the processor accesses main memory helps increase bandwidth, reduce memory latency, and improve processor performance.
