Processor CPU concept and CPU multithreading


1 Socket, core, thread

(1) Socket is the number of CPU sockets on the motherboard, what administrators usually call "ways" (as in a 2-way or 4-way server).
The chip manufacturer packages one or more cores into a single chip, which plugs into one socket. If each chip carries two cores and the motherboard has 2 sockets populated, the system has 4 cores.
(2) Core is the physical core we usually talk about, as in dual-core, quad-core, and so on. Single-core and multi-core systems are also called uniprocessor and multiprocessor systems.
(3) Thread is the number of hardware threads per core, i.e. Hyper-Threading.
For example, for a 2-way, 4-core, Hyper-Threaded server (generally 2 threads per core by default), cat /proc/cpuinfo shows 2*4*2 = 16 processors, which many people are also used to calling 16 cores.
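The same socket/core/thread counts can be read directly from /proc/cpuinfo (a minimal sketch for a Linux x86 system; the field names below are the standard /proc/cpuinfo fields):

# total number of logical processors (sockets * cores per socket * threads per core)
grep -c '^processor' /proc/cpuinfo
# number of physical sockets
grep 'physical id' /proc/cpuinfo | sort -u | wc -l
# cores per socket
grep 'cpu cores' /proc/cpuinfo | sort -u
# logical CPUs per socket ("siblings"); siblings divided by cpu cores gives threads per core
grep 'siblings' /proc/cpuinfo | sort -u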


# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
Stepping:              7
CPU MHz:               2889.125
BogoMIPS:              5804.19
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

The server therefore has 32 logical CPUs, made up of 2 sockets, 8 cores per socket, and 2 threads per core. In addition, these CPUs are divided into 2 NUMA nodes.
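To see exactly which logical CPU belongs to which core, socket and NUMA node, lscpu can print a parseable topology table, and the same information is exposed under sysfs (a minimal sketch for a Linux system; cpu0 is just an example):

# one line per logical CPU, listing its core, socket and NUMA node
lscpu -p=CPU,CORE,SOCKET,NODE
# which logical CPUs are hardware threads of the same core as CPU 0
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
# CPU 0's core id and physical socket id
cat /sys/devices/system/cpu/cpu0/topology/core_id
cat /sys/devices/system/cpu/cpu0/topology/physical_package_id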


2 Concepts such as SMP, SMT, NUMA

From the user's (or operating system's) point of view, a core, or processor (CPU), is a logical concept: a unit that can fetch, execute and process instructions independently.
From the hardware point of view, such a core can be implemented in different forms: as a separate chip (the single-core processor in the usual sense), as one of several cores integrated on one chip (SMP, symmetric multiprocessing), or as one of several hardware contexts implemented inside a single core to support multithreading (SMT, simultaneous multithreading).
Finally, from the point of view of the operating system's process scheduler, all of these hardware implementations look alike (for example, the 32 CPUs in the lscpu output above), because they share one essential property: each of them can execute a process (or thread).
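Because the scheduler treats every logical CPU as an independent execution unit, user space can also address them individually, for example to pin a process to particular CPUs (a minimal sketch; taskset ships with util-linux, ./app and the PID 1234 are placeholders, and the thread_siblings_list file shown above tells you which logical CPUs share a physical core):

# run a program restricted to logical CPUs 0 and 16
taskset -c 0,16 ./app
# show the CPU affinity mask of an already running process
taskset -p 1234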

In the traditional single-core era, the main way to improve processor performance was to raise the clock frequency. But frequency cannot be raised indefinitely, because of physical limits such as heat dissipation and process technology. Multi-core processors instead spend the growing amount of available chip area on additional cores, whose heat is easier to manage. This is the background against which multiprocessors were born.

(1) SMT, simultaneous multithreading. SMT allows multiple threads on the same processor to execute simultaneously and share the processor's execution resources by duplicating the architectural state on the processor. It maximizes the benefit of wide-issue, out-of-order execution, increases the utilization of processor components, and mitigates memory access latency caused by data dependencies or cache misses. When no extra threads are available, an SMT processor behaves almost the same as a conventional wide-issue superscalar processor. The most compelling aspect of SMT is that only small-scale changes to the processor core design deliver a significant performance improvement at little additional cost. Multithreading also lets a fast computation core be fed with more data to process, reducing its idle time, which is clearly attractive even for low-end desktop systems. Starting with the 3.06 GHz Pentium 4, Intel processors support SMT; Intel's Hyper-Threading is in fact two-thread SMT.
(2) CMP, chip multiprocessing (chip multiprocessors). CMP was proposed by Stanford University; the idea is to integrate the SMP (symmetric multiprocessing) structure of large-scale parallel processors onto a single chip, with each processor executing a different process in parallel. Compared with CMP, the SMT processor architecture is more flexible. However, once the semiconductor process reached 0.18 micron, wire delay exceeded gate delay, which forces microprocessor designs to be partitioned into many smaller, more localized basic units. Because a CMP design is already divided into multiple processor cores, each core is relatively simple and easier to optimize, so CMP is considered the more promising approach. The IBM POWER4 chip and Sun's MAJC-5200 chip both use the CMP structure. Multi-core processors can share caches within the chip, improving cache utilization while simplifying the design of multiprocessor systems.
(3) SMP, symmetric multiprocessing (symmetric multiprocessors). SMP refers to a group of processors (multiple CPUs) pooled in one computer, with the CPUs sharing the memory subsystem and the bus structure. With this technology, one server system can run multiple processors at the same time and share memory and other host resources. A dual-Xeon machine, the so-called 2-way system, is the most common symmetric-processor configuration (Xeon MP supports up to 4-way, AMD Opteron supports 1- to 8-way), and a few 16-way systems exist as well. In general, however, SMP machines scale poorly: it is hard to go much beyond 100 processors, and 8 to 16 is typical, which is enough for most users. SMP is most common in high-performance server and workstation motherboard architectures; some UNIX servers support up to 256 CPUs, and QEMU is likewise designed to support up to 256 virtual CPUs (a QEMU sketch after this list shows how such a topology is described for a guest).
(4) NUMA, non-uniform memory access. In an SMP system, all cores share the bus and memory. With a small number of cores this is not a problem, but as the core count grows, the demand for bus and memory bandwidth grows significantly, and the bus and memory eventually become the bottleneck of system performance.
To solve this, groups of cores are given their own bus and memory; each such group is called a node. Under normal circumstances a core only accesses memory within its own node, which reduces the bandwidth pressure on any one bus and memory. In some scenarios, however, a core inevitably has to access another node's memory, which incurs a large access latency.
This technique, called NUMA (non-uniform memory access), reduces the bandwidth requirements on the bus and memory at the cost of non-uniform memory access latency. It places high demands on the process scheduling algorithm: cross-node memory accesses should be minimized to improve system performance (see the numactl sketch after this list).
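As an illustration of the SMP topology mentioned in (3), QEMU describes sockets, cores and threads for a guest via its -smp option (a minimal sketch; the machine type, memory size and disk image name are placeholders):

# boot a guest that sees 2 sockets x 8 cores x 2 threads = 32 virtual CPUs
qemu-system-x86_64 \
    -machine q35 -m 8G \
    -smp 32,sockets=2,cores=8,threads=2 \
    -drive file=disk.img,format=qcow2

Inside the guest, lscpu should then report the same 2/8/2 topology as the physical example above.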
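For the NUMA case in (4), Linux exposes the node layout and lets you bind a process to a single node so that it allocates local memory (a minimal sketch; numactl must be installed, and ./app is a placeholder for your own program):

# show the nodes, their CPUs, memory sizes and inter-node distances
numactl --hardware
# run a program on node 0's CPUs and allocate its memory from node 0 only
numactl --cpunodebind=0 --membind=0 ./app
# per-node allocation statistics; numa_miss counts allocations that fell back to a remote node
numastat

Comparing numastat before and after such a run shows whether memory accesses stayed local to the node.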

3 SMT
3.1 MT
MT, multithreading, refers to executing multiple threads on a single core. There are two kinds of MT technology: TMT and SMT.

3.1.1 TMT, temporal multithreading (time-division multithreading). Similar to time-sharing in an operating system: during any given period only one thread executes, and multiple threads execute alternately.

TMT is typically applied in RISC-architecture CPUs, which are widely used in servers.
There are two scheduling schemes for TMT: CMT and FMT.
CMT, coarse-grained multithreading. The idea of CMT is that thread 1 executes, and only when thread 1 has to wait (a stall) does the core switch to thread 2.
FMT, fine-grained multithreading. The idea of FMT is to interleave the threads evenly, switching between them cycle by cycle.

In a simply structured RISC CPU, such as one implementing the MIPS instruction set, FMT has the advantage; FMT is the most commonly used MT technique in RISC architectures.
In more complex RISC CPUs, FMT does not necessarily have the advantage, and there are CPUs that use CMT instead.
3.1.2 SMT, simultaneous multithreading. Instructions from multiple threads are mixed together and issued within the same cycle.
SMT suits CISC-architecture CPUs and has been adopted by Intel on desktops and low-end servers.
3.2 SMT
Simultaneous multithreading (SMT) is a hardware multithreading technique that can execute instructions from multiple threads within a single CPU clock cycle. In essence, SMT converts thread-level parallelism (multiple CPUs) into instruction-level parallelism (a single CPU): one physical processor dispatches instructions from multiple hardware thread contexts simultaneously. SMT is used in commercial environments and for workloads with a high cycles-per-instruction (CPI) count, where it yields a real performance advantage. Because the processor has a superscalar structure that is best exploited by fetching and executing instructions in parallel, SMT lets two applications be dispatched on the same processor at the same time and take advantage of that superscalar nature.

A single application can rarely keep the processor fully loaded. When one thread hits a long-latency event, SMT allows instructions from the other threads to use all of the execution units; for example, when one thread takes a cache miss, another thread can continue to execute. SMT is a feature of the POWER5 and POWER6 processors and can be used together with shared processors.


For commercial transaction-processing workloads, SMT can improve performance by up to 30%. SMT is a good choice when overall system throughput matters more than the throughput of an individual thread.

Not all applications benefit from SMT, however. Applications whose performance is limited by the execution units, or that exhaust the memory bandwidth of all the processors, will not gain from running two threads on the same processor.
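Whether SMT helps is therefore workload-dependent. On Linux with a reasonably recent kernel, SMT can be toggled at runtime to compare the two cases (a minimal sketch; requires root and the sysfs SMT control interface):

# is SMT currently active? (1 = yes, 0 = no)
cat /sys/devices/system/cpu/smt/active
# turn SMT off, rerun the workload, then turn it back on
echo off > /sys/devices/system/cpu/smt/control
echo on > /sys/devices/system/cpu/smt/control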




4 SMT, HMT and HT technology in IBM POWER processors
SMT, simultaneous multithreading, is new in POWER5 and later chips. It allows one CPU core to process two instruction streams at the same time, so in the best case it reaches twice the processing speed of a single CPU. On average, turning SMT on increases CPU processing power by about 30% compared with turning it off. SMT's capability comes from the large number of registers in the CPU: even with pipelining, at any point in time only one instruction actually occupies the arithmetic units, while the other in-flight instructions are only being decoded or generating addresses, so most of the registers sit idle.


To address this, IBM designed a register structure that tracks the processing state of the CPU. When registers are idle and another thread can use them, the core keeps the state of the current thread while letting the other thread execute at the same time, so that one CPU can process instructions from two threads simultaneously. In practice it is not that simple: a RISC CPU normally executes multiple instructions at once in a deeply pipelined, superscalar manner, so implementing SMT requires more complex decisions and cannot guarantee that the CPU always executes two threads. As a result, SMT does not double performance; it improves it by roughly 30%.


HMT, hardware multithreading, is similar to SMT in that it also tries to run multiple instruction threads, but unlike SMT, HMT must wait until the current instruction stream stalls before it switches to another thread. The current stream stalls when instruction execution is interrupted, usually because of an instruction or data cache miss (after a branch or an out-of-range address, the instruction needed next is not available). Because HMT has stricter conditions for when instructions from different threads can run, its performance improvement is much less significant.


IBM's POWER series CPUs have long supported HMT; SMT support was added starting with POWER5. HMT is used unconditionally, while SMT can be switched on and off by the operating system.
SMT requires support from both the AIX operating system and the POWER chip hardware. It allows two threads to execute concurrently on one physical CPU (core), which greatly improves the utilization of the CPU's execution units and typically improves the performance of a CPU-bound system by 30% or more. The figures below compare the two cases:
--Figure: CPU execution-unit utilization within a CPU cycle with SMT off
--Figure: CPU execution-unit utilization within a CPU cycle with SMT on
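On AIX, switching SMT on and off for such a comparison is typically done with the smtctl command (a minimal sketch; option availability can vary by AIX level, so check the smtctl documentation on your system):

# display the current SMT setting of each processor
smtctl
# disable SMT immediately, measure the workload, then re-enable it
smtctl -m off -w now
smtctl -m on -w now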



