25.5 NUMA Architecture Machines
Although multi-core CPUs look very powerful on the surface, they bring new problems of their own. When multiple cores need to access other system resources concurrently, those resources become the bottleneck for overall system performance. For example, if two cores need to access RAM at the same time, memory bandwidth limits overall performance, so a dual-core system delivers only a 30% to 70% improvement over a single-core system. To alleviate this problem, computers now use what is called a Cache-Coherent Non-Uniform Memory Access (NUMA) architecture.
Note on the figure: the 4 × 4 CPU items indicate 16 bit, which is usually 32 bit.
Figure 25-3 shows the architecture of a NUMA-based computer system. The system has four nodes. Each node contains four CPUs, a north bridge, a south bridge, and local memory (RAM). Some nodes also have local devices attached. All memory can be accessed from any node; however, the time it takes to access that memory is not uniform (hence "non-uniform"). For example, any CPU in node 1 can access node 1's local memory very quickly. A CPU in node 1 can also access the memory of node 2 and node 4, but with a significant performance hit. A CPU in node 1 can also access the memory of node 3, but the performance hit is even greater, because there is no direct communication line between node 1 and node 3. Even though the 16 CPUs are spread across four different nodes, the hardware ensures that the caches of all CPUs stay coherent and synchronized with one another.
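To make the "non-uniform" part concrete, the following is a minimal sketch of the four-node system as a graph, where the cost of a memory access grows with the number of interconnect links crossed. The ring layout (links 2-3 and 3-4), the function names, and the cost numbers are assumptions made for illustration; the text only states that node 1 links directly to nodes 2 and 4 and not to node 3.

```python
from collections import deque

# Assumed interconnect: the four nodes form a ring (1-2, 2-3, 3-4, 4-1).
# Only "node 1 links to 2 and 4, not to 3" comes from the text; the rest is assumed.
LINKS = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}


def hops(src, dst):
    """Breadth-first search for the fewest interconnect links between two nodes."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in LINKS[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"node {dst} is unreachable from node {src}")


def relative_cost(cpu_node, memory_node, local=1.0, per_hop=0.5):
    """Hypothetical cost model: a local access costs `local`, each hop adds `per_hop`."""
    return local + per_hop * hops(cpu_node, memory_node)


if __name__ == "__main__":
    for target in (1, 2, 3, 4):
        print(f"CPU in node 1 -> memory of node {target}: "
              f"{hops(1, target)} hop(s), relative cost {relative_cost(1, target)}")
```

Under these assumptions, node 1 reaches its own memory in zero hops, the memory of nodes 2 and 4 in one hop, and the memory of node 3 in two hops, which matches the ordering of performance hits described above. Real latencies depend on the hardware and are not linear in hop count.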
The Win32 API offers unmanaged developers a number of functions that allocate memory on a specific NUMA node and force threads to run on a specific NUMA node. Today, the CLR has nothing designed specifically for NUMA systems. In the future, I think the CLR may add such support, for example by maintaining one garbage-collected heap per NUMA node, and possibly by letting an application specify the node on which an object should be allocated. In addition, the CLR might migrate objects from one node to another, depending on which CPU accesses them most frequently.
Back in the early 1990s, it was hard to imagine that a single computer would one day have 32 CPUs. So when 32-bit Windows was first released, it was designed to support a maximum of 32 CPUs in one machine. Later, Microsoft began shipping 64-bit Windows, which supports a maximum of 64 CPUs in one machine. Sixty-four CPUs may seem like a lot, but given the trend, it was clear that machines would reach or even exceed that number, and in fact they already have.
Starting with Windows Server 2008 R2, Microsoft Windows supports up to 256 logical processors on one machine. Figure 25-4 shows how Windows supports all these logical processors. The following describes the content of the figure:
■ A machine has one or more processor groups, each of which contains 1 to 64 logical processors.
■ A processor group has one or more NUMA nodes. Each node contains some logical processors, cache memory, and local memory (all in close proximity to each other).
■ Each NUMA node has one or more sockets, into which silicon chips are inserted.
■ Each socket's chip contains one or more CPU cores.
■ Each core contains one or more logical processors. There can be more than one logical processor if the chip is hyperthreaded.
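The hierarchy in the list above can be sketched as a small data model. This is only an illustration: the class names, the `make_node` helper, and the packing policy in `pack_into_groups` are hypothetical, and the way Windows actually assigns NUMA nodes to processor groups is not described in the text. The only rule taken from the text is that a group holds between 1 and 64 logical processors.

```python
from dataclasses import dataclass, field
from typing import List

MAX_PER_GROUP = 64  # from the text: a processor group holds 1 to 64 logical processors


@dataclass
class Core:
    logical_processors: int = 1  # 2 would model a hyperthreaded core


@dataclass
class Socket:
    cores: List[Core] = field(default_factory=list)


@dataclass
class NumaNode:
    sockets: List[Socket] = field(default_factory=list)

    def logical_count(self) -> int:
        return sum(c.logical_processors for s in self.sockets for c in s.cores)


@dataclass
class ProcessorGroup:
    nodes: List[NumaNode] = field(default_factory=list)

    def logical_count(self) -> int:
        return sum(n.logical_count() for n in self.nodes)


def make_node(sockets, cores_per_socket, threads_per_core):
    """Hypothetical helper that builds a uniform NUMA node."""
    return NumaNode([
        Socket([Core(threads_per_core) for _ in range(cores_per_socket)])
        for _ in range(sockets)
    ])


def pack_into_groups(nodes):
    """Assumed policy: keep each node whole, and start a new group when the
    current one would exceed MAX_PER_GROUP logical processors."""
    groups = []
    for node in nodes:
        count = node.logical_count()
        if not 1 <= count <= MAX_PER_GROUP:
            raise ValueError("a node must fit inside a single processor group")
        if groups and groups[-1].logical_count() + count <= MAX_PER_GROUP:
            groups[-1].nodes.append(node)
        else:
            groups.append(ProcessorGroup([node]))
    return groups


if __name__ == "__main__":
    # Eight nodes, each with one socket of 16 hyperthreaded cores (32 logical processors).
    machine = [make_node(1, 16, 2) for _ in range(8)]
    for index, group in enumerate(pack_into_groups(machine)):
        print(f"group {index}: {len(group.nodes)} node(s), "
              f"{group.logical_count()} logical processors")
```

With the example machine in the `__main__` block, the 256 logical processors end up in four groups of 64 each, which is one way a 256-processor machine could be laid out under the limits described above.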
Today's CLR cannot take advantage of processor groups, so all the threads it creates run in processor group 0 (the default group). When running on 64-bit Windows, a managed application can use up to 64 cores. Because 32-bit Windows supports only processor group 0, and supports only 32 CPUs, a managed application running on 32-bit Windows can use up to 32 cores.
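The limits above come down to simple arithmetic, sketched below. The function names are hypothetical. The bitmask part rests on a detail the text does not state: a Win32 processor affinity mask is a pointer-sized integer with one bit per logical processor, which is why 32 and 64 are natural ceilings for a single group.

```python
def affinity_mask(cpu_indices, word_bits=64):
    """Build a bitmask with one bit per logical processor.

    `word_bits` models the pointer size of the OS (32 or 64); an index that
    does not fit in the mask cannot be expressed at all.
    """
    mask = 0
    for index in cpu_indices:
        if not 0 <= index < word_bits:
            raise ValueError(f"CPU {index} does not fit in a {word_bits}-bit mask")
        mask |= 1 << index
    return mask


def usable_by_managed_app(group0_logical_processors, os_bits):
    """Upper bound on the processors a managed application can use, given that
    its threads stay in processor group 0 (as described in the text)."""
    return min(group0_logical_processors, os_bits)


if __name__ == "__main__":
    print(bin(affinity_mask([0, 1, 3])))    # CPUs 0, 1, and 3 selected
    print(usable_by_managed_app(64, 64))    # 64-bit Windows, full group 0
    print(usable_by_managed_app(64, 32))    # 32-bit Windows caps out at 32
```

On a machine with 256 logical processors, this means a managed application still sees at most the processors in group 0, however many groups the machine has.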