1. NUMA concepts (node, socket, core, thread)
Sockets, cores, and threads are covered in many articles, so they are only summarized briefly here; for details, see:
Summary: A socket is a CPU socket on the motherboard. A core is an independent set of program-execution hardware inside the socket, such as registers and compute units. A thread is the logical execution unit introduced by Hyper-Threading: each thread has its own execution context but shares the core's hardware, such as the compute units, with the other threads on that core.
The concept of a node belongs mainly to the NUMA architecture, where it is used to group cores. See the figure for reference (each "OS CPU" in the figure can be understood as a thread; cores are not drawn). There are 4 sockets in total, each socket has 2 nodes, and each node has 8 threads, for a total of 4 (sockets) x 2 (nodes) x 8 (4 cores x 2 threads) = 64 threads.
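The thread count above is a plain product of the topology levels. A minimal sketch of that arithmetic follows; the function name total_threads is a hypothetical helper, not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: multiply out the levels of a CPU topology.
# $1 = sockets, $2 = nodes per socket, $3 = cores per node, $4 = threads per core
total_threads() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# The machine in the figure: 4 sockets, 2 nodes per socket,
# 4 cores per node, 2 threads per core.
total_threads 4 2 4 2   # prints 64
```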
In addition, each node has its own CPUs, bus, and memory, and can also access the memory of other nodes. The biggest advantage of NUMA is that the number of CPUs is easy to scale: because each node has its own internal bus, CPUs can be added by adding nodes. Simply adding CPUs to a single shared bus puts heavy pressure on that bus, which is why a UMA structure cannot support a large number of cores.
"This figure is from: NUMA best Practices for Dell PowerEdge 12th Generation Servers"
As mentioned above, each node has its own CPUs, bus, and memory, so if the vCPUs of a virtual machine span different nodes, a CPU in one node has to access memory in another node, which increases memory access latency. In some scenarios with high performance requirements, such as NFV environments, it is therefore important that the vCPUs of a virtual machine are placed on pCPUs of the same node. For this reason, NUMA-aware virtual machine scheduling was added in the Kilo release of OpenStack.
2. How to view the NUMA topology of a machine
The most commonly used command is lscpu. Its output looks like this:
- dylan@hp3000:~$ lscpu
- Architecture:          x86_64
- CPU op-mode(s):        32-bit, 64-bit
- Byte Order:            Little Endian
- CPU(s):                48    // 48 logical CPUs (threads) in total
- On-line CPU(s) list:   0-47
- Thread(s) per core:    2     // 2 threads per core
- Core(s) per socket:    6     // 6 cores per socket
- Socket(s):             4     // 4 sockets in total
- NUMA node(s):          4     // 4 NUMA nodes in total
- Vendor ID:             GenuineIntel
- CPU family:            6
- Model:                 45
- Stepping:              7
- CPU MHz:               1200.000
- BogoMIPS:              4790.83
- Virtualization:        VT-x
- L1d cache:             32K   // L1 data cache, 32K
- L1i cache:             32K   // L1 instruction cache, 32K (a high-performance machine; von Neumann + Harvard architecture)
- L2 cache:              256K
- L3 cache:              15360K
- NUMA node0 CPU(s):     0-5,24-29
- NUMA node1 CPU(s):     6-11,30-35
- NUMA node2 CPU(s):     12-17,36-41
- NUMA node3 CPU(s):     18-23,42-47
The output shows that this machine has 4 sockets, each socket contains 1 NUMA node, each NUMA node has 6 cores, and each core has 2 threads, so the total number of threads is 4 (sockets) x 1 (node) x 6 (cores) x 2 (threads) = 48.
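The per-node CPU lists in the lscpu output can be cross-checked against this total. The following is a minimal sketch that counts the CPUs in a list such as 0-5,24-29; count_cpus is a hypothetical helper name, and the four lists are copied from the output above.

```shell
#!/bin/bash
# Hypothetical helper: count the CPUs in a list such as "0-5,24-29".
# The function body runs in a subshell so the IFS change stays local.
count_cpus() (
    IFS=,
    count=0
    for range in $1; do
        start=${range%-*}   # text before the dash (whole item if no dash)
        end=${range#*-}     # text after the dash (whole item if no dash)
        count=$(( count + end - start + 1 ))
    done
    echo "$count"
)

# CPU lists of NUMA node0..node3, taken from the lscpu output.
total=0
for list in "0-5,24-29" "6-11,30-35" "12-17,36-41" "18-23,42-47"; do
    n=$(count_cpus "$list")
    echo "$list -> $n CPUs"
    total=$(( total + n ))
done
echo "total: $total"   # prints "total: 48"
```

Each node contributes 12 logical CPUs (6 cores x 2 threads), which matches the 48 reported in the CPU(s) line.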
Alternatively, the script below prints the number of sockets, cores, and threads on the current machine.
- #!/bin/bash
- # Simple print CPU topology
- # Author: kodango
- # Number of logical processors (threads)
- function get_nr_processor()
- {
-     grep '^processor' /proc/cpuinfo | wc -l
- }
- # Number of physical sockets (distinct "physical id" values)
- function get_nr_socket()
- {
-     grep 'physical id' /proc/cpuinfo | awk -F: '{
-         print $2 | "sort -un"}' | wc -l
- }
- # Number of logical processors per socket
- function get_nr_siblings()
- {
-     grep 'siblings' /proc/cpuinfo | awk -F: '{
-         print $2 | "sort -un"}'
- }
- # Number of cores per socket
- function get_nr_cores_of_socket()
- {
-     grep 'cpu cores' /proc/cpuinfo | awk -F: '{
-         print $2 | "sort -un"}'
- }
- echo '===== CPU Topology Table ====='
- echo
- echo '+--------------+---------+-----------+'
- echo '| Processor ID | Core ID | Socket ID |'
- echo '+--------------+---------+-----------+'
- # Each processor record in /proc/cpuinfo ends with an empty line
- while read line; do
-     if [ -z "$line" ]; then
-         printf '| %-12s | %-7s | %-9s |\n' "$p_id" "$c_id" "$s_id"
-         echo '+--------------+---------+-----------+'
-         continue
-     fi
-     if echo "$line" | grep -q "^processor"; then
-         p_id=$(echo "$line" | awk -F: '{print $2}' | tr -d ' ')
-     fi
-     if echo "$line" | grep -q "^core id"; then
-         c_id=$(echo "$line" | awk -F: '{print $2}' | tr -d ' ')
-     fi
-     if echo "$line" | grep -q "^physical id"; then
-         s_id=$(echo "$line" | awk -F: '{print $2}' | tr -d ' ')
-     fi
- done < /proc/cpuinfo
- echo
- # List the processor IDs that belong to each socket
- awk -F: '{
-     if ($1 ~ /processor/) {
-         gsub(/ /, "", $2);
-         p_id = $2;
-     } else if ($1 ~ /physical id/) {
-         gsub(/ /, "", $2);
-         s_id = $2;
-         arr[s_id] = arr[s_id] " " p_id
-     }
- }
- END {
-     for (i in arr)
-         printf "Socket %s:%s\n", i, arr[i];
- }' /proc/cpuinfo
- echo
- echo '===== CPU Info Summary ====='
- echo
- nr_processor=$(get_nr_processor)
- echo "Logical processors: $nr_processor"
- nr_socket=$(get_nr_socket)
- echo "Physical socket: $nr_socket"
- nr_siblings=$(get_nr_siblings)
- echo "Siblings in one socket: $nr_siblings"
- nr_cores=$(get_nr_cores_of_socket)
- echo "Cores in one socket: $nr_cores"
- nr_cores=$(( nr_cores * nr_socket ))
- echo "Cores in total: $nr_cores"
- # If there are as many cores as logical processors, Hyper-Threading is off
- if [ "$nr_cores" = "$nr_processor" ]; then
-     echo "Hyper-Threading: off"
- else
-     echo "Hyper-Threading: on"
- fi
- echo
- echo '===== END ====='
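The script above reports sockets, cores, and threads but not NUMA nodes. As a complement, the following minimal sketch reads the per-node CPU lists that the Linux kernel exposes under /sys/devices/system/node. The function name list_numa_nodes and its optional directory argument are assumptions made for illustration (the argument allows a different directory to be used for testing).

```shell
#!/bin/bash
# Hypothetical helper: print "nodeN: <cpulist>" for every NUMA node.
# $1 = directory to scan (an assumption for testability);
#      defaults to the kernel's sysfs node directory.
list_numa_nodes() {
    node_root=${1:-/sys/devices/system/node}
    for dir in "$node_root"/node[0-9]*; do
        # Skip entries without a readable cpulist (including an unmatched glob)
        [ -r "$dir/cpulist" ] || continue
        echo "${dir##*/}: $(cat "$dir/cpulist")"
    done
}

list_numa_nodes
```

On the machine shown earlier, the lines printed should correspond to the "NUMA nodeN CPU(s)" lines of the lscpu output; on a system without NUMA information in sysfs the function prints nothing.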
————————————————————
NUMA Architecture Detailed