I have recently given a series of pre-sales demos of Microsoft virtualization solutions. In practice, most customers do not dig deeply into a complete, "relatively perfect" solution the first time they see it; they usually care more about the most basic functions and configuration options. Technical staff in particular often arrive with preconceptions, which is normal. If the customer has used VMware for years, or is a Citrix fan, and you come in to present a Microsoft environment, they may well start by asking about Hyper-V Manager features, sometimes in great detail. Do not underestimate these seemingly "humble" options: they can become a strong counterweight in competition between vendors, or a core pillar of the whole solution's infrastructure.
"Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor, or memory shared between processors)."
That is the general idea of the NUMA architecture; you can easily find more about the technology on Baidu or Wikipedia. So where does its value lie?
We will use SQL Server to illustrate, and I will quote part of an official blog post from the SQL support team: http://blogs.msdn.com/b/apgcdsd/archive/2011/11/16/sql-server-numa.aspx
The following illustration shows a NUMA architecture more concretely:
We have two NUMA nodes. Each NUMA node has some CPUs, an internal bus, and its own memory, and can even have its own I/O. Each CPU can directly access the memory nearest to it, so the system performs better under the NUMA architecture. Under NUMA it is also easy to increase the number of CPUs. Under a non-NUMA architecture, adding CPUs puts a heavy load on the system bus, and the performance gain is small.
Each CPU can also access memory on another NUMA node, but such access is slower and should be avoided where possible. Applications that are not aware of this structure sometimes perform even worse on NUMA machines, because they often access remote memory without realizing it.
####################################################################################
With a general understanding of the NUMA architecture, let's look at the virtual NUMA node configuration in Hyper-V. The figure below shows the CPU properties of my personal computer, which does not support a hardware NUMA architecture. For some services, however, multiple NUMA nodes can be simulated in software; SQL Server, for example, can do this through a registry change, as described in the article linked near the top of this post.
In Performance Monitor (Perfmon.exe) we can also add the local machine's "Hyper-V VM Vid Numa Node" counters to see the nodes Hyper-V currently uses. In the figure below, again from my personal computer, there is only one node, Node0. Its page count is 1,025,851, and 1,025,851 pages × 4 KB per page ≈ 4 GB, which is the actual physical memory of my computer.
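The page-count arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the counter value reads as 1,025,851 pages and that each page is 4 KB, as stated above, and the function name `pages_to_gib` is made up for illustration.

```python
PAGE_SIZE_BYTES = 4 * 1024  # 4 KB pages, as stated in the text


def pages_to_gib(page_count: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a NUMA node's page count into GiB of memory."""
    return page_count * page_size / 1024**3


# Assumed reading of the page count shown for Node0 on the personal computer.
node0_pages = 1_025_851
print(f"Node0 holds about {pages_to_gib(node0_pages):.2f} GiB")
```

The result comes out slightly under 4 because it is expressed in binary GiB and part of the installed memory is typically not reported through this counter.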
####################################################################################
Next, let's switch to a server. It has two CPUs, each dual-core with Hyper-Threading, so 2 × 2 × 2 = 8 and you will see 8 logical CPUs.
We can also see that this machine supports hardware NUMA and has two nodes. The arithmetic is simple: 32 GB of memory and 8 logical CPUs split in two gives 16 GB of memory and 4 logical CPUs per node.
Performance Monitor likewise shows two nodes, Node0 and Node1, and the page count and processor count of each correspond to 16 GB and 4 logical CPUs (LCPU) respectively.
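As a rough sketch of that arithmetic, the snippet below derives the logical CPU count and the per-node share from the server's figures. It assumes resources are split evenly across nodes, which matches this server but is not guaranteed on every machine; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostTopology:
    sockets: int
    cores_per_socket: int
    threads_per_core: int
    memory_gb: int
    numa_nodes: int

    @property
    def logical_cpus(self) -> int:
        # sockets x cores x hardware threads
        return self.sockets * self.cores_per_socket * self.threads_per_core

    def per_node(self) -> tuple[int, float]:
        """Return (logical CPUs, memory in GB) per node, assuming an even split."""
        return self.logical_cpus // self.numa_nodes, self.memory_gb / self.numa_nodes


host = HostTopology(sockets=2, cores_per_socket=2, threads_per_core=2,
                    memory_gb=32, numa_nodes=2)
print(host.logical_cpus)  # logical CPUs on the host
print(host.per_node())    # (LCPU per node, GB per node)
```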
Next, pick a virtual machine and, in its settings, find "NUMA" under "Processor". There is a "Hardware Topology" option; if it is selected, the system maps this server's NUMA architecture directly, which for this server means "16 GB + 4 LCPU per node". But if you really do run high-performance parallel workloads in the virtual machine and need NUMA support, simply mirroring the physical server's configuration may not be the best choice.
The following figure is an example. This virtual machine hosts a SQL Server instance and is configured with 4 GB of memory and 4 vCPUs. I configured it with 2 virtual NUMA nodes, so each node has 2 vCPUs and 2 GB of memory. By default each CPU socket holds at most 1 node, so this VM has two sockets. If you think of the virtual machine as a physical machine, it has "two physical CPUs, each dual-core (or single-core with two threads), and 4 GB of memory".
In this way the virtual machine becomes a machine that supports the NUMA architecture, and combined with the SQL Server NUMA settings discussed in this article, it gives us more flexibility in tuning the database service.
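The virtual NUMA layout for this VM can be worked out with a simplified model. The sketch below is an assumption about how per-node limits translate into node and socket counts; it is not Hyper-V's actual algorithm, and the function and parameter names are invented for illustration.

```python
import math


def virtual_numa_layout(vcpus: int, memory_mb: int,
                        max_vcpus_per_node: int, max_memory_mb_per_node: int,
                        max_nodes_per_socket: int = 1) -> dict:
    """Estimate virtual NUMA nodes and sockets from per-node limits (simplified)."""
    # Enough nodes are needed to satisfy both the vCPU limit and the memory limit.
    nodes = max(math.ceil(vcpus / max_vcpus_per_node),
                math.ceil(memory_mb / max_memory_mb_per_node))
    sockets = math.ceil(nodes / max_nodes_per_socket)
    return {
        "nodes": nodes,
        "sockets": sockets,
        "vcpus_per_node": vcpus / nodes,
        "memory_mb_per_node": memory_mb / nodes,
    }


# The VM from the text: 4 vCPUs, 4 GB, limited to 2 vCPUs and 2 GB per node.
print(virtual_numa_layout(4, 4096, max_vcpus_per_node=2, max_memory_mb_per_node=2048))
```

With one node per socket, the two nodes map to two sockets, which matches the "two physical CPUs" picture described above.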
Start the virtual machine, log in, and look at the Windows Application log for the SQL Server entries with event ID 17152. There are two of them, showing the configuration of Node0 and Node1 respectively. This confirms that the virtual machine simulates the NUMA architecture as planned.
In the SQL Server properties, the Processors page also shows the NUMA nodes currently recognized. You can let SQL Server set the affinity automatically or assign it manually.
By comparison, the SQL Server instance on my local machine shows only one node.
######################################################################################
Here I will quote the SQL development team's article:
SQL Server supports NUMA not only in the engine but also at the connection level. If NUMA is not configured at the connection level, SQL Server assigns each incoming connection to a node in round-robin fashion. Within the node, SQL Server picks the CPU with the lowest load to handle it. The drawback of this approach is that all CPUs in one node may be busy while all CPUs in another node sit idle, so resources are used unevenly. In that case, performance under NUMA can actually be worse than under a non-NUMA architecture, where the system can allocate CPU resources evenly.
As the following illustration shows, with round-robin assignment NUMA Node0 may be very busy while NUMA Node1 is nearly idle, so system resources are not fully used. Important connections may land on Node0, where they are not handled promptly and performance suffers.
To address this, we can configure the connections. For important operations we use port 1450, bound to NUMA nodes 0, 1, and 2; for unimportant operations (which may need a lot of resources but are not critical) we use port 1433, bound to NUMA node 3. That way unimportant operations cannot hurt the performance of important ones.
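A toy simulation makes the imbalance easy to see. This is only an illustration, not SQL Server's scheduler: connections are handed to nodes strictly in turn, and each connection carries a made-up "cost" that the assignment ignores.

```python
from itertools import cycle


def assign_round_robin(costs: list[int], node_count: int) -> list[int]:
    """Assign connections to nodes in turn and return the total load per node."""
    load = [0] * node_count
    nodes = cycle(range(node_count))
    for cost in costs:
        # The next node is chosen regardless of how busy it already is.
        load[next(nodes)] += cost
    return load


# Hypothetical workload: heavy and light connections happen to alternate.
costs = [10, 1] * 4
print(assign_round_robin(costs, node_count=2))
```

With this arrival pattern every heavy connection lands on node 0, so one node carries almost all of the work while the other is nearly idle.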
How do we bind a port to a NUMA node? We append the NUMA node information to the listening port. Take the virtual machine shown earlier, which has 2 NUMA nodes: to use NUMA node 0, the corresponding value is 1, obtained by binary conversion. It works the same way as computing a subnet mask: write a 1 under each node you want to use, then convert the resulting binary number to a value.
NUMA node number:     1   0
Mask bits (value 1):  0   1
Then append that value to the port you want to listen on. For example, to have a service run on nodes 0 and 1, the mask is binary 11 = 3, so simply add [3] after port 1433.
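The mask calculation is just setting one bit per node. The sketch below computes it and formats the port entry in the "port[mask]" form used above; the helper names are hypothetical, and the output is only a string to type into the port setting, not a call into any SQL Server API.

```python
def numa_mask(nodes: list[int]) -> int:
    """Return the bitmask with one bit set for each NUMA node number."""
    mask = 0
    for node in nodes:
        if node < 0:
            raise ValueError("NUMA node numbers must be non-negative")
        mask |= 1 << node  # node 0 -> bit 0, node 1 -> bit 1, ...
    return mask


def port_binding(port: int, nodes: list[int]) -> str:
    """Format a listening port with its NUMA mask, e.g. 1433[3]."""
    return f"{port}[{numa_mask(nodes)}]"


print(port_binding(1433, [0, 1]))     # nodes 0 and 1
print(port_binding(1450, [0, 1, 2]))  # the "important" port from the earlier example
print(port_binding(1433, [3]))        # the "unimportant" port from the earlier example
```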
#######################################################################################
Now back to Hyper-V. When the host supports the NUMA architecture, the system automatically assigns each virtual machine to run on one node. We will keep using the SQL Server virtual machine from above. First install WinRAR, then run its performance test to put load on the CPU.
You can see that the virtual machine's CPU load changes immediately.
Now look at the host. Add the "Hyper-V VM Vid Partition" counters to Performance Monitor, and you can see that the "10.12_tfs_db" virtual machine is running on node 1 (see Preferred NUMA Node Index = 1) and that only 4 logical CPUs are in use.
######################################################################################
This raises a question. A NUMA node's resources are limited, after all: NUMA groups the physical resources, binding a number of logical CPUs to a certain amount of memory so that CPUs reach memory faster and the front-side bus is less of a bottleneck. But what happens when several virtual machines are assigned to the same node and that node's resources are exhausted?
In practice the CPU is usually not the bottleneck in this scenario; memory is. Sooner or later some virtual machine will spill over and access memory across nodes, and that is the problem. NUMA draws node boundaries precisely so that data traffic stays within each node as far as possible. Cross-node access degrades performance, and not only for the virtual machines on the local node: it also eats into the resources of the node being accessed. Is there a way to control this? There is, but first let's look at the effect of cross-node access.
We start several more virtual machines and give them as much memory as we reasonably can, as shown in the following figure.
Then something interesting appears. Our three virtual machines, DC, DB, and Web, all run on node 1, but the DC machine shows a nonzero "Remote Physical Pages" value, meaning it is accessing memory on another node. This host has only two nodes, so it must be using node 0's resources, and multiplying that value by the page size tells you how much non-local memory it consumes.
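The remote-memory calculation can be sketched as follows. The counter values here are invented for illustration, since the actual numbers are only visible in the figure; the 4 KB page size follows the earlier part of this article, and the function names are hypothetical.

```python
PAGE_SIZE_BYTES = 4 * 1024  # 4 KB pages, as used earlier in the article


def remote_memory_mib(remote_pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Non-local memory used by a VM, in MiB."""
    return remote_pages * page_size / 1024**2


def remote_share(remote_pages: int, allocated_pages: int) -> float:
    """Fraction of the VM's allocated pages that live on another node."""
    return remote_pages / allocated_pages


# Hypothetical counter readings for one virtual machine.
remote_pages = 25_600
allocated_pages = 102_400
print(f"{remote_memory_mib(remote_pages):.0f} MiB remote, "
      f"{remote_share(remote_pages, allocated_pages):.0%} of the VM's memory")
```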