Servers today are all multi-core, and Intel's latest Sandy Bridge platform uses a NUMA I/O design. Take the NIC as an example: the PCIe slot is no longer attached to the CPU through a bridge chip as on Westmere; on SNB the PCIe lanes connect directly to the CPU.
When we run performance tests, we sometimes see the numbers swing high and low, and the cause is hard to pin down. Below I list the CPU-related factors that affect performance, based on my own hands-on experience:
For multi-core parallel programs, CPU affinity and frequency can have a large impact on performance. If CPU affinity is set incorrectly and memory on the other socket is accessed across QPI, the measured gap can be close to 10x (two 10G network ports receiving packets with an average packet length of 256 bytes: 2.5 Gbps in the worst case versus 19.8 Gbps in the best case; 64-byte packets can reach 16.5 Gbps at L4).
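To avoid that cross-QPI penalty, one simple approach is to bind both the process and its memory to the NUMA node the NIC is attached to. A minimal sketch using numactl, assuming the NIC is eth2 and sits on node 0 (the binary name ./pkt_rx is a placeholder):
# Check which NUMA node the NIC is attached to (-1 means no NUMA information).
cat /sys/class/net/eth2/device/numa_node
# Run the receiver with both its CPUs and its memory pinned to node 0.
numactl --cpunodebind=0 --membind=0 ./pkt_rx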
First, we make sure the CPU hardware itself does not become a variable; cache coherency is not considered here:
1. Make sure the CPU runs at its maximum frequency. The default cpufreq governor is ondemand, which adjusts the frequency and C-states according to how idle the CPU is. To keep the CPU running in the performance state, i.e. its highest P-state (at the cost of higher power consumption), you can use the following script:
#!/bin/bash
for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    echo performance > $f
done
exit 0
This keeps the CPU running in its optimal state; lscpu should now report the CPU frequency at its maximum.
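A quick way to verify, assuming the standard cpufreq sysfs layout (file names can vary slightly between kernel versions):
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # should print "performance"
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq   # current frequency in kHz
lscpu | grep MHz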
2. Setting CPU affinity (e.g. with sched_setaffinity or taskset) only applies to user-space programs; once the program traps into the kernel, there is no guarantee which CPU the current thread is running on. So when memory is allocated inside the kernel, make sure the correct node is specified, for example via vmalloc_node. A user-space pinning example is sketched below.
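A minimal sketch of user-space pinning with taskset, assuming cores 0-3 sit on the same socket as the NIC and ./pkt_rx is a placeholder binary:
# Start the process pinned to cores 0-3.
taskset -c 0-3 ./pkt_rx
# Or re-pin a process that is already running, by PID.
taskset -cp 0-3 $(pidof pkt_rx)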
3. The Linux kernel provides a CPU offline (hotplug) facility. CPU 0 cannot currently be taken offline (a kernel developer I know is working on CPU 0 hotplug, and the kernel should support it before long). To eliminate the NUMA effect, we can logically offline all of the cores on socket 1; the kernel then no longer sees those cores, and memory allocations automatically go to node 0 (a script that does this for a whole node is sketched after the walkthrough below).
Run the following command:
cd /sys/devices/system/cpu/
ls
cat online
This shows the numbers of the online CPUs, that is, the CPUs currently usable by the kernel.
Take offlining cpu11 as an example; the number here is the logical core number.
cd cpu11
cat online
echo 0 > online
This takes cpu11 offline: it no longer appears in top or other CPU monitoring tools. The underlying implementation is fairly involved; if you are interested, read the kernel source.
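As mentioned above, to take socket 1 out of the picture entirely you can loop over the cores that belong to NUMA node 1. A minimal sketch, assuming a two-socket machine where /sys/devices/system/node/node1 exists:
#!/bin/bash
# Offline every core that belongs to NUMA node 1 (cpu0 can never be offlined).
for c in /sys/devices/system/node/node1/cpu[0-9]*
do
    cpu=$(basename $c)
    echo 0 > /sys/devices/system/cpu/$cpu/online
done
exit 0
To bring the cores back, write 1 to the same online files.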