Recently I have been responsible for an SVR that is under heavy load. The business logic is fairly complex, and every place that could be optimized has already been optimized.
It currently handles about 3k requests per second, and the CPU load is still fairly high.
Looking at top, the load across the 4 CPU cores is not well balanced, so the plan is to assign the business processes to 3 of the CPUs and let the remaining CPU handle the network packets. I will give it a try; if that still is not enough, then once traffic grows further we will just add machines, sigh.
Update: tested it today and the effect is very good. With the same number of processes, once they are bound to CPUs, every CPU gets used and the load is much better than without binding.
Analysis of why it works:
After reading section 4.2 of "Linux Kernel Development", I think there are cases where manually controlling CPU binding really is useful:
1. On Linux, SMP load balancing is based on the number of processes. Each CPU has its own run queue of executable processes, and a process is only migrated to another, idler CPU when one CPU's run queue holds more than 25% more processes than another's. That means the number of processes on cpu0 may exceed that of the other CPUs, as long as it stays within 25%.
2. Our workload has four kinds of CPU consumers: (1) NIC interrupt handling, (2) one process that handles the network traffic, (3) N CPU-hungry worker processes, and (4) other processes that use little CPU.
Given point 1, load balancing only counts processes, so (1) and (2) will sit on cpu0 most of the time, the N processes in (3) will be scheduled roughly evenly across the other CPUs, and the processes in (4) end up spread over all the CPUs.
When a NIC interrupt arrives, some CPU has to service it; while cpu0 is handling the interrupt, any worker process assigned to cpu0 cannot run.
The less CPU-hungry processes sitting on the other CPUs still have to run, even if their time slices are very short, and at those moments your worker processes are affected again. Following this scheduling logic, a particularly bad case is: the processes in (1), (2) and (3) all end up on cpu0 while the lighter processes in (4) are spread over cpu1, cpu2 and cpu3; then whenever a NIC interrupt fires, your business processes get no CPU at all.
From a business standpoint, the more CPU time the worker processes get, the faster the business runs, so manually binding the workers to the other, lightly loaded CPUs gives them more CPU time; a sketch of this idea follows.
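To make that concrete, here is a minimal sketch (my own illustration, not the actual SVR code) of a parent pinning a few CPU-hungry workers onto CPUs 1..3 and leaving cpu0 free for the NIC interrupt and the network process; WORKER_COUNT and worker_main() are placeholders for the real business logic.

#define _GNU_SOURCE           /* for CPU_ZERO/CPU_SET and sched_setaffinity() */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define WORKER_COUNT 3        /* placeholder: number of CPU-hungry workers */

static void worker_main(void) /* placeholder for the real business loop */
{
    for (;;)
        ;                     /* burn CPU so the binding is visible in top */
}

int main(void)
{
    int ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    int i;

    if (ncpu < 2)
        ncpu = 2;             /* nothing to separate on a single-CPU box */

    for (i = 0; i < WORKER_COUNT; i++) {
        pid_t pid = fork();
        if (pid == 0) {       /* child: bind to one of cpu1..cpu(ncpu-1), skipping cpu0 */
            cpu_set_t mask;
            CPU_ZERO(&mask);
            CPU_SET(1 + i % (ncpu - 1), &mask);
            if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
                perror("sched_setaffinity");
            worker_main();
            exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;                     /* parent just waits for the workers */
    return 0;
}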
I found an example:
Machines with many CPUs are becoming more and more common. Sometimes, to make better use of the machine, you need to bind a process to a specific CPU. The example below binds the process to the CPU given on the command line.
#define _GNU_SOURCE           /* for CPU_ZERO/CPU_SET/CPU_ISSET and sched_*affinity() */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/sysinfo.h>

int main(int argc, char *argv[])
{
    int num = sysconf(_SC_NPROCESSORS_CONF);   /* number of configured CPUs */
    int myid;
    int i;
    cpu_set_t mask;
    cpu_set_t get;

    if (argc != 2) {
        printf("usage: ./cpu num\n");
        exit(1);
    }
    myid = atoi(argv[1]);
    printf("system has %d processor(s).\n", num);

    CPU_ZERO(&mask);
    CPU_SET(myid, &mask);                      /* allow only the requested CPU */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        printf("warning: could not set CPU affinity, continuing...\n");
    }

    while (1) {
        CPU_ZERO(&get);
        if (sched_getaffinity(0, sizeof(get), &get) == -1) {
            printf("warning: could not get CPU affinity, continuing...\n");
        }
        for (i = 0; i < num; i++) {
            if (CPU_ISSET(i, &get)) {
                printf("this process %d is running on processor: %d\n", getpid(), i);
            }
        }
    }
    return 0;
}
After running ./cpu 0 and ./cpu 2 in two terminals, top showed the following per-CPU usage; the effect is quite clear:
Cpu0 : 5.3%us, 5.3%sy, 0.0%ni, 87.4%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 5.0%us, 12.2%sy, 0.0%ni, 82.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

CPU Affinity
CPU affinity is the ability, on a Linux system, to bind one or more processes to one or more processors.
A process's CPU affinity mask determines which CPU(s) it is allowed to run on. On a multiprocessor system, setting the CPU affinity mask can give better performance.
A CPU affinity mask is represented by a cpu_set_t structure, which describes a set of CPUs, and the following macros operate on that set:
CPU_ZERO() clears a set.
CPU_SET() and CPU_CLR() respectively add a given CPU number to a set or remove it from a set.
CPU_ISSET() checks whether a CPU number is in the set.
These macros are used much like the FD_ZERO()/FD_SET() macros that go with select().
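As a quick standalone illustration of these macros (my own snippet, not part of the original example):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);          /* start with an empty CPU set */
    CPU_SET(0, &set);        /* add CPU 0 */
    CPU_SET(2, &set);        /* add CPU 2 */
    CPU_CLR(0, &set);        /* remove CPU 0 again */

    printf("cpu 0 in set: %d\n", CPU_ISSET(0, &set) ? 1 : 0);   /* prints 0 */
    printf("cpu 2 in set: %d\n", CPU_ISSET(2, &set) ? 1 : 0);   /* prints 1 */
    return 0;
}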
The following two functions are the most important:
int sched_setaffinity(pid_t pid, unsigned int cpusetsize, cpu_set_t *mask);
This function sets the affinity of the process identified by pid so that it runs only on the CPUs specified by mask. If pid is 0, it means the current process, so the current process is restricted to the CPUs set in mask. The second argument, cpusetsize, is the size of the data pointed to by mask, usually passed as sizeof(cpu_set_t). If the process identified by pid is not currently running on any of the CPUs specified by mask, it is migrated from its current CPU to one of the CPUs in mask.
int sched_getaffinity(pid_t pid, unsigned int cpusetsize, cpu_set_t *mask);
This function retrieves the CPU affinity mask of the process identified by pid and writes it into the structure pointed to by mask, i.e. it tells you which CPUs the process identified by pid may run on. Again, a pid of 0 refers to the current process.
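For example, here is a minimal sketch (my own, exercising the non-zero pid case that the example above does not cover) that pins an already-running process, given its pid and a CPU number on the command line, and then reads the mask back:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    int cpu;
    cpu_set_t mask;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid> <cpu>\n", argv[0]);
        return 1;
    }
    pid = (pid_t) atoi(argv[1]);
    cpu = atoi(argv[2]);

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* a non-zero pid binds that process; the kernel migrates it if necessary */
    if (sched_setaffinity(pid, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    /* read the mask back to confirm the binding */
    CPU_ZERO(&mask);
    if (sched_getaffinity(pid, sizeof(mask), &mask) == -1) {
        perror("sched_getaffinity");
        return 1;
    }
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf("pid %d may run on cpu %d\n", (int) pid, cpu);
    return 0;
}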
How these macros and functions are used in practice is already shown in the example above.
The definition of cpu_set_t (from the glibc headers) looks like this:
# define __CPU_SETSIZE  1024
# define __NCPUBITS     (8 * sizeof (__cpu_mask))
typedef unsigned long int __cpu_mask;
# define __CPUELT(cpu)  ((cpu) / __NCPUBITS)
# define __CPUMASK(cpu) ((__cpu_mask) 1 << ((cpu) % __NCPUBITS))
typedef struct
{
  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
} cpu_set_t;
# define __CPU_ZERO(cpusetp) \
  do {                                                                    \
    unsigned int __i;                                                     \
    cpu_set_t *__arr = (cpusetp);                                         \
    for (__i = 0; __i < sizeof (cpu_set_t) / sizeof (__cpu_mask); ++__i)  \
      __arr->__bits[__i] = 0;                                             \
  } while (0)
# define __CPU_SET(cpu, cpusetp) \
  ((cpusetp)->__bits[__CPUELT (cpu)] |= __CPUMASK (cpu))
# define __CPU_CLR(cpu, cpusetp) \
  ((cpusetp)->__bits[__CPUELT (cpu)] &= ~__CPUMASK (cpu))
# define __CPU_ISSET(cpu, cpusetp) \
  (((cpusetp)->__bits[__CPUELT (cpu)] & __CPUMASK (cpu)) != 0)
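As a worked example of how the element and bit are picked (my own illustration, assuming a 64-bit unsigned long for __cpu_mask): for CPU number 70, __CPUELT(70) = 70 / 64 = 1 and __CPUMASK(70) = 1 << (70 % 64) = 1 << 6, so setting CPU 70 sets bit 6 of __bits[1].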
On my machine sizeof(cpu_set_t) is 128 bytes, i.e. 1024 bits in total. Each bit corresponds to one CPU number, and a 1 means the process may run on the CPU represented by that bit. For example,
CPU_SET(1, &mask);
sets bit 1 (the second bit) of mask to 1.
At that point printf("%lu\n", mask.__bits[0]) prints 2, showing that the second bit has been set to 1.
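A small snippet to verify this bit layout (my own check; it assumes a glibc where the array member is named __bits, as in the definition above):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(1, &mask);                 /* set the bit for CPU 1 */
    printf("%lu\n", mask.__bits[0]);   /* prints 2, i.e. bit 1 is set */

    CPU_SET(3, &mask);                 /* also set the bit for CPU 3 */
    printf("%lu\n", mask.__bits[0]);   /* prints 10 (binary 1010) */
    return 0;
}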
For the details I referred to the man page for sched_setaffinity,
and also to an explanation on IBM developerWorks:
http://www.ibm.com/developerworks/cn/linux/l-affinity.html