In general, MySQL optimization can be divided into two levels: one is optimization at the MySQL level, such as SQL rewriting, index addition, and configuration of the various MySQL parameters; the other is optimization of the operating system and the hardware. For operating-system-level optimization, you must first locate the resource bottleneck (CPU, memory, disk, or network) and then start tuning. In fact, MySQL tuning is not that simple; it requires a deep understanding of the hardware, the OS, and MySQL itself. For example: on a NUMA-based machine, how are CPUs and memory allocated, and how do you avoid SWAP? In Linux, which I/O scheduling algorithm is in use: CFQ, DEADLINE, NOOP, or ANTICIPATORY? In MySQL, of the redo log and undo log writes, which is sequential and which is random?
Therefore, to optimize MySQL you must have a good grasp of the hardware, the OS, and the internal implementation of MySQL. This article covers the CPU and process aspects of MySQL optimization.
1. NUMA and SMP in CPU architecture
SMP (Shared Memory Multi-Processors), also known as the symmetric CPU architecture
NUMA (Non-Uniform Memory Access)
Their most important difference is whether the memory is bound to each physical CPU and how the CPU accesses the memory:
In the SMP architecture, CPUs are not bound to memory: all CPUs compete for a single bus to access all of the shared memory. The advantage is resource sharing; the disadvantage is fierce contention for the bus. As the number of CPUs in a PC server increases (not just the number of CPU cores), the disadvantages of bus contention become increasingly apparent. Therefore, Intel introduced the NUMA architecture with the Nehalem CPUs, and AMD launched the Opteron CPUs based on the same architecture.
The biggest feature of NUMA is the introduction of the concepts of node and distance. A node contains several CPUs and its own bound memory, and the CPU and memory capacity of each node are generally equal. Distance defines the cost for one node to access the resources of another node. The following shows the relationship between memory and CPUs in the NUMA architecture:
(Figure: each node contains multiple CPUs and its own memory; the nodes are linked by an interconnect.)
The NUMA architecture was proposed to scale to many CPUs. Memory access by a CPU inside a node falls into two types:
1) access to the node's own memory, generally called local access, which is obviously the fastest;
2) access to the memory of other nodes, generally called remote access, which is slower because it must go through the interconnect (the node layout and distances can be inspected with the command shown after this list).
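On a NUMA machine you can view the nodes, the CPUs and memory belonging to each node, and the distance matrix with the numactl utility (a hedged sketch, assuming the numactl package is installed):

numactl --hardware     # list the available nodes, each node's CPUs and memory, and the node distance matrix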
Because CPUs and memory are bound together into nodes, the question becomes how CPUs are allocated and how memory is allocated:
1) NUMA's CPU allocation policies include cpunodebind and physcpubind. cpunodebind binds a process to the CPUs of the specified nodes, while physcpubind is more precise and binds it to specific physical cores.
2) NUMA memory allocation policies include localalloc, preferred, membind, and interleave. localalloc requires the process to allocate memory from the node it is running on; preferred is looser and names a recommended node from which to obtain memory, but if that node does not have enough memory the process may try other nodes; membind specifies several nodes from which alone the process may allocate memory; interleave makes the process allocate memory from all nodes in a round-robin fashion, so that allocations are spread randomly and evenly across the nodes (the numactl sketch below shows how each policy is specified).
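As a hedged illustration, these policies can be applied when launching a process with numactl; the program name myapp below is only a placeholder:

# memory policies (myapp is a hypothetical program used only for illustration)
numactl --localalloc ./myapp          # allocate from the node the process is running on
numactl --preferred=0 ./myapp         # prefer node 0, fall back to other nodes if it is full
numactl --membind=0,1 ./myapp         # allocate only from nodes 0 and 1
numactl --interleave=all ./myapp      # round-robin allocation across all nodes
# CPU policies
numactl --cpunodebind=0 ./myapp       # run only on the CPUs of node 0
numactl --physcpubind=0,1 ./myapp     # run only on physical cores 0 and 1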
Problems caused by NUMA architecture -- SWAP
A NUMA system uses the localalloc memory allocation policy by default: a process running on the CPUs of a node allocates memory only from that node's own memory. If that node's memory runs out, SWAP is generated, which seriously affects performance, even though other nodes may still have free memory; the process does not request memory from them. This is NUMA's biggest problem.
To avoid SWAP, set the memory allocation policy on a NUMA machine to interleave. In this way, the memory of any process is allocated across all nodes in turn; although remote access causes a slight performance loss, it eliminates the severe performance loss caused by SWAP. In effect, interleave makes the memory of all NUMA nodes look like one shared memory. Unlike SMP, however, the nodes are linked by the interconnect, so the bus contention of the SMP architecture is avoided.
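As a hedged example of how this is commonly applied to MySQL (the binary path and options are assumptions that depend on your distribution), mysqld can be started under an interleaved policy; some newer MySQL versions also expose a related innodb_numa_interleave option:

numactl --interleave=all /usr/sbin/mysqld --defaults-file=/etc/my.cnf &   # spread the buffer pool across all nodes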
2. CPU and Linux processes
The process is probably the most important concept in Linux. A process runs on the CPU and is the entity to which all hardware resources are allocated. In Linux, a task_struct structure describes a process, recording its various information, attributes, and resources.
The lifecycle of a process in Linux and the system calls it involves (a small shell sketch follows this list):
1) the parent process calls fork() to create a new child process;
2) the child process calls exec() to specify the code to execute;
3) the child process calls exit() when it finishes and enters the zombie state;
4) the parent process calls wait(), waits for the child process to return, and reclaims all of its resources.
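A minimal sketch of this lifecycle using shell built-ins (the shell fork()s a child, the child exec()s the program, and the parent wait()s for it):

sleep 5 &                # the shell fork()s a child process, which then exec()s /bin/sleep
child=$!                 # $! holds the PID of the most recent background child
wait "$child"            # the parent blocks in wait() until the child exit()s and is reaped
echo "child $child exited with status $?"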
Thread:
A thread is an execution unit; all threads in the same process share that process's resources. A thread is often called an LWP (Light Weight Process), so the term thread is not so mysterious: we can treat a thread as a special kind of process.
Process Priority and nice:
Process scheduling involves the priority of the process. The priority is expressed as the nice level, which ranges from -20 to 19; the smaller the value, the higher the priority. The default is 0.
Generally, if we want a process to be scheduled less often, we can increase its nice value (the "nicer" it is, the less aggressively it competes for the CPU).
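For example (a hedged sketch; the program name and PID below are only placeholders):

nice -n 10 ./batch_job          # start a program at a lower priority (nice 10)
renice -n 5 -p 1234             # lower the priority of an already running process with PID 1234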
Context switch of the process:
Context switching is an extremely important concept because it has a great impact on performance. Process scheduling involves context switches: the process's state is flushed from the CPU registers to memory and replaced with the context of another process. Frequent context switching greatly hurts performance.
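To see how often a particular process switches context, one option (a sketch assuming the sysstat package is installed; the PID is illustrative) is pidstat:

pidstat -w -p 2412 1     # report voluntary and involuntary context switches of PID 2412 every second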
CPU interrupt handling:
CPU interrupt handling is one of the highest-priority tasks. Interrupts are classified into hard interrupts and soft interrupts.
When an interrupt occurs, the interrupt handler runs and a context switch is triggered, so excessive interrupts also degrade performance.
Various states of the process:
The states of a process must be clearly understood; otherwise the load average reported by the top and uptime commands, and other process information discussed later, will not make sense.
1) TASK_RUNNING: in this state, the process is running on a CPU or waiting to run in the run queue.
2) TASK_STOPPED: a process suspended by certain signals (for example SIGINT, SIGSTOP) is in this state; it is waiting to be resumed by a signal such as SIGCONT.
3) TASK_INTERRUPTIBLE: in this state, the process is suspended and waits for a certain condition to be satisfied. If a process in TASK_INTERRUPTIBLE state receives a signal to stop, the process state is changed and the operation is interrupted. A typical example of a TASK_INTERRUPTIBLE process is a process waiting for keyboard input.
4) TASK_UNINTERRUPTIBLE: similar to TASK_INTERRUPTIBLE, but while a process in TASK_INTERRUPTIBLE state can be interrupted, sending a signal does nothing to a process in TASK_UNINTERRUPTIBLE state. A typical example of a TASK_UNINTERRUPTIBLE process is a process waiting for a disk I/O operation.
5) TASK_ZOMBIE: after a process exits with the exit() system call, its parent should be notified of the termination. In TASK_ZOMBIE state, the process is waiting for its parent to be notified so that all of its data structures can be released.
Except for processes in TASK_RUNNING, processes in the other states are not running on a CPU. TASK_UNINTERRUPTIBLE is special, however: it can be regarded as "running" from the system's point of view, because the process is waiting for a disk operation to complete and the system is effectively still doing work on its behalf. That is why the load average counts processes in both the TASK_RUNNING and TASK_UNINTERRUPTIBLE states.
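As a hedged quick check, you can count how many processes are currently in these two states:

ps -eo state | grep -c '^R'     # processes in TASK_RUNNING (state R)
ps -eo state | grep -c '^D'     # processes in TASK_UNINTERRUPTIBLE (state D), usually blocked on disk I/O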
How processes use memory:
To run, a process must request and use memory. The most important areas include the heap and the stack:
You can view the memory used by a process with command-line tools such as pmap and ps. Memory tuning will be covered in dedicated articles later.
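For example (the PID is illustrative):

pmap -x 2412 | tail -n 5             # per-mapping and total resident memory of PID 2412
ps -o pid,vsz,rss,cmd -p 2412        # virtual (VSZ) and resident (RSS) memory of the same process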
CPU scheduling:
The preceding section introduced process priority and the nice level in CPU scheduling. In Linux, the process scheduling algorithm is O(1), so the number of processes does not affect scheduling efficiency.
Process scheduling involves two priority arrays (run queues): active and expired. The CPU schedules the processes in the active queue according to their priority; after all processes in the active queue have been scheduled, the active and expired queues are swapped and scheduling continues.
When scheduling on a NUMA system, the scheduler generally does not move a process across nodes unless a node's CPUs are overloaded and load balancing is requested, because cross-node scheduling hurts performance.
3. How to measure CPU usage in Linux
1) CPU utilization: the most intuitive and important metric. If it exceeds 80% for a long time, the CPU has likely become a bottleneck;
2) User time: the CPU time consumed by user processes. The higher this value, the better, because more CPU is being spent on the user's actual work;
3) System time: the CPU time consumed by the kernel, including the CPU used for hard interrupts and soft interrupts. The lower the better; a high value may indicate a bottleneck in the network or driver layer;
4) Waiting: the CPU time spent waiting for I/O. A very high value indicates that the I/O subsystem has a bottleneck;
5) Idle time: the CPU idle time;
6) Load average: the average of the sum of processes in the TASK_RUNNING and TASK_UNINTERRUPTIBLE states. If processes that request CPU time are blocked (meaning the CPU has no time to process them), the load average rises; on the other hand, if each process gets immediate access to the CPU and no CPU cycles are lost, the load decreases;
7) Context switches: the number of context switches per unit of time.
How to detect CPU:
The best tool for detecting CPU usage is the top command:
3.1 top command to view CPU information
To optimize, you must understand the output of many command-line tools. The top output above contains a lot of information:
The first line:
top - 14:35:55 up 25 min, 4 users, load average: 0.10, 0.07, 0.14
14:35:55 is the current system time; "up 25 min" means the system has been running for 25 minutes; "4 users" means four logged-in users are in the system;
Load average: average CPU load for the last 1 minute, 5 minutes, and 15 minutes, respectively.
(Load average: the average of the sum of TASK_RUNNING and TASK_UNINTERRUPTIBLE processes );
The second line:
Tasks: 92 total, 1 running, 91 sleeping, 0 stopped, 0 zombie
The number of processes in the system: 92 in total, 1 running, 91 sleeping, 0 stopped, and 0 zombies;
The third line:
Cpu(s): 0.0%us, 1.7%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.3%hi, 0.3%si, 0.0%st
This line provides the most important information about CPU usage: user time, system time, nice time, idle time, wait time, hard interrupt time, soft interrupt time, and steal time. What we care about most are user time, system time, wait time, and idle time. Nice time is the CPU time spent on user processes running at an adjusted nice level.
The fourth line:
Mem: total, used, free, buffers
This line gives memory information. Because Linux tries to use free memory for caching, it has limited value on its own; a small free value does not necessarily mean there is a memory bottleneck;
The fifth line:
Swap: total, used, free, cached
This line shows swap partition usage. Because of the way Linux uses memory, it is also of limited value on its own; a large used value does not necessarily indicate a memory bottleneck;
The remaining lines describe the resources used by each process, sorted by CPU usage. The meaning of each field is as follows:
PID: process ID; USER: the user running the process; PR: the process priority; NI: the nice value of the process (default 0);
VIRT: the total amount of virtual memory used by the task, including all code, data, and shared libraries plus pages that have been swapped out;
RES: resident set size (KB), the non-swapped physical memory the task is using; it excludes swapped-out memory;
SHR: shared memory size (KB), the amount of shared memory used by the task. It reflects memory that could potentially be shared with other processes; for example, shared memory allocated by the shmget() system call can be shared among multiple processes;
S: status of the process: R: running; S: sleeping; T: stopped; D: uninterruptible sleep; Z: zombie;
%CPU: the CPU usage of the process;
%MEM: the memory usage of the process;
TIME+: the total CPU time the process has used;
COMMAND: the command the process is running;
Reading the output of top and related command-line tools is the basis for optimization. The meaning of each field is described in man top; just read the manual patiently.
The top command displays information per process by default. We can also display it per thread with: top -H
In thread mode, the total number of Tasks increases significantly, from 92 to 134, because mysqld has a multi-threaded architecture with many background threads.
If you only want to view the CPU load average, use the uptime command:
[root@localhost ~]# uptime
15:26:59 up, 4 users, load average: 0.00, 0.02, 0.00
3.2 vmstat command to view CPU information
The fields in the vmstat output have the following meanings (from the vmstat man page):
FIELD DESCRIPTION FOR VM MODE
Procs
  r: The number of processes waiting for run time.
  b: The number of processes in uninterruptible sleep.
Memory
  swpd: the amount of virtual memory used.
  free: the amount of idle memory.
  buff: the amount of memory used as buffers.
  cache: the amount of memory used as cache.
  inact: the amount of inactive memory. (-a option)
  active: the amount of active memory. (-a option)
Swap
  si: Amount of memory swapped in from disk (/s).
  so: Amount of memory swapped to disk (/s).
IO
  bi: Blocks received from a block device (blocks/s).
  bo: Blocks sent to a block device (blocks/s).
System
  in: The number of interrupts per second, including the clock.
  cs: The number of context switches per second.
CPU (these are percentages of total CPU time)
  us: Time spent running non-kernel code. (user time, including nice time)
  sy: Time spent running kernel code. (system time)
  id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
  wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
  st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
CPU-related information includes user time, system time, idle time, wait time, and steal time;
In addition, vmstat reports interrupts and context switches: under System, in is the number of interrupts per second and cs is the number of context switches per second;
Other fields are about memory and disk.
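A typical invocation (a hedged sketch) samples the counters repeatedly rather than showing only the averages since boot:

vmstat 1 5      # sample every 1 second, 5 times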
3.3 iostat command to view CPU information
[root@localhost ~]# iostat
Linux 2.6.32-504.el6.i686 (localhost.localdomain)  09/30/2015  _i686_  (1 CPU)

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
           0.27   0.02    4.74    0.28    0.00  94.69

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
scd0      0.01         0.08         0.00        536          0
sda       1.64        42.27         8.97     290966      61720

[root@localhost ~]# iostat -c
Linux 2.6.32-504.el6.i686 (localhost.localdomain)  09/30/2015  _i686_  (1 CPU)

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
           0.26   0.02    4.72    0.27    0.00  94.72
The avg-cpu section provides the average CPU usage; iostat -c reports only the CPU statistics.
3.4 ps command to view CPU information of a process and a thread
For example, I want to view information about mysqld:
[root@localhost ~]# pidof mysqld
2412
[root@localhost ~]# ps -mp 2412 -o THREAD,pmem,rss,vsz,tid,pid
USER  %CPU PRI SCNT WCHAN  USER SYSTEM %MEM    RSS    VSZ   TID   PID
mysql  6.7   -    - -         -      - 42.8 441212 752744     -  2412
mysql  6.5  19    - -         -      -    -      -      -  2412     -
mysql  0.0  19    - -         -      -    -      -      -  2414     -
mysql  0.0  19    - -         -      -    -      -      -  2415     -
mysql  0.0  19    - -         -      -    -      -      -  2416     -
mysql  0.0  19    - -         -      -    -      -      -  2417     -
mysql  0.0  19    - -         -      -    -      -      -  2418     -
mysql  0.0  19    - -         -      -    -      -      -  2419     -
mysql  0.0  19    - -         -      -    -      -      -  2420     -
mysql  0.0  19    - -         -      -    -      -      -  2421     -
mysql  0.0  19    - -         -      -    -      -      -  2422     -
mysql  0.0  19    - -         -      -    -      -      -  2423     -
mysql  0.0  19    - -         -      -    -      -      -  2425     -
mysql  0.0  19    - -         -      -    -      -      -  2426     -
mysql  0.0  19    - -         -      -    -      -      -  2427     -
mysql  0.0  19    - -         -      -    -      -      -  2428     -
mysql  0.0  19    - -         -      -    -      -      -  2429     -
mysql  0.0  19    - -         -      -    -      -      -  2430     -
mysql  0.0  19    - -         -      -    -      -      -  2431     -
mysql  0.0  19    - -         -      -    -      -      -  2432     -
mysql  0.0  19    - -         -      -    -      -      -  2433     -
mysql  0.0  19    - -         -      -    -      -      -  2434     -
First obtain the PID of mysqld, which is 2412;
Then view the thread information: ps -mp 2412 -o THREAD,pmem,rss,vsz,tid,pid (-p specifies the process PID, -o specifies the output format; see man ps).
The first columns show per-thread CPU information. From them we can also find the TID of a thread that is using a lot of CPU.
We generally run ps as ps -elf; to view threads, use ps -eLf, where L stands for LWP (light weight process), i.e. a thread.
3.5 mpstat (multi processor stat) command to view information about each CPU core in a multi-core CPU
[root@localhost ~]# mpstat
Linux 2.6.32-504.el6.i686 (localhost.localdomain)  09/30/2015  _i686_  (1 CPU)

04:11:50  CPU  %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %idle
04:11:50  all  0.26   0.02  4.30     0.27  0.26   0.15    0.00    0.00  94.74
Because this is a single-core CPU in a virtual machine, only the all row is shown and there is no per-core breakdown.
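On a multi-core machine, a per-core breakdown can be requested explicitly (a hedged sketch):

mpstat -P ALL 1 3      # per-core CPU statistics, sampled every 1 second, 3 times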
4. CPU optimization
1) Use the tools described above (top, vmstat, iostat, ps -mp xxx -o ..., and mpstat) to check whether there is a CPU bottleneck. Then determine which component accounts for the high usage (user time, system time, wait time, context switches, and so on) and which process is using the most CPU. If you can confirm that the process is mysqld, start optimizing from the MySQL side: for example, use the MySQL command show processlist to see which SQL statements are responsible, locate those statements, and optimize or rewrite them, or move them to a slave to run.
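For example, from the shell (the credentials are illustrative; adjust them for your environment):

mysql -uroot -p -e "SHOW FULL PROCESSLIST" | grep -v Sleep    # list active statements, hiding idle connections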
2) If a non-essential process is occupying the CPU, you can kill it, reschedule it with cron to run during off-peak hours, or lower its priority with the renice command;
[root@localhost ~]# renice -n -10 -p 2041
2041: old priority 10, new priority -10
This sets the priority of process 2041 to -10.
3) You can also trace the process to find the cause: strace -aef -p <pid> -o file
[root@localhost ~]# strace -aef -p 2041 -o mysql.txt
Process 2041 attached - interrupt to quit
^CProcess 2041 detached
[root@localhost ~]# ll mysql.txt
-rw-r--r-- 1 root root 10091 Sep 30 16:44 mysql.txt
[root@localhost ~]# head mysql.txt
read(0, "\33", 1)                        = 1
read(0, "[", 1)                          = 1
read(0, "A", 1)                          = 1
write(1, "select version();", 17)        = 17
read(0, "\n", 1)                         = 1
write(1, "\n", 1)                        = 1
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT TERM CONT TSTP WINCH], [], 8) = 0
rt_sigaction(SIGINT, {0x8053ac0, [INT], SA_RESTART}, NULL, 8) = 0
rt_sigaction(SIGTSTP, {SIG_DFL, [], 0}, NULL, 8) = 0
4) Finally, consider upgrading to a better CPU.