Linux commands: analyzing CPU bottlenecks

Source: Internet
Author: User
Metrics for measuring CPU performance:

1. CPU usage by user processes:
   - CPU running regular user processes
   - CPU running niced processes
   - CPU running real-time processes
2. CPU usage by the kernel:
   - I/O management: interrupts and drivers
   - memory management: page swapping
   - process management: process startup and context switches
3. WIO: the proportion of time the CPU sits idle while processes wait for disk I/O.
4. CPU idle rate: idle time other than the WIO idle time above.
5. Proportion of CPU time spent on context switching.
6. nice
7. real-time
8. Length of the run queue.
9. Load average.

Tools commonly used on Linux to monitor overall CPU performance:

mpstat: shows the average across all CPUs and can also show the statistics of a specified CPU.
vmstat: shows only the average across all CPUs; also shows CPU run-queue information.
iostat: shows only the average across all CPUs.
sar: like mpstat, shows the average across all CPUs as well as the statistics of a specified CPU.
top: displays information similar to ps, but also shows CPU consumption and refreshes the display at an interval chosen by the user.

I. vmstat

    [root@localhost ~]# vmstat -n 3       (refresh every 3 seconds)
    procs -----------memory----------- ---swap-- ----io----- --system--- -----cpu----
     r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs  us sy id wa
     1  0    144 186164 105252 2386848    0    0     0    18    83     2  48 21 31  0
     2  0    144 189620 105252 2386848    0    0         177  1039  1210  34 10 56  0
     0  0    144 214324 105252 2386848    0    0     0    10  1071   670  32  5 63  0
     0  0    144               2386848    0    0     0   189  1035        20  3 77  0
     2  0    144 158772 105252 2386848    0    0     0   203  1065  2832  70 14 15  0

The CPU-related fields are:

procs
    r: the number of processes in the run queue. If r stays above the number of CPUs in the system, the system is running slowly and most processes are waiting for the CPU. If r exceeds four times the number of available CPUs, the system is short of CPU, or the CPU is too slow: most processes are waiting for the CPU and everything runs slowly.

system
    in: the number of interrupts per second.
    cs: the number of context switches per second. The larger the value, the more CPU time the kernel consumes.

cpu
    us: percentage of CPU time consumed by user processes. A high us value means user processes are consuming a lot of CPU time. If it stays above 50% for a long time, consider optimizing the program's algorithms or accelerating the code (for example PHP/Perl).
    sy: percentage of CPU time consumed by the kernel. A high sy value means the kernel is consuming a lot of CPU resources, which is not healthy, and the cause should be investigated.
    wa: percentage of CPU time spent waiting for I/O. A high wa value means I/O wait is serious, which may be caused by heavy random disk access or by a disk bottleneck (block operations).
    id: percentage of time the CPU is idle.
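The rules of thumb for r and us can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `parse_vmstat_row` and `cpu_warnings` are made up for this example, the CPU count is passed in by hand, and a single row is checked even though the text talks about values that persist over time. The sample row has the shape of the last data row of the output above.

```python
# Column order of a classic vmstat data row (see the header in the sample).
VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache",
                  "si", "so", "bi", "bo", "in", "cs",
                  "us", "sy", "id", "wa"]


def parse_vmstat_row(line):
    """Map one whitespace-separated vmstat data row onto column names."""
    values = [int(tok) for tok in line.split()]
    # Some vmstat versions append extra columns; zip() ignores anything
    # past the 16 columns listed above.
    return dict(zip(VMSTAT_COLUMNS, values))


def cpu_warnings(row, ncpu):
    """Apply the r and us rules of thumb to a single parsed row.

    ncpu is supplied by the caller (hypothetical parameter); a real check
    would look at several consecutive rows, not just one.
    """
    warnings = []
    if row["r"] >= 4 * ncpu:
        warnings.append("run queue >= 4x CPU count: CPU shortage")
    elif row["r"] > ncpu:
        warnings.append("run queue > CPU count: processes are waiting for CPU")
    if row["us"] > 50:
        warnings.append("us > 50%: consider optimizing the application")
    return warnings


sample = "2 0 144 158772 105252 2386848 0 0 0 203 1065 2832 70 14 15 0"
row = parse_vmstat_row(sample)
for message in cpu_warnings(row, ncpu=1):
    print(message)
```

In practice the rows would come from a running `vmstat -n 3`, and a warning would only be raised once the condition holds for several samples in a row.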
If the idle time (id) stays at 0 and the system time (sy) is twice the user time (us), the system is short of CPU resources.

Solution: when this happens, first tune the application so that it uses the CPU more efficiently; you can also consider adding more CPUs. Combine vmstat with mpstat, ps aux, top, prstat -a and similar commands to get a full picture of CPU usage and to find out which processes are consuming the most CPU time. In most cases the application is the more likely culprit, for example poorly written SQL statements.

II. sar

    sar [options] [-A] [-o file] t [n]

On the command line, t and n together define the sampling interval and the number of samples. t is the sampling interval and is required; n is the number of samples and is optional, with a default of 1. -o file stores the results in a file in binary format; "file" here is not a keyword but a file name. options are the command-line options. sar has many of them; only the common ones are listed here:

    -A: the sum of all reports.
    -u: CPU utilization.
    -v: status of the process, inode, file, and lock tables.
    -d: disk usage report.
    -r: memory and swap space usage statistics.
    -g: serial port I/O.
    -b: buffer usage.
    -a: file read/write activity.
    -c: system call activity.
    -q: run-queue length and system load average.
    -R: process activity.
    -y: terminal device activity.
    -w: system swapping activity.
    -x {pid | SELF | ALL}: statistics for the specified process ID. SELF reports on the sar process itself, and ALL reports on all system processes.

Analyzing CPU utilization with sar:

    # sar -u 2 10
    Linux 2.6.18-53.el5PAE (localhost.localdomain)    03/28/2009

    07:40:17 PM  CPU   %user   %nice %system %iowait  %steal   %idle
    07:40:19 PM  all   12.44    0.00    6.97    1.74    0.00   78.86
    07:40:21 PM  all   26.75    0.00   12.50   16.00    0.00   44.75
    07:40:23 PM  all   16.96    0.00    7.98    0.00    0.00   75.06
    07:40:25 PM  all   22.50    0.00    7.00    3.25    0.00   67.25
    07:40:27 PM  all    7.25    0.00    2.75    2.50
    07:40:29 PM  all   20.05    0.00    8.56    2.93    0.00   68.46
    07:40:31 PM  all   13.97    0.00    6.23    3.49    0.00   76.31
    07:40:33 PM  all    8.25    0.00    0.75    3.50    0.00   87.50
    07:40:35 PM  all   13.25    0.00    5.75    4.00    0.00   77.00
    07:40:37 PM  all   10.03    0.00    0.50            0.00   86.97
    Average:     all   15.15    0.00    5.91    3.99    0.00

The columns are:

    %user: percentage of time the CPU spends in user mode.
    %nice: percentage of time the CPU spends in user mode running niced processes.
    %system: percentage of time the CPU spends in system (kernel) mode.
    %iowait: percentage of time the CPU spends waiting for I/O to complete.
    %steal: percentage of time a virtual CPU waits involuntarily while the hypervisor services another virtual processor.
    %idle: percentage of time the CPU is idle.

Of these, %iowait and %idle deserve the most attention. A high %iowait indicates an I/O bottleneck on the disk. A high %idle means the CPU has spare capacity; if %idle is high but the system still responds slowly, the CPU may be waiting for memory to be allocated, in which case more memory should be added. If %idle stays below 10, the system's CPU capacity is relatively low and the CPU is the resource that most needs attention.

Analyzing the run-queue length with sar:

    # sar -q 2 10
    Linux 2.6.18-53.el5PAE (localhost.localdomain)    03/28/2009

    07:58:14 PM  runq-sz  plist-sz  ldavg-1  ldavg-5  ldavg-15
    07:58:16 PM        0       493     0.64     0.56      0.49
    07:58:18 PM        1       491     0.64     0.56      0.49
    07:58:20 PM        1       488     0.59     0.55      0.49
    07:58:22 PM        0       487     0.59
    07:58:24 PM        0       485     0.59     0.55      0.49
    07:58:26 PM        1       483     0.78     0.59      0.50
    07:58:28 PM        0       481     0.78     0.59      0.50
    07:58:30 PM        1       480                        0.50
    07:58:32 PM        0       477     0.72     0.58      0.50
    07:58:34 PM        0       474     0.72     0.58      0.50
    Average:           0       484     0.68     0.57

    runq-sz: length of the run queue (processes ready to run and waiting for the CPU).
    plist-sz: number of processes and threads in the process list.
    ldavg-1: system load average over the last minute.
    ldavg-5: system load average over the last five minutes.
    ldavg-15: system load average over the last fifteen minutes.

Load average can loosely be understood as the number of processes waiting for the CPU. On Linux, sar -q, uptime, w, and top all report the system load average.

What is the system load average? It is defined as the average number of tasks in the run queue over a given time interval. A process is in the run queue if it meets all of the following conditions:

    - it is not waiting for the result of an I/O operation;
    - it has not voluntarily entered a wait state (that is, it has not called 'wait');
    - it is not stopped (for example, waiting to be terminated).

Note that on Linux the load average also counts tasks in uninterruptible sleep, which usually means tasks waiting on disk I/O.

For example:

    # uptime
    20:55:40 up 24 days, :06, 1 user, load average: 8.13, 5.90, 4.94

The last part of the output gives the average number of processes in the run queue over the past 1, 5, and 15 minutes. Generally, as long as the number of active processes per CPU is no greater than 3, system performance is good; if the number of tasks per CPU is greater than 5, the machine has a serious performance problem. In the example above, if the system has two CPUs, the current number of tasks per CPU is 8.13 / 2 = 4.065, which means system performance is acceptable.

III. iostat

    # iostat -c 2 10
    Linux 2.6.18-53.el5PAE (localhost.localdomain)    03/28/2009

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              30.10    0.00    4.89    5.63    0.00

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               8.46    0.00    1.74    0.25    0.00

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              22.06    0.00   11.28    1.25    0.00

IV. mpstat

mpstat is short for Multiprocessor Statistics and is a real-time system monitoring tool. The CPU statistics it reports come from the /proc/stat file. On a multi-CPU system it can show not only the average across all CPUs but also the statistics of a specific CPU. Only the CPU-related parameters of mpstat are described here.

The syntax of mpstat is:

    mpstat [-P {cpu | ALL}] [interval [count]]

Parameters:

    -P {cpu | ALL}: the CPU to monitor; cpu is in the range [0, number of CPUs - 1].
    interval: the time between two adjacent samples.
    count: the number of samples; count can only be used together with interval.

With no arguments, mpstat displays averages since system startup. When an interval is given, each report covers the preceding interval.

Meaning of the CPU-related output fields, with the value derived from /proc/stat:

    CPU: processor ID.
    user: user-mode CPU time (%) during the interval, excluding niced processes. user / total * 100
    nice: CPU time (%) during the interval spent on processes with a positive nice value (lowered priority). nice / total * 100
    system: kernel time (%) during the interval. system / total * 100
    iowait: time (%) spent waiting for disk I/O during the interval. iowait / total * 100
    irq: hard interrupt time (%) during the interval. irq / total * 100
    soft: soft interrupt time (%) during the interval. softirq / total * 100
    idle: time (%) during the interval that the CPU was idle for any reason other than waiting for disk I/O. idle / total * 100
    intr/s: the number of interrupts the CPU received per second during the interval.

Total CPU time:

    total_cur = user + system + nice + idle + iowait + irq + softirq
    total_pre = pre_user + pre_system + pre_nice + pre_idle + pre_iowait + pre_irq + pre_softirq
    user  = user_cur - user_pre
    total = total_cur - total_pre

Here _cur denotes the current value and _pre the value one interval earlier. All values in the table above are reported to two decimal places.

    # mpstat -P ALL 2 10
    Linux 2.6.18-53.el5PAE (localhost.localdomain)    03/28/2009

    10:07:57 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
    10:07:59 PM  all   20.75    0.00   10.50    1.50    0.25    0.25    0.00   66.75   1294.50
    10:07:59 PM    0   16.00    0.00    9.00    1.50    0.00    0.00    0.00   73.50   1000.50
    10:07:59 PM    1   25.76    0.00   12.12    1.52    0.00    0.51    0.00   60.10    294.00
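The delta formulas in the mpstat section (user = user_cur - user_pre, total = total_cur - total_pre, then value / total * 100) can be sketched as follows. This is a minimal illustration, not mpstat's actual implementation: the function names are made up for this example, only the first seven counters of the aggregate cpu line in /proc/stat are used, and the two snapshot lines are hypothetical numbers chosen so that the interval spans exactly 1000 ticks.

```python
# Order of the first seven counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Turn a /proc/stat cpu line into a dict of cumulative tick counters."""
    tokens = line.split()
    # tokens[0] is the label ("cpu"); later counters (steal, guest) are ignored.
    return dict(zip(FIELDS, (int(tok) for tok in tokens[1:1 + len(FIELDS)])))


def cpu_percentages(pre, cur):
    """Share of the interval spent in each state: (x_cur - x_pre) / total * 100."""
    total = sum(cur.values()) - sum(pre.values())
    if total <= 0:
        raise ValueError("snapshots must be taken in order and must differ")
    return {name: round(100.0 * (cur[name] - pre[name]) / total, 2)
            for name in FIELDS}


# Hypothetical snapshots taken one interval apart (not real measurements).
pre = parse_cpu_line("cpu  1000 0 500 8000 300 10 20")
cur = parse_cpu_line("cpu  1200 0 600 8650 330 20 30")

for name, pct in cpu_percentages(pre, cur).items():
    print(f"%{name:<8}{pct:6.2f}")
```

On a live system the two lines would be read from /proc/stat with a pause of one interval between the reads.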
 
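Returning to the load-average rule of thumb from the sar section (no more than 3 active tasks per CPU is good, more than 5 is a serious problem), the per-CPU arithmetic can be sketched as follows. The function names are made up for this example, the CPU count is passed in by hand, and the label for the band between 3 and 5 follows the article's reading of its own example as "acceptable".

```python
def parse_load_averages(uptime_line):
    """Return the 1-, 5- and 15-minute load averages from an uptime line."""
    tail = uptime_line.rsplit("load average:", 1)[1]
    return tuple(float(part) for part in tail.split(","))


def rate_load(load_avg, ncpu):
    """Divide the load by the CPU count and apply the thresholds from the text."""
    per_cpu = load_avg / ncpu
    if per_cpu <= 3:
        verdict = "good"
    elif per_cpu > 5:
        verdict = "serious problem"
    else:
        # Between 3 and 5: the article calls its 4.065 example "acceptable".
        verdict = "acceptable"
    return per_cpu, verdict


# The uptime line quoted in the article, assumed to come from a 2-CPU machine.
line = "20:55:40 up 24 days, :06, 1 user, load average: 8.13, 5.90, 4.94"
one_min, five_min, fifteen_min = parse_load_averages(line)
per_cpu, verdict = rate_load(one_min, ncpu=2)
print(f"{per_cpu:.3f} tasks per CPU: {verdict}")
```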