Viewing Linux system resources, load, and performance monitoring

Source: Internet
Author: User


Commands:


1. View disk usage
df -h
2. View memory size
free
free [-m|-g]    display memory in MB or GB
vmstat
3. View CPU information
cat /proc/cpuinfo
Count only the number of CPUs: grep "model name" /proc/cpuinfo | wc -l
4. View system memory
cat /proc/meminfo
5. View the status of a single process
cat /proc/5346/status    5346 is a PID
6. View the load
w
uptime
7. View overall system status
top
An explanation of some of the top output:
load average: 0.09, 0.05, 0.01

The three numbers are the system's average load over the last 1, 5, and 15 minutes; the smaller they are, the better. The number of cores determines how much load the system can carry.

Rule: on a multi-core system, the load average should not exceed the total number of processor cores.
Per-process memory use can be read from three columns in top: VIRT, RES, and SHR. VIRT is the total amount of virtual memory the process can address, including the memory it is actually using, mapped files, and memory shared with other processes. RES is the amount of physical memory the process is actually consuming. SHR is the amount of memory, including libraries, that can be shared with other processes.
8. Performance monitoring with the sar command
sar -u displays CPU information. -u is sar's default option. The output shows CPU usage as percentages:
CPU
CPU number
%user
Time spent running processes in user mode
%nice
Time spent running user-mode processes with an adjusted nice priority
%system
Time spent running processes in kernel (system) mode
%iowait
Time the CPU was idle while waiting for I/O to complete
%idle
Time the CPU was idle with no process to run
sar 5 10    take 10 samples at 5-second intervals
sar -u -P ALL 5 5    show each CPU separately
sar -n {DEV | EDEV | NFS | NFSD | SOCK | ALL}
sar provides six keywords for displaying network information with the -n option: DEV | EDEV | NFS | NFSD | SOCK | ALL. DEV displays network interface statistics, EDEV displays network error statistics, NFS displays NFS client activity, NFSD displays NFS server activity, SOCK displays socket information, and ALL displays all five. They can be used individually or together.
Meaning of the sar -n DEV fields:
IFACE
Network interface name
rxpck/s
Packets received per second
txpck/s
Packets sent per second
rxbyt/s
Bytes received per second
txbyt/s
Bytes sent per second
rxcmp/s
Compressed packets received per second
txcmp/s
Compressed packets sent per second
rxmcst/s
Multicast packets received per second
9. View command history (with timestamps)
export HISTTIMEFORMAT='%F %T '; history | more
10. View folder and file sizes
du -h --max-depth=0 DM    show the total size of the DM directory
du -h --max-depth=1 DM    show the size of the DM directory and of each subdirectory directly under it

du -h --max-depth=0    show the size of the current directory
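
Items 5 and 7 above can be tied together with a small script. The sketch below reads VmSize and VmRSS from a /proc/<pid>/status file; these fields roughly correspond to the VIRT and RES columns of top. The function name proc_mem is hypothetical, and the script assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# proc_mem is a hypothetical helper, not a standard command.
# $1: path to a status file, for example /proc/5346/status
proc_mem() {
    awk '/^VmSize:/ { size = $2 }   # total virtual memory, in kB
         /^VmRSS:/  { rss = $2 }    # resident (physical) memory, in kB
         END { printf "VmSize=%s kB VmRSS=%s kB\n", size, rss }' "$1"
}

# Example: inspect the current shell's own process, if /proc is available.
if [ -r "/proc/$$/status" ]; then
    proc_mem "/proc/$$/status"
fi
```

Replace $$ with any PID to inspect another process.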







Analysis:


How to troubleshoot high load on a Linux system

Load is an important indicator for a Linux machine and directly reflects its current state. If the load is too high, the machine struggles to do its work.

High load on Linux mainly comes from three sources: CPU usage, memory usage, and I/O consumption. Excessive use of any one of them makes the server load rise sharply.

Several commands show the server load; w and uptime display it directly:

$ uptime
 12:20:30 up 21:46,  2 users,  load average: 8.99, 7.55, 5.40
$ w
 12:22:02 up 21:48,  2 users,  load average: 3.96, 6.28, 5.16

The three load average values are the averages over the last 1, 5, and 15 minutes.

What is load? What is load average?

Load is a measure of how much work the computer is doing (Wikipedia: the system load is a measure of the amount of work that a computer system is doing); put simply, it is the length of the process queue. Load average is the average load over a period of time (1, 5, or 15 minutes).

How to determine whether the system is overloaded
For a typical system, judge by the number of CPUs. If the load average stays below 1.2 on a machine with 2 CPUs, CPU capacity is basically sufficient. In other words, the load average should stay below the number of CPUs; judge primarily by the 15-minute load average.
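
The rule above can be sketched as a small script. load_status is a hypothetical helper that divides a load average by a core count and applies the "below the number of CPUs" rule; the live check assumes /proc/loadavg is readable and that nproc (from coreutils) is installed.

```shell
#!/bin/sh
# load_status is a hypothetical helper: $1 = load average, $2 = core count.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        ratio = load / cores
        verdict = (ratio > 1) ? "overloaded" : "ok"
        printf "%.2f per core: %s\n", ratio, verdict
    }'
}

# Live check: the third field of /proc/loadavg is the 15-minute average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one five fifteen rest < /proc/loadavg
    load_status "$fifteen" "$(nproc)"
fi
```

With the figures from the text, load_status 1.2 2 reports 0.60 per core, which is within capacity.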

These two commands only report the load figure. Linux also provides the more powerful and more practical top command for examining server load.

$ top

The Tasks line shows the total number of processes and their states. Pay attention to zombie, the number of zombie processes: a value other than 0 means some process has a problem.

The Cpu(s) line shows the current CPU state: us is the percentage of CPU time used by user processes, sy is the percentage used by the kernel, id is the idle percentage, and wa is the percentage of CPU time spent waiting for I/O. A wa value above 30% means I/O pressure is very high.

The Mem line shows the current state of memory: total is the total memory size, used is the amount in use, free is the amount remaining, and buffers is memory used for kernel buffers such as directory and block-device metadata.

The Swap line has the same fields as the Mem line; cached is the cache holding the contents of files that have been opened. If the used value of swap is high, the system is short of memory.

In top, press 1 to show how many CPUs the server has and how each CPU is being used.

As a looser rule of thumb, a reasonable server load is up to the number of CPU cores × 2. In other words, on an 8-core machine, a load within 16 means the machine is running stably and smoothly. If the load is above 16, the server is under some pressure.

In top, press Shift+P to sort processes by CPU usage from highest to lowest, and Shift+M to sort them by memory usage from highest to lowest. This makes it easy to find which services are consuming the most CPU and memory.

The top command alone is not enough, because it mainly shows CPU and memory usage; another important cause of high load, I/O, is not shown clearly. Linux provides the iostat command for examining I/O cost.

Run iostat -x 1 10 to start monitoring input and output: -x displays extended statistics, 1 means sample every second, and 10 means take 10 samples.

rsec/s is the number of sectors read per second and wsec/s is the number of sectors written per second; when these two values are very high, disk I/O is under heavy pressure. %util is the I/O utilization; if it is close to 100%, the disk is running at full capacity.

View system load with vmstat

$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 689568 121068 1397252    0    0    77     8  110  745  4  1 93  1

procs
r: the number of processes running or waiting for a CPU time slice. If it stays above the number of CPUs (above 1 on a single-CPU machine), CPU capacity is insufficient and more CPU is needed.
b: the number of processes blocked waiting for a resource, such as I/O or memory swapping.
cpu: CPU usage
us: the percentage of CPU time spent in user mode. A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the user programs.
sy: the percentage of CPU time spent in the kernel. The reference value for us + sy is 80%; if us + sy exceeds 80%, the CPU may be insufficient.
wa: the percentage of CPU time spent waiting for I/O. The reference value for wa is 30%; if wa exceeds 30%, I/O wait is serious, possibly because of heavy random disk access or a bandwidth bottleneck in the disk or disk controller (mostly block operations).
id: the percentage of time the CPU is idle.
system: interrupts and context switches during the sampling interval
in: the number of device interrupts per second observed during the interval.
cs: the number of context switches per second. If cs is much higher than the disk I/O and network packet rates, investigate further.
memory
swpd: the amount of memory swapped out to the swap area (KB). Even if swpd is nonzero or fairly large, for example over 100 MB, system performance is still normal as long as si and so stay at 0 over the long term.
free: the amount of free memory (KB).
buff: the amount of memory used as buffer cache; buffering is generally needed for reads and writes to block devices.
cache: the amount of memory used as page cache, generally as filesystem cache. A large cache means many files are cached; if bi under io is small at the same time, the filesystem is working efficiently.
swap
si: the amount of memory swapped in from the swap area per second.
so: the amount of memory swapped out to the swap area per second.
io
bi: the amount of data read from block devices (disk reads), in KB per second.
bo: the amount of data written to block devices (disk writes), in KB per second.
Here we take 1000 as the reference value for bi + bo. If bi + bo exceeds 1000 and wa is also high, consider balancing the disk load; analyze this together with the iostat output.
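
The reference values above can be applied mechanically. The following is a minimal sketch, not a definitive tool: vmstat_flags is a hypothetical helper that reads vmstat output on stdin, finds the columns by header name, and checks the last sample against the thresholds quoted in the text (us + sy above 80, wa above 30). Comparing r with the core count passed in $1 is an assumption that generalizes the single-CPU rule.

```shell
#!/bin/sh
# vmstat_flags is a hypothetical helper. $1 = number of CPU cores.
vmstat_flags() {
    awk -v cores="$1" '
        $1 == "r" && $2 == "b" {           # header row: map name -> column
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        col["r"] && $1 ~ /^[0-9]+$/ {      # data row: remember the latest
            r  = $(col["r"]);  wa = $(col["wa"])
            us = $(col["us"]); sy = $(col["sy"])
            seen = 1
        }
        END {
            if (!seen) exit 1              # no sample found
            if (r > cores)    print "run queue exceeds cores: r=" r
            if (us + sy > 80) print "CPU busy: us+sy=" (us + sy)
            if (wa > 30)      print "heavy I/O wait: wa=" wa
        }'
}

# Example (requires vmstat and nproc to be installed):
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```

No output means none of the reference values was exceeded in the last sample.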


Common misconceptions about load:
1: High system load must mean a performance problem.
Truth: High load may simply be the result of CPU-intensive computation.
2: High system load must mean insufficient CPU capacity or too few CPUs.
Truth: High load only means that many tasks are queued to run. The tasks in the queue may actually be limited by CPU, by I/O, or by other factors.
3: If the load stays high for a long time, add CPUs first.
Truth: Load is only a symptom, not the underlying cause. Adding CPUs may temporarily lower the load in some cases, but it treats the symptom rather than the cause.

2: How to identify the system bottleneck when the load average is high.
Is the CPU insufficient, is I/O not fast enough, or is memory running short?

2.1: View system load with vmstat
vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0 100152   2436  97200 289740    0    1    34    45   99   33  0  0 99  0

procs
r: the number of processes running or waiting for a CPU time slice. If it stays above the number of CPUs (above 1 on a single-CPU machine), CPU capacity is insufficient and more CPU is needed.
b: the number of processes blocked waiting for a resource, such as I/O or memory swapping.
cpu: CPU usage
us: the percentage of CPU time spent in user mode. A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the user programs.
sy: the percentage of CPU time spent in the kernel. The reference value for us + sy is 80%; if us + sy exceeds 80%, the CPU may be insufficient.
wa: the percentage of CPU time spent waiting for I/O. The reference value for wa is 30%; if wa exceeds 30%, I/O wait is serious, possibly because of heavy random disk access or a bandwidth bottleneck in the disk or disk controller (mostly block operations).
id: the percentage of time the CPU is idle.
system: interrupts and context switches during the sampling interval
in: the number of device interrupts per second observed during the interval.
cs: the number of context switches per second. If cs is much higher than the disk I/O and network packet rates, investigate further.
memory
swpd: the amount of memory swapped out to the swap area (KB). Even if swpd is nonzero or fairly large, for example over 100 MB, system performance is still normal as long as si and so stay at 0 over the long term.
free: the amount of free memory (KB).
buff: the amount of memory used as buffer cache; buffering is generally needed for reads and writes to block devices.
cache: the amount of memory used as page cache, generally as filesystem cache. A large cache means many files are cached; if bi under io is small at the same time, the filesystem is working efficiently.
swap
si: the amount of memory swapped in from the swap area per second.
so: the amount of memory swapped out to the swap area per second.
io
bi: the amount of data read from block devices (disk reads), in KB per second.
bo: the amount of data written to block devices (disk writes), in KB per second.
Here we take 1000 as the reference value for bi + bo. If bi + bo exceeds 1000 and wa is also high, consider balancing the disk load; analyze this together with the iostat output.

2.2: View disk load with iostat
For example, with the options -d -k -t and an interval of 2, disk I/O statistics are reported every 2 seconds until you press Ctrl+C: -d reports disk statistics, -k reports values in kilobytes per second, -t prints the time of each report, and 2 is the interval in seconds. The first report gives statistics accumulated since the system booted. Each subsequent report gives the average I/O load over the interval since the previous report.

# iostat -x 1 10
Linux 2.6.18-92.el5xen    02/03/2009
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.10    0.00    4.82   39.54    0.07   54.46
Device:  rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00    3.50    0.40   2.50      5.60   48.00     18.48      0.00   0.97   0.97   0.28
sdb        0.00    0.00    0.00   0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdc        0.00    0.00    0.00   0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sdd        0.00    0.00    0.00   0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
sde        0.00    0.10    0.30   0.20      2.40    2.40      9.60      0.00   1.60   1.60   0.08
sdf       17.40    0.50  102.00   0.20  12095.20    5.60    118.40      0.70   6.81   2.09  21.36
sdg      232.40    1.90  379.70   0.50  76451.20   19.20    201.13      4.94  13.78   2.45  93.16
rrqm/s: number of read requests merged per second. delta(rmerge)/s
wrqm/s: number of write requests merged per second. delta(wmerge)/s
r/s: number of read I/O requests completed per second. delta(rio)/s
w/s: number of write I/O requests completed per second. delta(wio)/s
rsec/s: number of sectors read per second. delta(rsect)/s
wsec/s: number of sectors written per second. delta(wsect)/s
rkB/s: number of kilobytes read per second. This is half of rsec/s, because each sector is 512 bytes. (Must be calculated.)
wkB/s: number of kilobytes written per second. This is half of wsec/s. (Must be calculated.)
avgrq-sz: average size (in sectors) of each device I/O operation. delta(rsect+wsect)/delta(rio+wio)
avgqu-sz: average I/O queue length. delta(aveq)/s/1000 (because aveq is in milliseconds)
await: average wait time (in milliseconds) for each device I/O operation. delta(ruse+wuse)/delta(rio+wio)
svctm: average service time (in milliseconds) for each device I/O operation. delta(use)/delta(rio+wio)
%util: the percentage of each second spent on I/O operations, that is, the fraction of time during which the I/O queue is non-empty. delta(use)/s/1000 (because use is in milliseconds)

If %util is close to 100%, too many I/O requests are being issued, the I/O system is running at full load, and the disk may be a bottleneck.
If %idle is below 70%, I/O pressure is fairly high and reads generally spend more time waiting.
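
To pick out saturated disks from a long report, the %util check can be scripted. This is a sketch under assumptions: busy_disks is a hypothetical helper, and it locates the %util column from the header row because the set of columns printed by iostat -x varies between sysstat versions.

```shell
#!/bin/sh
# busy_disks is a hypothetical helper. It reads `iostat -x` output on stdin
# and prints each device whose %util is at or above the threshold in $1.
busy_disks() {
    awk -v limit="$1" '
        /^Device/ {                        # header row: find the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") u = i
            next
        }
        u && NF >= u && $u + 0 >= limit { print $1, $u }
        NF == 0 { u = 0 }                  # a blank line ends the device table
    '
}

# Example (requires iostat from the sysstat package):
#   iostat -x 1 10 | busy_disks 90
```

Applied to the sample report above with a threshold of 90, only sdg (93.16) is listed.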

You can also check the vmstat b column (the number of processes waiting for a resource) and wa column (the percentage of CPU time spent waiting for I/O; above 30% indicates high I/O pressure).

In addition, the following points are useful for reference:
svctm is generally less than await (because the waiting time of requests that wait at the same time is counted repeatedly).
svctm: generally determined by disk performance. CPU and memory load also affect it, and too many requests indirectly increase svctm.
await: generally depends on the service time (svctm), the length of the I/O queue, and the pattern in which I/O requests are issued.
If svctm is close to await, I/O requests spend almost no time waiting.
If await is much larger than svctm, the I/O queue is too long and application response times become slower.
If the response time exceeds what users can tolerate, consider switching to a faster disk, adjusting the kernel elevator algorithm, optimizing the application, or upgrading the CPU.
The queue length (avgqu-sz) can also serve as an indicator of system I/O load, but because avgqu-sz is an average over the sampling interval, it does not reflect instantaneous I/O bursts.
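
The await and svctm comparison can be made concrete with a little arithmetic. In this sketch, queue_wait is a hypothetical helper that takes the two values in milliseconds and prints the time spent queued (await minus svctm) and the ratio between them. It applies to reports that include svctm, like the sample above; recent sysstat releases are understood to have deprecated that column.

```shell
#!/bin/sh
# queue_wait is a hypothetical helper: $1 = await (ms), $2 = svctm (ms).
queue_wait() {
    awk -v await="$1" -v svctm="$2" 'BEGIN {
        if (svctm <= 0) { print "no I/O in this interval"; exit }
        printf "queued %.2f ms, ratio %.1f\n", await - svctm, await / svctm
    }'
}

queue_wait 0.97 0.97     # sda in the sample: requests barely wait
queue_wait 13.78 2.45    # sdg in the sample: most of await is queueing
```

A ratio near 1 means requests are served almost immediately; a ratio well above 1, as for sdg, means most of the latency comes from waiting in the queue.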




