vmstat and iostat on Linux Systems

Last Update: 2015-11-02  Source: Internet

Author: User

Tags: disk usage


When a Linux system has a performance problem, we can usually get a first look at it and start locating the cause with commands such as top, iostat, free and vmstat.

Common iostat usage:

$ iostat -d -k 1       # view TPS and throughput
$ iostat -d -x -k 1    # view device utilization (%util) and response time (await)
$ iostat -c 1          # view CPU status
Parameters:

-d displays device (disk) usage statistics;
-k forces columns that are normally shown in blocks to be shown in kilobytes (-m shows megabytes instead);
-x shows extended statistics, such as avgrq-sz, avgqu-sz, await, svctm and %util;
-c shows CPU statistics;
1 10 means the output is refreshed every 1 second, 10 times in total (the examples above pass only the interval, so they keep reporting until interrupted).

Use iostat to view r/s and w/s (read and write requests completed per second), avgrq-sz (average request size, in sectors), avgqu-sz (average request queue length), await (average time per I/O request, in milliseconds) and svctm (average service time per request, in milliseconds, not including queue time). %util is the device utilization. If it is close to 100%, the device is usually saturated. This is not absolute: a device with a write cache, or a device made of several disks, can show 100% %util without the disks being the bottleneck, because the disks serve requests concurrently. Values above 100% occasionally appear; this is mostly due to rounding in the calculation.
svctm is the average service time per request, and await is the average wait time per request, which includes both the queue time and the service time, so in general await is greater than svctm. The smaller the difference between them, the shorter the queue time; conversely, the larger the difference, the longer the queue time, which suggests a problem with the system.
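iostat derives these columns from the cumulative per-device counters the kernel exposes in /proc/diskstats, sampled twice one interval apart. The Python sketch below approximates that calculation. It is a simplified illustration, not sysstat's actual code. The names DiskSample, parse_diskstats_line and iostat_like are hypothetical, and the counter values are invented. Note that await divides the summed read and write times (queue plus service) by the request count, while svctm divides only the time the device was busy.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Cumulative counters for one device, taken from /proc/diskstats."""
    reads: int          # field 1: reads completed
    read_sectors: int   # field 3: sectors read
    read_ms: int        # field 4: time spent reading (ms)
    writes: int         # field 5: writes completed
    write_sectors: int  # field 7: sectors written
    write_ms: int       # field 8: time spent writing (ms)
    io_ms: int          # field 10: time the device had I/O in progress (ms)
    weighted_ms: int    # field 11: weighted time spent doing I/O (ms)


def parse_diskstats_line(line: str) -> tuple:
    """Parse 'major minor name c1 c2 ...' into (name, DiskSample)."""
    parts = line.split()
    c = [int(x) for x in parts[3:14]]  # first 11 counters; newer kernels append more
    return parts[2], DiskSample(
        reads=c[0], read_sectors=c[2], read_ms=c[3],
        writes=c[4], write_sectors=c[6], write_ms=c[7],
        io_ms=c[9], weighted_ms=c[10],
    )


def iostat_like(prev: DiskSample, cur: DiskSample, interval_s: float) -> dict:
    """Approximate iostat -x columns from two samples taken interval_s apart."""
    d_reads = cur.reads - prev.reads
    d_writes = cur.writes - prev.writes
    n_ios = d_reads + d_writes
    d_sectors = (cur.read_sectors - prev.read_sectors) + (cur.write_sectors - prev.write_sectors)
    d_rw_ms = (cur.read_ms - prev.read_ms) + (cur.write_ms - prev.write_ms)
    d_io_ms = cur.io_ms - prev.io_ms
    interval_ms = interval_s * 1000.0

    def per_io(total):
        # Average per completed request; 0 when no request completed.
        return total / n_ios if n_ios else 0.0

    return {
        "r/s": d_reads / interval_s,
        "w/s": d_writes / interval_s,
        "avgrq-sz": per_io(d_sectors),      # average request size, in 512-byte sectors
        "avgqu-sz": (cur.weighted_ms - prev.weighted_ms) / interval_ms,  # average queue length
        "await": per_io(d_rw_ms),           # queue time + service time, ms per request
        "svctm": per_io(d_io_ms),           # device busy time, ms per request
        "%util": 100.0 * d_io_ms / interval_ms,  # share of the interval the device was busy
    }


# Usage with invented counter values, sampled 1 second apart.
prev = DiskSample(1000, 8000, 2000, 500, 4000, 3000, 1500, 5000)
cur = DiskSample(1100, 8800, 2200, 600, 4800, 3600, 1700, 5800)
stats = iostat_like(prev, cur, 1.0)
```

With these made-up counters, 200 requests complete in one second with svctm = 1 ms and await = 4 ms, giving %util = 20% and avgqu-sz = 0.8. The result is not capped, so rounding in the counters can push %util slightly above 100%.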

Single disk:
util = (r/s + w/s) * (svctm / 1000), where util is a fraction (1 means 100%). For example, when util reaches 100%, svctm = 1000 / (r/s + w/s); if the disk is doing 1000 IOPS, svctm should be about 1 millisecond. If svctm is noticeably longer than this, the system has a problem.
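The single-disk formula can be checked with a small worked example. This is a minimal Python sketch. The helper names util_single_disk and svctm_at_full_util are hypothetical, and the 120/80 request rates and 2.5 ms service time are invented inputs.

```python
def util_single_disk(r_per_s: float, w_per_s: float, svctm_ms: float) -> float:
    """Utilization as a fraction (1.0 == 100%) = IOPS * service time in seconds."""
    return (r_per_s + w_per_s) * (svctm_ms / 1000.0)


def svctm_at_full_util(iops: float) -> float:
    """Service time in ms implied by a single disk running at 100% utilization."""
    return 1000.0 / iops


# The article's example: at 100% util and 1000 IOPS, svctm is about 1 ms.
expected_svctm = svctm_at_full_util(1000)
# A disk doing 200 IOPS with a 2.5 ms service time is about 50% busy.
half_busy = util_single_disk(120, 80, 2.5)
```

If a saturated disk reports an svctm well above svctm_at_full_util(iops), each request is taking longer to serve than that IOPS level implies.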

Multiple disks:
Compute the number of requests the device is serving concurrently:
concurrency = (r/s + w/s) * (svctm / 1000) = (avgqu-sz * svctm) / await
(The two forms agree because, by Little's law, avgqu-sz = (r/s + w/s) * await / 1000.)
Interpretation:
1. Serving all the requests that complete in one second takes (r/s + w/s) * (svctm / 1000) seconds of service time. Because those requests were actually completed within 1 second, a result greater than 1 means the device must have been serving several requests at the same time; otherwise it could not have finished them all.
2. Serving every request in the queue takes avgqu-sz * svctm milliseconds, yet each request waits only await milliseconds. So if (avgqu-sz * svctm) / await is greater than 1, the requests must have been served concurrently.
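The following Python sketch computes both forms of the concurrency formula and shows that they agree once avgqu-sz is derived from Little's law. The function names and the figures (500 IOPS split as 300 reads and 200 writes, svctm = 4 ms, await = 10 ms) are hypothetical and chosen only for illustration.

```python
def concurrency_from_service(r_per_s: float, w_per_s: float, svctm_ms: float) -> float:
    """Seconds of service time needed per second of wall-clock time."""
    return (r_per_s + w_per_s) * (svctm_ms / 1000.0)


def concurrency_from_queue(avgqu_sz: float, svctm_ms: float, await_ms: float) -> float:
    """Service time for the whole queue divided by the time a request actually waits."""
    return (avgqu_sz * svctm_ms) / await_ms


# Invented figures: 500 IOPS, 4 ms service time, 10 ms await.
iops, svctm, await_ms = 500.0, 4.0, 10.0
avgqu_sz = iops * await_ms / 1000.0   # Little's law: queue length = request rate * wait time
a = concurrency_from_service(300.0, 200.0, svctm)
b = concurrency_from_queue(avgqu_sz, svctm, await_ms)
# Both come out at about 2, so roughly two requests are in service at once.
```

A value near 2 on a device reporting 100% %util would suggest that, for example, an array with more than two disks still has spare capacity.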

Common vmstat usage:

$ vmstat -S M 1 3     # report sizes in megabytes, every 1 second, 3 times
-S: display sizes in the specified unit. The accepted values are k, K, m and M, representing 1000, 1024, 1000000 and 1048576 bytes respectively. The default unit is K (1024 bytes).
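vmstat performs this conversion itself when -S is given. The Python helper below only illustrates what the four unit letters mean, assuming a value originally reported in the default K (1024-byte) unit. convert_from_kib is a hypothetical name.

```python
# Bytes per unit for vmstat -S, as listed above.
UNIT_BYTES = {"k": 1000, "K": 1024, "m": 1000000, "M": 1048576}


def convert_from_kib(value_kib: float, unit: str = "K") -> float:
    """Express a size reported in the default unit (KiB) in another -S unit."""
    if unit not in UNIT_BYTES:
        raise ValueError(f"unknown unit {unit!r}; expected one of k, K, m, M")
    return value_kib * 1024 / UNIT_BYTES[unit]


free_mib = convert_from_kib(2048, "M")   # 2048 KiB is 2 MiB
free_kb = convert_from_kib(1000, "k")    # 1000 KiB is 1024 kB
```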

How Virtual Memory Works

Every process running on the system needs memory, but not every process uses all of the memory allocated to it at every moment. When the memory in use exceeds the available physical memory, the kernel frees some or all of the physical memory that some processes occupy but are not currently using, stores that data on disk until the process needs it again, and gives the freed memory to the processes that need it.

Linux memory management performs this scheduling mainly through paging and swapping. Paging moves the least recently used pages in memory to disk and keeps the active pages in memory for processes to use. Swapping moves an entire process, rather than individual pages, to disk.

Writing a page to disk is called page-out, and bringing a page back from disk into memory is called page-in. A page fault occurs when the kernel needs a page but finds that it is not in physical memory (because it has been paged out).

When the kernel finds that it is running short of memory, it frees part of the physical memory through page-out. Occasional page-outs are normal, but if they happen frequently, system performance drops sharply, until the kernel spends more time managing paging than running programs. At that point the system runs very slowly or appears to hang; this state is known as thrashing.
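To see how page faults, page-outs and thrashing relate, here is a small Python simulation. It is a minimal sketch of textbook exact LRU replacement, not the Linux kernel's actual reclaim algorithm (which approximates LRU with active and inactive lists). simulate_lru and the reference pattern are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(references, frames):
    """Return (page_faults, page_outs) for textbook LRU with `frames` physical frames."""
    resident = OrderedDict()  # keys are resident pages, least recently used first
    faults = page_outs = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)      # hit: now the most recently used page
            continue
        faults += 1                         # page fault: page is not in physical memory
        if len(resident) >= frames:
            resident.popitem(last=False)    # page-out: evict the least recently used page
            page_outs += 1
        resident[page] = None               # page-in: load the page into a free frame
    return faults, page_outs


# A process that cycles over 4 pages, 10 times.
refs = [0, 1, 2, 3] * 10
enough = simulate_lru(refs, frames=4)   # only the first 4 references fault
short = simulate_lru(refs, frames=3)    # every reference faults: thrashing
```

Removing a single frame turns 4 faults into 40. Once the working set no longer fits, the least recently used page is always the next one needed, so almost all of the time goes to paging.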

Field descriptions:
procs:
r: number of runnable processes (running or waiting for the CPU)
b: number of processes in uninterruptible sleep (usually waiting for I/O, such as disk I/O)
memory:
swpd: amount of swap space used (memory swapped out to disk)
free: amount of idle memory
buff: amount of memory used as buffers
cache: amount of memory used as cache
swap:
si: amount of memory swapped in from the swap area per second
so: amount of memory swapped out to the swap area per second
io: (in current Linux versions the block size is 1024 bytes)
bi: blocks read from block devices per second
bo: blocks written to block devices per second
system:
in: number of interrupts per second, including clock interrupts
cs: number of context switches per second
cpu (as a percentage of total CPU time):
us: time spent running user (non-kernel) code
sy: time spent running kernel (system) code
id: idle time (on kernels before 2.5.41 this also includes I/O wait time)
wa: time spent waiting for I/O

An important note: the memory, swap and I/O statistics are reported in blocks, not bytes. On GNU/Linux the block size is 1024 bytes by default.
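This Python sketch reads one line of vmstat output by name and converts bi/bo from blocks to bytes using the 1024-byte block size. The header is the second line of vmstat output, the one with the column names. The column set differs between procps versions, so the sketch reads the names from that header instead of hard-coding positions. parse_vmstat, io_bytes_per_second and the sample numbers are hypothetical.

```python
BLOCK_SIZE = 1024  # bytes per block on GNU/Linux


def parse_vmstat(header: str, row: str) -> dict:
    """Map the column-name line of vmstat output to one data line."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError("header and data line have different column counts")
    return dict(zip(names, values))


def io_bytes_per_second(stats: dict) -> tuple:
    """Convert bi/bo from blocks per second to bytes per second."""
    return stats["bi"] * BLOCK_SIZE, stats["bo"] * BLOCK_SIZE


# Invented sample lines in the shape vmstat prints.
header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
row = " 1  0      0 812340  65432 901234    0    0    12    40  150  300  5  2 92  1  0"
stats = parse_vmstat(header, row)
read_bps, write_bps = io_bytes_per_second(stats)   # 12288 and 40960 bytes/s
```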

CPU-intensive servers typically show high values in the us column; high values can also appear in the sy column, and sy above 20% is cause for concern.
In general there is no need to worry about context switches unless there are 100,000 or more per second. A context switch occurs when the operating system stops one process and runs another. For example, during a non-covering index scan, an entry is first read from the index and then the page it points to is read from disk; if that page is not in the operating system cache, the physical disk read causes a context switch, and the process is suspended until the I/O completes.
Under I/O-intensive workloads, the CPU spends a lot of time waiting for I/O requests, which means vmstat will show many processes in uninterruptible sleep (the b column) and a high value in the wa column.
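These rules of thumb can be written as simple checks on one vmstat row, given as a dict keyed by column name. vmstat_warnings is a hypothetical helper. The sy > 20% and 100,000 context switches per second figures come from the text above. wa_limit and b_limit are assumptions chosen only for illustration, because the article gives no numbers for them.

```python
def vmstat_warnings(stats: dict, wa_limit: int = 20, b_limit: int = 2) -> list:
    """Flag the patterns described above in one vmstat row.

    wa_limit and b_limit are illustrative assumptions, not thresholds from the article.
    """
    warnings = []
    if stats["sy"] > 20:                                  # sy above 20% is worth investigating
        warnings.append("high_sy")
    if stats["cs"] >= 100_000:                            # 100,000+ context switches per second
        warnings.append("high_cs")
    if stats["b"] > b_limit and stats["wa"] > wa_limit:   # many blocked processes and high wa
        warnings.append("io_bound")
    return warnings


# Invented rows for illustration.
busy = {"r": 3, "b": 6, "us": 10, "sy": 25, "id": 25, "wa": 40, "cs": 150_000}
quiet = {"r": 1, "b": 0, "us": 5, "sy": 2, "id": 92, "wa": 1, "cs": 300}
```

Calling vmstat_warnings(busy) flags all three patterns, while vmstat_warnings(quiet) returns an empty list.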


This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
