Linux performance monitoring: disk I/O

A disk is usually the slowest subsystem of a computer and the one most prone to performance bottlenecks: it is the component farthest from the CPU, and accessing it involves mechanical operations such as spindle rotation and track seeking. The speed gap between disk access and memory access is measured in orders of magnitude, like the difference between a day and a minute. To monitor I/O performance, you need to understand the basics of how Linux moves data between disk and memory.

Memory page

The previous post, Linux performance monitoring: Memory, mentioned that I/O between memory and disk is done in units of pages; on Linux a page is 4 KB by default. Run the following command to view the page size:

$ /usr/bin/time -v date

...

Page size (bytes): 4096

...
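Besides /usr/bin/time, the page size can be queried directly with getconf. A minimal check; on most Linux systems this prints 4096, though some architectures use larger pages:

```shell
# Print the kernel's memory page size in bytes (typically 4096 on Linux)
getconf PAGESIZE
```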

Page faults

Linux uses virtual memory to greatly expand the address space available to programs, so that a program too large to fit in physical memory can still run: pages not currently in use are swapped out to disk, and pages that are needed are read back in, making physical memory appear larger than it is. The process is completely transparent to the program; it does not need to know which of its parts are in memory or when they are swapped in, because everything is handled by the kernel's virtual memory management. When a program accesses data, the kernel first checks the CPU cache and physical memory. If the data is not in memory, a page fault occurs, and the system reads the missing page from disk and caches it in physical memory. Page faults are divided into Major Page Faults and Minor Page Faults: a fault that must read the data from disk is a major page fault; a fault for data that has already been read into memory and cached, so that it can be served from the memory cache rather than from disk, is a minor page fault.

The memory cache described above acts as a read-ahead buffer for the disk: on a page fault the kernel first looks in physical memory, then in the memory cache, and only then reads from disk. Clearly, turning spare memory into cache improves access speed, but there is a hit-rate question: if you are lucky enough to satisfy every page fault from the memory cache, performance improves dramatically. A simple way to raise the hit rate is to enlarge the cache; the larger the cache area, the more pages it holds and the higher the hit rate. The time command can show the numbers of major and minor page faults that occur when a program is started:

$ /usr/bin/time -v date

...

Major (requiring I/O) page faults: 1

Minor (reclaiming a frame) page faults: 260

...
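The fault counters can also be inspected at any time through /proc, not only when a program exits. A sketch assuming the standard Linux /proc/[pid]/stat layout, where fields 10 and 12 hold the minor and major fault counts of the process:

```shell
# minflt is field 10 and majflt is field 12 of /proc/self/stat;
# here the process being inspected is the awk process itself
awk '{ print "minor faults:", $10, "major faults:", $12 }' /proc/self/stat
```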

File Buffer Cache

Reading pages from the memory cache (also called the File Buffer Cache) is much faster than reading them from disk, so the Linux kernel tries to produce as many minor page faults as possible (reads from the file cache) and to avoid major page faults (reads from disk). As minor page faults accumulate, the file cache grows, until physical memory runs short and the system releases some unused pages. This is why, after Linux has been running for a while, free memory is always low even though few programs are running; it creates the illusion that Linux manages memory poorly, when in fact it is putting unused physical memory to work as cache. The following shows the physical memory and file cache on one of VPSee's Sun servers:

$ cat /proc/meminfo

MemTotal: 8182776 kB

MemFree: 3053808 kB

Buffers: 342704 kB

Cached: 3972748 kB

This server has 8 GB of physical memory in total (MemTotal), about 3 GB free (MemFree), about 343 MB used as disk buffers (Buffers), and around 4 GB used as file cache (Cached). Clearly Linux devotes a lot of physical memory to caching, and this cache area can keep growing.
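The memory that is effectively available is therefore not just MemFree, but MemFree plus the reclaimable Buffers and Cached areas. A worked check using the numbers quoted above:

```shell
# MemFree + Buffers + Cached from the sample /proc/meminfo output, in kB and GB
awk 'BEGIN {
    kb = 3053808 + 342704 + 3972748   # MemFree + Buffers + Cached
    printf "available: %d kB (about %.1f GB)\n", kb, kb / 1048576
}'
```

This prints "available: 7369260 kB (about 7.0 GB)", i.e. roughly 7 of the 8 GB could be handed back to programs if needed.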

Page type

There are three types of memory pages in Linux:

Read pages (read-only pages, or code pages): pages read in from disk through major page faults, including unmodifiable static files, executable files, library files, and so on. The kernel reads them into memory when it needs them; when memory is short, the kernel releases them to the free list, and when a program needs them again they must be read back in through another page fault.

Dirty pages: pages whose data has been modified in memory, for example edited text files. pdflush is responsible for synchronizing them to disk; when memory is short, kswapd and pdflush write the data back to disk and release the memory.

Anonymous pages: pages that belong to a process but are not associated with any file, so they cannot be synchronized to disk. When memory is short, kswapd writes them to the swap partition and releases the memory.

I/Os Per Second (IOPS)

Each disk I/O request takes a certain amount of time, which is painfully long compared with memory access. On a typical 1 GHz PC, randomly accessing a word on disk takes about 8,000,000 nanosec (8 millisec), sequentially accessing a word on disk takes about 200 nanosec, while accessing a word in memory takes only about 10 nanosec (data from: Teach Yourself Programming in Ten Years). Such a disk can therefore provide 125 IOPS (1000 ms / 8 ms).
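The IOPS figure follows directly from the average time per request, since the disk can complete at most one request every 8 ms. A worked check:

```shell
# 1000 ms per second / 8 ms per random I/O = 125 I/O operations per second
awk 'BEGIN { printf "%.0f IOPS\n", 1000 / 8 }'
```

This prints "125 IOPS".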

Sequential IO and random IO

I/O can be divided into sequential I/O and random I/O; before monitoring performance, you need to determine whether the system is biased toward one or the other. Sequential I/O requests large amounts of data at once, for example a database running large queries or a streaming-media service, and can move a lot of data quickly. Its per-request throughput can be evaluated by dividing the bytes read/written per second by the number of read/write operations per second: rkB/s divided by r/s, and wkB/s divided by w/s. The output below shows disk I/O for 2 consecutive seconds; the data written per I/O increases (45060.00 / 99.00 = 455.15 KB per IO, then 54272.00 / 112.00 = 484.57 KB per IO). Compared with random I/O, sequential I/O should pay more attention to the throughput of each I/O (KB per IO):

$ iostat -kx 1

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    2.50   25.25    0.00   72.25

Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util

sdb       24.00 19995.00  29.00   99.00  4228.00  45060.00   770.12    45.01 539.65   7.80  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           0.00    0.00    1.00   30.67    0.00   68.33

Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util

sdb        3.00 12235.00   3.00  112.00   768.00  54272.00   957.22   144.85 576.44   8.70 100.10
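The KB-per-IO figures quoted earlier can be reproduced from the iostat columns by dividing wkB/s by w/s. A worked check using the two samples above:

```shell
# Average write size per I/O = wkB/s / w/s for each one-second sample
awk 'BEGIN {
    printf "%.2f KB per IO\n", 45060.00 / 99.00    # first sample
    printf "%.2f KB per IO\n", 54272.00 / 112.00   # second sample
}'
```

This prints 455.15 and 484.57 KB per IO, matching the values in the text.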

Random I/O requests data at random locations; its speed depends on the size and layout of the data on disk and on the number of I/O operations per second the disk can perform. For services such as web and mail, each request carries very little data, but random I/O generates many more requests per second, so the number of I/O operations per second the disk can sustain is critical:

$ iostat -kx 1

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           1.75    0.00    0.75    0.25    0.00   97.26

Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util

sdb        0.00    52.00   0.00   57.00     0.00    436.00    15.30     0.03   0.54   0.23   1.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle

           1.75    0.00    0.75    0.25    0.00   97.24

Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util

sdb        0.00    56.44   0.00   66.34     0.00    491.09    14.81     0.04   0.54   0.19   1.29

Applying the formula above: 436.00 / 57.00 = 7.65 KB per IO and 491.09 / 66.34 = 7.40 KB per IO. Compared with sequential I/O, the KB per IO of random I/O is small enough to be negligible. For random I/O, what matters is the number of I/O operations per second (IOPS), not the throughput of each I/O (KB per IO).

SWAP

The swap device is used when the system does not have enough physical memory to handle all requests; it can be a file or a disk partition. Be careful, though: the cost of using swap is very high. If the system has no physical memory available, it will swap frequently, and if the swap device is on the same file system as the data a program is accessing, it will run into severe I/O problems; eventually the whole system slows down or even crashes. The amount of swapping between the swap device and memory is an important indicator of Linux system performance, and there are already many tools to monitor swap and swapping activity, such as top, cat /proc/meminfo, vmstat, etc.:

$ cat /proc/meminfo

MemTotal: 8182776 kB

MemFree: 2125476 kB

Buffers: 347952 kB

Cached: 4892024 kB

SwapCached: 112 kB

...

SwapTotal: 4096564 kB

SwapFree: 4096424 kB

...

$ vmstat 1

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

1 2 260008 2188 144 6824 11824 2584 12664 2584 1347 1174 14 0 86 0

2 1 262140 2964 128 5852 24912 17304 24952 17304 4737 2341 86 10 0 0 4
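Swap usage can also be summarized directly from the SwapTotal and SwapFree fields of /proc/meminfo. A sketch that guards against SwapTotal being 0 (no swap configured); the field names are the standard /proc/meminfo keys:

```shell
# Report swap in use, in kB and as a percentage of SwapTotal
awk '/^SwapTotal:/ { total = $2 }
     /^SwapFree:/  { free  = $2 }
     END {
         if (total == 0) { print "no swap configured"; exit }
         used = total - free
         printf "swap used: %d kB (%.2f%%)\n", used, used * 100 / total
     }' /proc/meminfo
```

With the numbers shown above, 4096564 - 4096424 = 140 kB of swap is in use, i.e. essentially none.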
