Disks are usually the slowest subsystem of a computer and the most common source of performance bottlenecks, because the disk is the component farthest from the CPU and accessing it involves mechanical operations such as spinning platters and seeking across tracks. The speed difference between accessing the hard disk and accessing memory is measured in orders of magnitude, like the difference between one day and one minute. To monitor IO performance, it helps to understand the fundamentals of how Linux handles IO between the hard disk and memory.
Memory pages
The previous article, Linux performance monitoring: memory, mentioned that IO between memory and the hard disk is done in pages, and that on Linux a page is 4KB. You can view the system's default page size with the following command:
$ /usr/bin/time -v date
...
Page size (bytes): 4096
...
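If time -v is not handy, getconf reports the page size directly (a standard POSIX utility; the 4096 shown is simply the typical x86_64 value):

$ getconf PAGESIZE
4096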
Page Faults
Linux uses virtual memory to greatly expand each program's address space, so that programs the physical memory could not otherwise hold still gain more usable memory by constantly exchanging pages between memory and the hard disk (temporarily unused pages are swapped out to disk, and needed pages are read from disk back into memory). It looks as if physical memory has been enlarged. The process is completely transparent to the program: the program has no idea which parts of it are on disk or when they are swapped into memory; the kernel's virtual memory management handles everything. When a program starts, the Linux kernel first checks the CPU caches and physical memory. If the data is already in memory it is used directly; if not, a Page Fault is raised, the pages are read from the hard disk, and they are cached in physical memory. Page faults are divided into major page faults (Major Page Fault) and minor page faults (Minor Page Fault): a fault that must read the data from disk is a major page fault; a fault satisfied from data that has already been read into memory and cached, so that no direct disk read is needed, is a minor page fault.
The memory buffer above acts as a read-ahead cache for the hard disk: on a page fault the kernel first looks in physical memory; failing that, it looks in the memory cache; only if the page is still not found does it read from the hard disk. Obviously, dedicating spare memory to this buffer speeds up access, but it introduces a hit-ratio problem: if you are lucky enough that every needed page can be read from the memory buffer, performance improves greatly. A simple way to increase the hit rate is to enlarge the memory buffer; the larger the buffer, the more pages it holds and the higher the hit rate. The following time command shows how many major and minor page faults were generated when a program first started:
$ /usr/bin/time -v date
...
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 260
...
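For a process that is already running, the same counters can be read with ps (minor faults as min_flt, major faults as maj_flt); the PID and numbers below are only illustrative:

$ ps -o min_flt,maj_flt,cmd -p 3391
 MINFL  MAJFL CMD
  1029      3 /usr/sbin/sshd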
File Buffer Cache
Reading a page from the memory buffer above (also called the file buffer cache) is much faster than reading it from the hard disk, so the Linux kernel wants to serve as many page faults as possible from the file buffer (minor faults) and avoid major page faults (reads from disk) as much as possible. As page faults accumulate, the file buffer gradually grows, until the system has only a small amount of free physical memory left, at which point Linux begins releasing some unused pages. After running Linux for a while, you will find that although not many programs are running on the system, the available memory is always low. This gives the illusion that Linux manages memory inefficiently; in fact, Linux is using the temporarily idle physical memory as cache (the memory buffer). The following shows the physical memory and file buffers on one of VPSee's Sun servers:
$ cat /proc/meminfo
MemTotal:      8182776 kB
MemFree:       3053808 kB
Buffers:        342704 kB
Cached:        3972748 kB
This server has 8GB of physical memory in total (MemTotal), about 3GB of free memory (MemFree), about 343MB used as disk buffers (Buffers), and around 4GB used as the file buffer cache (Cached). Clearly, Linux really does use a lot of physical memory as cache, and this cache can keep growing.
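free shows the same picture in one line, with the buffers and cached columns taken from /proc/meminfo (a quick cross-check; the numbers below are illustrative, not from the server above):

$ free -m
             total       used       free     shared    buffers     cached
Mem:          7990       5008       2982          0        334       3879
-/+ buffers/cache:        795       7195
Swap:         4000          0       4000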
Page types
There are three types of memory pages in Linux:
- Read pages (read-only pages, or code pages): pages read in from the hard disk via major page faults, including static files, executables, library files, and other data that cannot be modified. The kernel reads them into memory when it needs them; when memory runs low, the kernel releases them to the free list, and when a program needs them again they must be read back in through page faults.
- Dirty pages: pages whose data has been modified in memory, such as text files. These files are synchronized to the hard disk by pdflush; when memory runs low, kswapd and pdflush write the data back to the hard disk and free the memory.
- Anonymous pages: pages that belong to a process but are not associated with any file and cannot be synchronized to the hard disk; when memory runs low, kswapd writes them to the swap partition and frees the memory. The /proc/meminfo snippet after this list shows how to watch these counters.
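As promised above, the current amount of dirty and anonymous page memory can be read straight out of /proc/meminfo (field names as on reasonably recent kernels; the values below are illustrative):

$ grep -E '^(Dirty|Writeback|AnonPages):' /proc/meminfo
Dirty:            1232 kB
Writeback:           0 kB
AnonPages:      892340 kB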
IOs Per Second (IOPS)
Each disk IO request takes a certain amount of time, and compared with accessing memory this wait is simply unbearable. On a typical 1GHz PC from 2001, a random disk access to a word took 8,000,000 ns = 8 ms, a sequential access to a word took about 200 ns, while fetching a word from memory took only about 10 ns (data from: Teach Yourself Programming in Ten Years). Such a hard disk can deliver 125 IOPS (1000 ms / 8 ms).
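The same back-of-the-envelope estimate works for any average service time: divide 1000 ms by the time one IO takes. A one-line sketch with bc:

$ echo "scale=2; 1000 / 8" | bc
125.00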
Sequential IO and Random IO
IO can be divided into two kinds, sequential IO and random IO, and before performance monitoring you need to determine whether the system is biased toward sequential-IO workloads or random-IO workloads. Sequential IO means requesting large amounts of data at once, such as a database executing a large number of queries or a streaming media service; sequential IO can move large amounts of data quickly. To evaluate it, look at the throughput per IO: divide the kilobytes read/written per second by the read/write operations per second, i.e. rkB/s divided by r/s and wkB/s divided by w/s. The output below shows 2 seconds of IO activity and indicates that the amount of data written per IO is increasing (45060.00 / 99.00 = 455.15 KB per IO, 54272.00 / 112.00 = 484.57 KB per IO). Compared with random IO, sequential IO should pay more attention to the throughput capacity of each IO (KB per IO):
$ iostat -kx 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    2.50   25.25    0.00   72.25

Device:  rrqm/s   wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb       24.00 19995.00  29.00  99.00  4228.00 45060.00   770.12    45.01 539.65   7.80  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.00   30.67    0.00   68.33

Device:  rrqm/s   wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        3.00 12235.00   3.00 112.00   768.00 54272.00   957.22   144.85 576.44   8.70 100.10
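The KB-per-IO figures quoted above come straight from these columns; bc reproduces them (wkB/s divided by w/s from the two samples):

$ echo "scale=2; 45060.00 / 99.00" | bc
455.15
$ echo "scale=2; 54272.00 / 112.00" | bc
484.57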
Random IO means requesting data at random; its IO speed does not depend on the size or layout of the data but on how many times per second the disk can perform an IO. For workloads such as web services and mail services, the data in each request is very small, but random IO generates many requests per second, so the number of IOs the disk can do per second is the key:
$ iostat -kx 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.75    0.00    0.75    0.25    0.00   97.26

Device:  rrqm/s   wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        0.00    52.00   0.00  57.00     0.00   436.00    15.30     0.03   0.54   0.23   1.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.75    0.00    0.75    0.25    0.00   97.24

Device:  rrqm/s   wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        0.00    56.44   0.00  66.34     0.00   491.09    14.81     0.04   0.54   0.19   1.29
According to the formula above: 436.00 / 57.00 = 7.65 KB per IO, 491.09 / 66.34 = 7.40 KB per IO. Compared with sequential IO, the KB per IO of random IO is small enough to be almost negligible; clearly, for random IO what matters is IOPS, not the throughput capacity of each IO (KB per IO).
SWAP
A swap device is used when the system does not have enough physical memory to handle all requests; a swap device can be a file or a disk partition. Be careful, though: the cost of using swap is very high. If the system has no physical memory available, it will swap frequently, and if the swap device and the data a program is about to access sit on the same file system, serious IO problems follow, ultimately slowing the whole system down or even crashing it. The exchange of pages between swap devices and memory (swapping) is an important reference for judging the performance of a Linux system, and we already have many tools to monitor swap and swapping, such as top, cat /proc/meminfo, vmstat, etc.:
$ cat /proc/meminfo
MemTotal:      8182776 kB
MemFree:       2125476 kB
Buffers:        347952 kB
Cached:        4892024 kB
SwapCached:        ...
SwapTotal:     4096564 kB
SwapFree:      4096424 kB
...

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache    si    so    bi    bo   in   cs us sy id wa st
 1  2 260008   2188    144   6824 11824  2584 12664  2584 1347 1174  0  0  2 ...
 2  1 262140   2964    ...   5852 24912 17304 24952 17304 4737 2341  0  0  4 ...
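If the sysstat package is installed, sar -W reports swapping directly as pages swapped in and out per second, which is often easier to read than the vmstat si/so columns (the timestamps and values below are illustrative):

$ sar -W 1 3
09:12:01     pswpin/s pswpout/s
09:12:02         0.00      0.00
09:12:03         2.97    231.68
09:12:04         0.00    312.00
Average:         0.99    181.23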
Original: Linux performance monitoring: disk IO (http://www.vpsee.com/2009/11/linux-system-performance-monitoring-io/)