I/O Monitoring and Analysis on CentOS

Source: Internet
Author: User

I will soon be giving a Linux I/O training session at my company, so I have organized the relevant material and am sharing it here.

Figure: position of the various I/O monitoring tools in the Linux I/O architecture (from Linux Performance and Tuning Guidelines.pdf)

1. System-level I/O monitoring: iostat

iostat -xdm 1 # my usual invocation

%util indicates how busy the disk is: 100% means the disk is busy all the time, and 0% means it is idle. Note, however, that a busy disk does not necessarily mean high disk (bandwidth) utilization.

avgrq-sz is the average size, in 512-byte sectors, of the I/O requests submitted to the driver layer. It is generally no smaller than 4 KB and no larger than max(read_ahead_kb, max_sectors_kb).

It can be used to judge the current I/O pattern. Generally, and especially when the disk is busy, a larger value suggests sequential I/O and a smaller value suggests random I/O.

svctm is the service time of one I/O request. For fully random reads on a single disk it is about 7 ms, that is, the seek time plus the rotational delay.


Note: relationships between the statistics

========================================================

%util = (r/s + w/s) * svctm / 1000 # queue length = arrival rate * average service time; the result is a fraction, multiply by 100 for a percentage
avgrq-sz = (rMB/s + wMB/s) * 2048 / (r/s + w/s) # 2048 = 1 MB / 512 bytes

========================================================
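The two relationships can be checked with a few lines of Python. This is only a sketch: the function names and the sample figures (100 reads/s, 20 writes/s, 7 ms service time, 0.5 MB/s read, 0.1 MB/s written) are made up for illustration and are not taken from a real iostat run.

```python
def util_fraction(r_per_s, w_per_s, svctm_ms):
    """Busy fraction of the device: arrival rate (IOPS) * service time (s)."""
    return (r_per_s + w_per_s) * svctm_ms / 1000.0


def avgrq_sz_sectors(r_mb_per_s, w_mb_per_s, r_per_s, w_per_s):
    """Average request size in 512-byte sectors (2048 sectors per MB)."""
    iops = r_per_s + w_per_s
    if iops == 0:
        return 0.0  # no requests in the interval, nothing to average
    return (r_mb_per_s + w_mb_per_s) * 2048.0 / iops


# Hypothetical sample values, not measured data.
u = util_fraction(100, 20, 7.0)
sz = avgrq_sz_sectors(0.5, 0.1, 100, 20)
print("util = %.1f%%" % (u * 100))
print("avgrq-sz = %.2f sectors (%.2f KB)" % (sz, sz / 2))
```

With these sample numbers the request size comes out at roughly 5 KB, which by the rule of thumb above points to a mostly random pattern.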

Summary:

iostat reports the I/O that is submitted directly to the device after requests have been merged in the generic block layer (rrqm/s, wrqm/s). It reflects the overall I/O state of the system, but it has two drawbacks:

1. It is far from the business layer and does not correspond to the read and write calls in the code, because readahead, the page cache, and the I/O scheduler sit in between and make a direct match difficult.

2. It works at the system level and cannot be narrowed down to processes. It can tell you that the disk is busy, but not which process is keeping it busy or what that process is doing.

2. Process-level I/O monitoring: iotop and pidstat (RHEL 6 series only)

iotop, as the name suggests, is top for I/O.

pidstat, as the name suggests, reports the statistics (stat) of a process (pid), which naturally include the I/O activity of the process.

Both commands report I/O per process, so they can answer the following two questions:

    1. Which processes in the system are currently doing I/O, and what share does each account for?

    2. Are those processes reading or writing, and how much data do they read and write?

pidstat has many options; here are only the ones I habitually use:

pidstat -d 1 # show I/O only

pidstat -u -r -d -t 1 # -d I/O information

# -r page faults and memory information
# -u CPU usage
# -t use threads as the statistical unit
# 1 report once per second

iotop is very simple: just run the command.


block_dump and iodump

iotop and pidstat work very well, but both depend on the statistics exported through the /proc/pid/io file, which is not available on older kernels such as the one in RHEL 5u2.
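To make the dependency on /proc/pid/io concrete, here is a minimal Python sketch that parses the file's "key: value" lines. The helper names parse_proc_io and read_proc_io are hypothetical, and the sample text is illustrative rather than output from a real process.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read the counters of a live process (needs a kernel that exports the file)."""
    with open("/proc/%s/io" % pid) as f:
        return parse_proc_io(f.read())


# Illustrative sample in the format the kernel uses.
sample = """rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""
io = parse_proc_io(sample)
print("read_bytes=%d write_bytes=%d" % (io["read_bytes"], io["write_bytes"]))
```

Sampling read_bytes and write_bytes twice and dividing the difference by the interval gives a per-process rate, which is the kind of figure these tools display.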

For those kernels we have to fall back on a poor man's replacement for the two commands above:

echo 1 > /proc/sys/vm/block_dump # enable block_dump, which writes I/O information to dmesg

# kernel source: submit_bio@ll_rw_blk.c:3213

watch -n 1 "dmesg -c | grep -oP \"\w+\(\d+\): (WRITE|READ)\" | sort | uniq -c"

# keeps running dmesg -c once per second

echo 0 > /proc/sys/vm/block_dump # disable it when no longer needed
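The grep | sort | uniq -c pipeline above can also be sketched in Python, which makes it easier to post-process the counts. The regular expression is the one from the command; the function name count_block_io and the sample dmesg lines are assumptions for illustration.

```python
import re
from collections import Counter

# Same pattern as the grep -oP expression, with capture groups added.
BLOCK_DUMP_RE = re.compile(r"(\w+)\((\d+)\): (WRITE|READ)")


def count_block_io(dmesg_text):
    """Count READ/WRITE block_dump records per (process name, pid, operation)."""
    counts = Counter()
    for match in BLOCK_DUMP_RE.finditer(dmesg_text):
        name, pid, op = match.groups()
        counts[(name, int(pid), op)] += 1
    return counts


# Hypothetical dmesg excerpt; the "dirtied inode" record is ignored by the pattern.
sample = """\
kjournald(523): WRITE block 1234 on dm-0
kjournald(523): WRITE block 1242 on dm-0
bash(4021): READ block 88776 on sda1
bash(4021): dirtied inode 1178 (messages) on dm-0
"""
for (name, pid, op), n in sorted(count_block_io(sample).items()):
    print("%6d %s(%d): %s" % (n, name, pid, op))
```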


You can also use the ready-made iodump script; see http://code.google.com/p/maatkit/source/browse/trunk/util/iodump?r=5389


iotop.stp

This SystemTap script is a poor man's version of the iotop command. It requires SystemTap to be installed and prints its statistics every 5 seconds by default.

stap iotop.stp # examples/io/iotop.stp

Summary:

Process-level I/O monitoring:

  1. It can answer the two questions that system-level I/O monitoring cannot.

  2. It is relatively close to the business layer (for example, it can report how much each process reads and writes).

However, it still cannot be tied to the read and write operations of the business layer, and its granularity is coarse: it cannot tell you which files the process is reading or writing, how long the operations take, or how large they are.

3. Business-level I/O monitoring: ioprofile

The ioprofile command is essentially lsof + strace. It can be downloaded from http://code.google.com/p/maatkit/

ioprofile can answer the following three questions:

1. Which files did the process read and write at the business level during a given period? (read, write)

2. How many reads and writes were there? (number of read and write calls)

3. How much data was read and written? (number of bytes passed to read and write)

Suppose some action triggers I/O in the program, for example "a click on a page makes the backend read files A, B, and C".

========================================================

./io_event # assume this simulates the I/O behavior: file A is read once, file B 500 times, and file C 500 times

ioprofile -p `pidof io_event` -c count # number of reads and writes

ioprofile -p `pidof io_event` -c times # time spent reading and writing

ioprofile -p `pidof io_event` -c sizes # amount of data read and written

========================================================


Note: ioprofile only supports multi-threaded programs, not single-threaded ones. For business-level I/O analysis of a single-threaded program, strace alone is sufficient.

Summary:

Because ioprofile is essentially strace, it shows the trace of read and write calls and makes I/O analysis at the business layer possible. (It cannot help with I/O done through mmap.)
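The "lsof + strace" idea can be illustrated with a small Python sketch that aggregates read and write calls from strace output per file. This is not ioprofile's actual implementation; the function name profile, the regular expression, the fd-to-name table (which lsof would normally provide), and the sample lines are all assumptions.

```python
import re
from collections import defaultdict

# Matches e.g.: pread(222, "", 16, 514048) = 16
SYSCALL_RE = re.compile(
    r"\b(pread64|pwrite64|pread|pwrite|read|write)\((\d+),.*\)\s*=\s*(-?\d+)")


def profile(strace_lines, fd_names=None):
    """Return {(file, 'read'|'write'): {'calls': n, 'bytes': b}}."""
    fd_names = fd_names or {}
    stats = defaultdict(lambda: {"calls": 0, "bytes": 0})
    for line in strace_lines:
        m = SYSCALL_RE.search(line)
        if m is None:
            continue  # not a read/write style call
        call, fd, ret = m.group(1), int(m.group(2)), int(m.group(3))
        if ret < 0:
            continue  # failed call, no data transferred
        kind = "write" if "write" in call else "read"
        key = (fd_names.get(fd, "fd %d" % fd), kind)
        stats[key]["calls"] += 1
        stats[key]["bytes"] += ret
    return dict(stats)


# Hypothetical strace excerpt and fd table.
lines = [
    'read(111, "abc"..., 4096) = 4096',
    'pread(222, "", 16, 0) = 16',
    'pread(333, "", 200, 0) = 200',
    'pread(222, "", 16, 514048) = 16',
    'read(5, 0x7ffd, 10) = -1 EAGAIN (Resource temporarily unavailable)',
]
names = {111: "A.data", 222: "B.data", 333: "C.data"}
for (fname, kind), s in sorted(profile(lines, names).items()):
    print("%-8s %-5s calls=%d bytes=%d" % (fname, kind, s["calls"], s["bytes"]))
```

As with ioprofile itself, anything that bypasses the read and write system calls, such as mmap, is invisible to this kind of analysis.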

4. File-level I/O monitoring

File-level I/O monitoring complements business-level and process-level I/O analysis.

File-level I/O analysis targets a single file and answers the question of which processes are currently reading or writing that file.

1. lsof, or ls /proc/pid/fd

2. inodewatch.stp

lsof tells you which processes currently have the file open:

lsof ../io # the io directory is currently open by the bash and lsof processes
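The ls /proc/pid/fd alternative can be sketched in Python as a scan over every process's descriptor directory. The function name pids_with_open is hypothetical, and the sketch only looks at open file descriptors; unlike lsof it does not report processes that merely use a directory as their working directory.

```python
import os
import tempfile


def pids_with_open(path, proc_root="/proc"):
    """Return the sorted PIDs that hold an open file descriptor on `path`."""
    target = os.path.realpath(path)
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while scanning
            if link == target:
                holders.append(int(entry))
                break
    return sorted(holders)


# Demo (Linux only): this process holds a temporary file open.
if os.path.isdir("/proc/self/fd"):
    with tempfile.NamedTemporaryFile() as tmp:
        print(pids_with_open(tmp.name))
```

Without root privileges the scan only sees the descriptors of processes owned by the current user.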

lsof can only report a static snapshot, and "open" does not necessarily mean "being read". For commands such as cat and echo, the open and the read are over in an instant, so lsof can hardly catch them.

inodewatch.stp can fill that gap:

stap inodewatch.stp major minor inode # major device number, minor device number, inode number of the file

stap inodewatch.stp 0xfd 0x00 523170 # the major device number, minor device number, and inode number can be obtained with the stat command
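As an alternative to reading the output of stat by hand, the three arguments can be derived with Python's os.stat. This is a small sketch; the helper name inodewatch_args is made up, and "." is used only so that the example runs anywhere.

```python
import os


def inodewatch_args(path):
    """Return (major, minor, inode) strings for the file at `path`."""
    st = os.stat(path)
    return ("0x%x" % os.major(st.st_dev),
            "0x%x" % os.minor(st.st_dev),
            str(st.st_ino))


# Print a ready-to-run command line for the current directory.
print("stap inodewatch.stp %s %s %s" % inodewatch_args("."))
```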

5. I/O simulator

iotest.py # see the appendix

Developers can use ioprofile (or strace) to analyze the I/O path of their system in detail and then optimize the program accordingly.

However, changing a program is usually expensive, especially when you are not sure whether the proposed change will actually help, so it is best to have a way to simulate the change and verify it quickly.

Take our own service as an example. We found that a single query produces the following I/O access pattern:

File A is accessed once.

File B is accessed 500 times, 16 bytes each time, with an average gap of 502 KB between accesses.

File C is accessed 500 times, 200 bytes each time, with an average gap of 4 MB between accesses.

The accesses to B and C are interleaved:

1. Access B and read 16 bytes.

2. Access C and read 200 bytes.

3. Return to B, skip 502 KB, and read another 16 bytes.

4. Return to C, skip 4 MB, and read another 200 bytes.

5. Repeat 500 times.

The strace file is as follows:

A simple idea is to stop interleaving B and C: read B in one batch, then read C in one batch. The strace file is therefore adjusted as follows:

Pass the adjusted strace file to iotest.py as input; iotest.py then replays the corresponding I/O according to the access pattern in the strace file.

iotest.py -s io.strace -f fmap

fmap is a mapping file that maps the fds in the strace file, such as 222 and 333, to actual files.

======================================

111 = /opt/work/io/A.data
222 = /opt/work/io/B.data
333 = /opt/work/io/C.data
======================================
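To show what the two input variants for iotest.py might look like, the following Python sketch generates an interleaved and a batched trace for files B and C. It is a hypothetical reconstruction: the fds 222 and 333 follow the fmap above, the line format follows the pread(15, "", 4348, 140156928) comment in the appendix, and the fixed strides of 502 KB and 4 MB stand in for the average gaps described earlier. The single access to file A is left out because its size is not given.

```python
B_FD, C_FD = 222, 333            # fds as used in the fmap file
B_STRIDE = 502 * 1024            # assumed fixed gap between reads of B
C_STRIDE = 4 * 1024 * 1024       # assumed fixed gap between reads of C


def pread_line(fd, count, offset):
    """Format one line in the style the iotest.py parser expects."""
    return 'pread(%d, "", %d, %d)' % (fd, count, offset)


def interleaved_trace(n=500):
    """B, C, B, C, ... as in the observed access pattern."""
    lines = []
    for i in range(n):
        lines.append(pread_line(B_FD, 16, i * B_STRIDE))
        lines.append(pread_line(C_FD, 200, i * C_STRIDE))
    return lines


def batched_trace(n=500):
    """All reads of B first, then all reads of C."""
    b = [pread_line(B_FD, 16, i * B_STRIDE) for i in range(n)]
    c = [pread_line(C_FD, 200, i * C_STRIDE) for i in range(n)]
    return b + c


print("\n".join(interleaved_trace(2)))
print("---")
print("\n".join(batched_trace(2)))
```

Writing each list to a file and running iotest.py on both gives two timings that can be compared before any change is made to the real program.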

6. Disk fragmentation

In one sentence: as long as disk usage stays below 80% over the long term, there is basically no need to worry about fragmentation.

If you are still worried, you can use the defrag script.

7. Other I/O-related commands

blockdev series

========================================================

blockdev --getbsz /dev/sdc1 # show the block size of sdc1

blockdev --getra /dev/sdc1 # show the readahead size of sdc1, in 512-byte sectors (read_ahead_kb in /sys holds the same setting in KB)

blockdev --setra 256 /dev/sdc1 # set the readahead size of sdc1 (256 sectors = 128 KB). On older kernels, setting it through /sys sometimes fails, so blockdev is the more reliable way.

========================================================
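Because blockdev counts readahead in 512-byte sectors while the /sys setting is expressed in KB, a tiny conversion helper avoids mix-ups. The function names in this Python sketch are hypothetical.

```python
SECTOR_BYTES = 512


def ra_sectors_to_kb(sectors):
    """Convert a blockdev --getra value (sectors) to KB."""
    return sectors * SECTOR_BYTES // 1024


def ra_kb_to_sectors(kb):
    """Convert a readahead size in KB to the value for blockdev --setra."""
    return kb * 1024 // SECTOR_BYTES


print("--setra 256 corresponds to %d KB" % ra_sectors_to_kb(256))
print("128 KB corresponds to --setra %d" % ra_kb_to_sectors(128))
```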

Appendix: iotest.py

#!/usr/bin/env python
# -*- coding: gbk -*-
import os
import re
import timeit
from ctypes import CDLL, create_string_buffer, c_ulong, c_longlong
from optparse import OptionParser

usage = '''%prog -s strace.log -f fileno.map'''

_glibc = None
_glibc_pread = None
_c_char_buf = None
_open_file = []


def getlines(filename):
    # Return the non-empty lines of a file, stripped of surrounding whitespace.
    _lines = []
    with open(filename, 'r') as _f:
        for line in _f:
            if line.strip() != "":
                _lines.append(line.strip())
    return _lines


def parsecmdline():
    parser = OptionParser(usage)
    parser.add_option("-s", "--strace", dest="strace_filename",
                      help="strace file", metavar="FILE")
    parser.add_option("-f", "--fileno", dest="fileno_filename",
                      help="fileno file", metavar="FILE")
    (options, args) = parser.parse_args()
    if options.strace_filename is None:
        parser.error("strace is not specified.")
    if not os.path.exists(options.strace_filename):
        parser.error("strace file does not exist.")
    if options.fileno_filename is None:
        parser.error("fileno is not specified.")
    # The original checked strace_filename here a second time.
    if not os.path.exists(options.fileno_filename):
        parser.error("fileno file does not exist.")
    return options.strace_filename, options.fileno_filename


# [type, ...]
# [pread, fno, count, offset]
# pread(15, "", 4348, 140156928)
def parse_strace(filename):
    lines = getlines(filename)
    action = []
    _regex_str = r'(pread|pread64)[^\d]*(\d+),\s*[^,]*,\s*([\dkKmM*+\-.]*),\s*([\dkKmM*+\-.]*)'
    for i in lines:
        _match = re.match(_regex_str, i)
        if _match is None:
            continue  # skip invalid lines
        _type, _fn, _count, _off = (_match.group(1), _match.group(2),
                                    _match.group(3), _match.group(4))
        # Expand k/K and m/M suffixes so that the expression can be evaluated.
        _off = _off.replace('k', "*1024").replace('K', "*1024") \
                   .replace('m', "*1048576").replace('M', "*1048576")
        _count = _count.replace('k', "*1024").replace('K', "*1024") \
                       .replace('m', "*1048576").replace('M', "*1048576")
        # print(_off)
        action.append([_type, _fn, str(int(eval(_count))), str(int(eval(_off)))])
    return action


def parse_fileno(filename):
    lines = getlines(filename)
    fmap = {}
    for i in lines:
        if i.strip().startswith("#"):
            continue  # comment line
        _split = [j.strip() for j in i.split("=")]
        if len(_split) != 2:
            continue  # invalid line
        fno, fname = _split[0], _split[1]
        fmap[fno] = fname
    return fmap


def simulate_before(strace, fmap):
    global _open_file, _c_char_buf
    rfmap = {}
    for i in fmap.values():
        _f = open(i, "r+b")
        # print("open {0}:{1}".format(_f.fileno(), i))
        _open_file.append(_f)
        rfmap[i] = str(_f.fileno())  # reverse mapping: file name -> real fd
    to_read = 4 * 1024  # default 4 KB buffer
    for i in strace:
        i[1] = rfmap[fmap[i[1]]]  # fd in trace -> file name -> real fd
        to_read = max(to_read, int(i[2]))
    # print("read buffer len: %d Byte" % to_read)
    _c_char_buf = create_string_buffer(to_read)


def simulate_after():
    global _open_file
    for _f in _open_file:
        _f.close()


def simulate(actions):
    # timeit.time.sleep(10)  # optional pause before the run
    start = timeit.time.time()
    for act in actions:
        __simulate__(act)
    finish = timeit.time.time()
    return finish - start


def __simulate__(act):
    global _glibc, _glibc_pread, _c_char_buf
    if "pread" in act[0]:
        _fno = int(act[1])
        _buf = _c_char_buf
        _count = c_ulong(int(act[2]))
        _off = c_longlong(int(act[3]))
        _glibc_pread(_fno, _buf, _count, _off)
        # print(_glibc.time(None))
    else:
        pass


def loadlibc():
    global _glibc, _glibc_pread
    _glibc = CDLL("libc.so.6")
    _glibc_pread = _glibc.pread64


if __name__ == "__main__":
    _strace, _fileno = parsecmdline()  # parse the command-line arguments
    loadlibc()                         # load the C library
    _action = parse_strace(_strace)    # parse the action (strace) file
    _fmap = parse_fileno(_fileno)      # parse the fd-to-file-name mapping file
    simulate_before(_action, _fmap)    # preprocessing
    # print("total io operate: %d" % (len(_action)))
    # for act in _action: print(" ".join(act))
    print("%f" % simulate(_action))

