IO monitoring and analysis under CentOS

Source: Internet
Author: User
Tags systemtap dmesg

I recently ran a Linux IO training session at my company, and I am sharing here the material I collected for it.

Figure: where the various IO monitoring tools sit in the Linux IO architecture (from Linux Performance and Tuning Guidelines.pdf)

1 System-level IO monitoring: iostat

iostat -xdm 1 # my usual invocation

%util indicates how busy the disk is: 100% means the disk is busy all the time and 0% means it is idle. Note, however, that a busy disk does not necessarily mean high disk (bandwidth) utilization.

avgrq-sz is the size of the IO requests submitted to the driver layer, generally not less than 4 K and not greater than max(readahead_kb, max_sectors_kb).

It can be used to judge the current IO pattern. In general, and especially when the disk is busy, a larger value indicates more sequential access and a smaller value indicates more random access.

svctm is the service time of one IO request. For a single disk doing completely random reads it is roughly 7 ms, that is, seek time plus rotational latency.

Note: relationships between the statistics

=======================================

%util = (r/s + w/s) * svctm / 1000 # queue length = arrival rate * average service time; the result is a fraction, multiply by 100 for a percentage
avgrq-sz = (rMB/s + wMB/s) * 2048 / (r/s + w/s) # 2048 = 1 MB / 512 B

=======================================
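As a quick sanity check of the two relationships, here is a minimal Python sketch that evaluates them for one set of numbers. The function names and the sample values are made up for illustration; they are not iostat output.

```python
def util_fraction(r_per_s, w_per_s, svctm_ms):
    """Fraction of time the device is busy: IOPS * service time (ms) / 1000."""
    return (r_per_s + w_per_s) * svctm_ms / 1000.0


def avgrq_sz_sectors(rmb_per_s, wmb_per_s, r_per_s, w_per_s):
    """Average request size in 512-byte sectors; 2048 sectors make up 1 MB."""
    iops = r_per_s + w_per_s
    if iops == 0:
        return 0.0
    return (rmb_per_s + wmb_per_s) * 2048.0 / iops


if __name__ == "__main__":
    # Hypothetical sample: 100 reads/s, 20 writes/s, 7 ms service time,
    # 0.75 MB/s read and 0.1875 MB/s written.
    busy = util_fraction(100, 20, 7.0)
    size = avgrq_sz_sectors(0.75, 0.1875, 100, 20)
    print("%%util ~ %.1f%%, avgrq-sz ~ %.1f sectors" % (busy * 100, size))
```

With these sample numbers the disk comes out 84% busy with an average request of 16 sectors (8 KB), which by the rule of thumb above points to a fairly random workload.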

Summary:

iostat reports the IO that is submitted directly to the device after merging in the generic block layer (rrqm/s, wrqm/s). It reflects the overall IO status of the system, but it has the following two disadvantages:

1 It is far from the business layer and does not correspond to the write and read calls in the code (because of system readahead, the page cache, the IO scheduling algorithm and other factors, the two are hard to match up).

2 It is system-level and cannot be narrowed down to a process. It can only tell you that the disk is busy now, not which process is keeping it busy or what that process is doing.

2 Process-level IO monitoring: iotop and pidstat (RHEL 6u series only)

iotop, as the name implies, is the IO version of top.

pidstat, as the name implies, reports statistics per process (PID), which naturally include the IO status of the process.

Both commands report IO by process, so they can answer the following two questions:

      1. Which processes in the current system are doing IO, and what share does each take?

      2. Is a process that does IO reading or writing, and how much does it read and write?

pidstat has many parameters; here are only the few I habitually use:

pidstat -d 1 # show IO only

pidstat -u -r -d -t 1 # -d IO information

# -r page faults and memory information
# -u CPU usage
# -t use threads as the statistical unit
# 1 report once per second

iotop is simple: just run the command.

block_dump, iodump

iotop and pidstat are handy, but both rely on the /proc/<pid>/io file exported by the kernel, which older kernels, such as the one in RHEL 5u2, do not provide.
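To make the dependency concrete, the following Python sketch reads the per-process counters from /proc/<pid>/io and returns None when the file is missing, as it would be on such an older kernel. The helper names are hypothetical, and this is only an illustration of the file format, not how iotop or pidstat are implemented.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid):
    """Return the IO counters of one process, or None if the kernel has no such file."""
    path = "/proc/%d/io" % pid
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_proc_io(f.read())


if __name__ == "__main__":
    # Reading the counters of your own process needs no extra privileges.
    print(read_proc_io(os.getpid()))
```

Sampling read_bytes and write_bytes twice and taking the difference gives a per-process IO rate, which is the kind of figure the two tools report.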

On older kernels without /proc/<pid>/io, the two cruder substitutes named above, block_dump and iodump, have to be used instead:

echo 1 > /proc/sys/vm/block_dump # enable block_dump; IO information is then written to dmesg

# Source: [Email protected]_rw_blk.c:3213

watch -n 1 "dmesg -c | grep -oP \"\w+\(\d+\): (WRITE|READ)\" | sort | uniq -c"

# dmesg -c is run over and over, clearing the kernel log each time

echo 0 > /proc/sys/vm/block_dump # turn it off when not in use

You can also use the ready-made iodump script; see http://code.google.com/p/maatkit/source/browse/trunk/util/iodump?r=5389
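The counting that the watch pipeline does can also be sketched in Python. The regular expression mirrors the grep pattern above; the sample lines are invented and only assume the "name(pid): READ|WRITE ..." shape that the pattern expects.

```python
import re
from collections import Counter

# Same pattern as the grep above: process name, PID in parentheses, then READ or WRITE.
BLOCK_DUMP_RE = re.compile(r"(\w+)\((\d+)\): (READ|WRITE)")


def count_block_dump(lines):
    """Count READ/WRITE events per (process, pid, direction)."""
    counts = Counter()
    for line in lines:
        match = BLOCK_DUMP_RE.search(line)
        if match:
            name, pid, direction = match.groups()
            counts[(name, int(pid), direction)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical dmesg lines in the shape the grep pattern expects.
    sample = [
        "kjournald(913): WRITE block 3472 on sda3",
        "kjournald(913): WRITE block 3480 on sda3",
        "cat(4242): READ block 81920 on sda3",
    ]
    for (name, pid, direction), n in count_block_dump(sample).most_common():
        print("%6d %s(%d) %s" % (n, name, pid, direction))
```

Feeding it the output of dmesg at a fixed interval gives the same per-process tally as sort | uniq -c.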

iotop.stp

This is a SystemTap script and, as a glance shows, a crude imitation of the iotop command. It requires SystemTap to be installed and by default prints information every 5 seconds.

stap iotop.stp # examples/io/iotop.stp

Summary

Process-level IO monitoring:

    1. Can answer the 2 questions that system-level IO monitoring cannot answer

    2. Is relatively close to the business layer (for example, it can count how much a process reads and writes)

However, it still cannot be tied to the read and write calls of the business layer, and its granularity is coarse: it cannot tell you which files the current process is reading or writing, how long that takes, or how much data is involved.

3 Business-level IO monitoring: ioprofile

The ioprofile command is essentially lsof + strace; it can be downloaded from http://code.google.com/p/maatkit/

ioprofile can answer the following three questions:

1 Which files is the current process reading and writing (read, write) at the business level at a given moment?

2 How many reads and writes are there? (number of read and write calls)

3 How much data is read and written? (number of bytes read and written)

Suppose some behavior triggers IO in a program, for example: "one page click causes the backend to read files A, B and C"

============================================

./io_event # assume this simulates one such IO behavior: it reads file A once, file B 500 times and file C 500 times

ioprofile -p `pidof io_event` -c count # number of reads and writes

ioprofile -p `pidof io_event` -c times # time spent reading and writing

ioprofile -p `pidof io_event` -c sizes # size of reads and writes

============================================

Note: ioprofile only supports multithreaded programs; single-threaded programs are not supported. For business-level IO analysis of a single-threaded program, strace is sufficient.

Summary:

ioprofile is essentially strace, so it shows the trace of read and write calls and can be used for business-layer IO analysis (it is powerless against IO done through mmap).
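To illustrate the strace half of that idea, here is a small Python sketch that tallies call counts and bytes per file descriptor from strace-style pread lines. It is not the ioprofile implementation; the regular expression and function name are assumptions, and mapping each fd back to a file name (the lsof half) is left out.

```python
import re
from collections import defaultdict

# Matches strace lines such as: pread(222, "..."..., 16, 514048) = 16
# Also accepts pread64; only the fd and the return value are captured.
PREAD_RE = re.compile(r"pread(?:64)?\((\d+),.*\)\s*=\s*(-?\d+)")


def summarize_preads(lines):
    """Return {fd: [call_count, bytes_read]} for the pread calls in strace output."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = PREAD_RE.search(line)
        if not match:
            continue
        fd, returned = int(match.group(1)), int(match.group(2))
        stats[fd][0] += 1
        if returned > 0:  # a negative return value is an error, not data
            stats[fd][1] += returned
    return dict(stats)


if __name__ == "__main__":
    sample = [
        'pread(222, "abcd"..., 16, 0) = 16',
        'pread(333, "wxyz"..., 200, 0) = 200',
        'pread(222, "efgh"..., 16, 514048) = 16',
    ]
    for fd, (calls, nbytes) in sorted(summarize_preads(sample).items()):
        print("fd %d: %d calls, %d bytes" % (fd, calls, nbytes))
```

The per-fd call count and byte total correspond to the "count" and "sizes" views described above.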

4 File-level IO monitoring

File-level IO monitoring complements business-level and process-level IO analysis.

File-level IO analysis targets an individual file and answers which processes are currently reading or writing it.

1 lsof or ls /proc/<pid>/fd

2 inodewatch.stp

lsof tells you which processes currently have the file open

lsof ../io # the io directory is currently open by two processes, bash and lsof

lsof can only report static information, and "open" does not necessarily mean "reading". For commands such as cat and echo, the open and the read are over in an instant, so lsof can hardly catch them.

You can use inodewatch.stp to make up for this.

stap inodewatch.stp major minor inode # major device number, minor device number, inode number of the file

stap inodewatch.stp 0xfd 0x00 523170 # the major device number, minor device number and inode number can be obtained with the stat command

# stat test.c
  File: 'test.c'
  Size: 375             Blocks: 8          IO Block: 4096   regular file
Device: 803h/2051d      Inode: 1208533     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-06-28 23:27:56.327543866 +0800
Modify: 2015-12-27 22:13:27.654476214 +0800
Change: 2015-12-27 22:13:27.823852491 +0800
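The three arguments can also be read programmatically instead of being picked out of the stat output by hand. A minimal Python sketch, with a hypothetical helper name, using the standard os.stat, os.major and os.minor calls:

```python
import os


def inodewatch_args(path):
    """Return (major, minor, inode) of a file, the three values inodewatch.stp takes."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


if __name__ == "__main__":
    path = "."  # any existing file or directory
    major, minor, inode = inodewatch_args(path)
    # Print a ready-to-paste command line, with the device numbers in hex as in the example above.
    print("stap inodewatch.stp 0x%x 0x%x %d" % (major, minor, inode))
```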

5 IO simulator

iotest.py # see the Appendix

Developers can use ioprofile (or strace) to analyze the system's IO path in detail and then optimize accordingly at the program level.

In general, however, changing the program is costly, especially when it is uncertain whether a proposed change will help, so it is best to have some way of simulating the change to verify it quickly.

Take our own business as an example. When a query arrives, the system's IO access pattern is as follows:

File A is accessed once

File B is accessed 500 times, 16 bytes each time, at an average interval of 502K

File C is accessed 500 times, 200 bytes each time, at an average interval of 4M

Accesses to files B and C are interleaved, that is:

1 First access B and read 16 bytes,

2 then access C and read 200 bytes,

3 go back to B, skip ahead 502K and read 16 bytes,

4 go back to C, skip ahead 4M and read 200 bytes,

5 and so on, 500 times in all

The strace file is as follows:

A simple and naive idea is to change the interleaved reads of B and C into reading B in a batch first and then reading C in a batch, so the strace file is adjusted as follows:
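As a rough sketch of the two access orders described above, the following Python snippet generates strace-like pread lines for the interleaved pattern and for the batched one. It assumes that "interval" means the byte distance between the start offsets of consecutive reads, and it uses the fds 222 and 333 to stand for files B and C; the exact offsets in the author's trace are not known, so the ones here are illustrative only.

```python
KB = 1024
MB = 1024 * KB

# (fd, bytes per read, stride between reads); fds 222 and 333 stand for files B and C.
FILE_B = (222, 16, 502 * KB)
FILE_C = (333, 200, 4 * MB)


def pread_line(spec, i):
    """Format the i-th read of a file as a strace-like pread line."""
    fd, count, stride = spec
    return 'pread(%d, "", %d, %d)' % (fd, count, i * stride)


def interleaved(n=500):
    """B, C, B, C, ...: the original access order."""
    lines = []
    for i in range(n):
        lines.append(pread_line(FILE_B, i))
        lines.append(pread_line(FILE_C, i))
    return lines


def batched(n=500):
    """All reads of B first, then all reads of C: the adjusted order."""
    return ([pread_line(FILE_B, i) for i in range(n)] +
            [pread_line(FILE_C, i) for i in range(n)])


if __name__ == "__main__":
    print("\n".join(batched(3)))
```

Both orders issue exactly the same reads; only the sequence differs, which is what makes the comparison a fair test of the access order alone.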

The adjusted strace file is the input to iotest.py, which simulates the corresponding IO according to the access pattern in the strace file

iotest.py -s io.strace -f fmap

fmap is the mapping file; it maps the fds that appear in the strace file, such as 222 and 333, to actual files

===========================

111 = /opt/work/io/a.data
222 = /opt/work/io/b.data
333 = /opt/work/io/c.data

===========================

6 Disk defragmentation

Bottom line: as long as disk usage does not stay above 80% for long periods, there is basically no need to worry about fragmentation.

If you are really worried, you can use a defrag script.
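Checking a filesystem against the 80% rule of thumb takes only a few lines. This is a minimal sketch using shutil.disk_usage (Python 3.3 or later); the helper names are hypothetical, and the fraction is computed as used / total, which can differ slightly from the Use% column of df because of reserved blocks.

```python
import shutil

THRESHOLD = 0.80  # the 80% rule of thumb from the text


def usage_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / float(usage.total)


def over_threshold(fraction, threshold=THRESHOLD):
    """True when usage is above the threshold."""
    return fraction > threshold


if __name__ == "__main__":
    frac = usage_fraction("/")
    print("%.1f%% used, above 80%%: %s" % (frac * 100, over_threshold(frac)))
```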

7 Other IO-related commands

blockdev series

=======================================

blockdev --getbsz /dev/sdc1 # view the block size of the sdc1 disk

blockdev --getra /dev/sdc1 # view the readahead (readahead_kb) size of the sdc1 disk

blockdev --setra 256 /dev/sdc1 # set the readahead (readahead_kb) size of the sdc1 disk; on older kernels, setting it through /sys sometimes fails and is less reliable than blockdev

=======================================
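One detail worth keeping in mind: blockdev --getra and --setra count readahead in 512-byte sectors, whereas the sysfs setting is expressed in KB, so the same setting shows up as two different numbers. A small Python sketch of the conversion, with hypothetical helper names:

```python
SECTOR_BYTES = 512


def sectors_to_kb(sectors):
    """Convert a blockdev --getra / --setra value (512-byte sectors) to KB."""
    return sectors * SECTOR_BYTES // 1024


def kb_to_sectors(kb):
    """Convert a readahead size in KB to the sector count blockdev --setra takes."""
    return kb * 1024 // SECTOR_BYTES


if __name__ == "__main__":
    print("--setra 256 -> %d KB" % sectors_to_kb(256))
```

So the --setra 256 in the example above corresponds to a readahead of 128 KB.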

Appendix: iotest.py

#!/usr/bin/env python
# -*- coding: gbk -*-
import os
import re
import timeit
from ctypes import CDLL, create_string_buffer, c_ulong, c_longlong
from optparse import OptionParser

usage = '%prog -s strace.log -f fileno.map'

_glibc = None
_glibc_pread = None
_c_char_buf = None
_open_file = []


def getlines(filename):
    # Return the non-empty lines of a file, stripped of surrounding whitespace.
    _lines = []
    with open(filename, 'r') as _f:
        for line in _f:
            if line.strip() != "":
                _lines.append(line.strip())
    return _lines


def parsecmdline():
    parser = OptionParser(usage)
    parser.add_option("-s", "--strace", dest="strace_filename",
                      help="strace file", metavar="FILE")
    parser.add_option("-f", "--fileno", dest="fileno_filename",
                      help="fileno file", metavar="FILE")
    (options, args) = parser.parse_args()
    if options.strace_filename is None:
        parser.error("strace is not specified.")
    if not os.path.exists(options.strace_filename):
        parser.error("strace file does not exist.")
    if options.fileno_filename is None:
        parser.error("fileno is not specified.")
    # The original checked strace_filename here a second time.
    if not os.path.exists(options.fileno_filename):
        parser.error("fileno file does not exist.")
    return options.strace_filename, options.fileno_filename


# [type, ...]
# [pread, fno, count, offset]
# pread(fno, "", 4348, 140156928)
def parse_strace(filename):
    lines = getlines(filename)
    action = []
    _regex_str = r'(pread|pread64)[^\d]*(\d+),\s*[^,]*,\s*([\dkKmM*+\-.]*),\s*([\dkKmM*+\-.]*)'
    for i in lines:
        _match = re.match(_regex_str, i)
        if _match is None:
            continue  # skip invalid line
        _type, _fn, _count, _off = (_match.group(1), _match.group(2),
                                    _match.group(3), _match.group(4))
        # Expand k/K and m/M suffixes so that the expression can be evaluated.
        _off = (_off.replace('k', " * 1024 ").replace('K', " * 1024 ")
                .replace('m', " * 1048576 ").replace('M', " * 1048576 "))
        _count = (_count.replace('k', " * 1024 ").replace('K', " * 1024 ")
                  .replace('m', " * 1048576 ").replace('M', " * 1048576 "))
        # print(_off)
        action.append([_type, _fn, str(int(eval(_count))), str(int(eval(_off)))])
    return action


def parse_fileno(filename):
    lines = getlines(filename)
    fmap = {}
    for i in lines:
        if i.strip().startswith("#"):
            continue  # comment line
        _split = [j.strip() for j in i.split("=")]
        if len(_split) != 2:
            continue  # invalid line
        fno, fname = _split[0], _split[1]
        fmap[fno] = fname
    return fmap


def simulate_before(strace, fmap):
    global _open_file, _c_char_buf
    rfmap = {}
    for i in fmap.values():
        _f = open(i, "r+b")
        # print("open {0}:{1}".format(_f.fileno(), i))
        _open_file.append(_f)
        rfmap[i] = str(_f.fileno())  # reverse map
    to_read = 4 * 1024  # default 4K buffer
    for i in strace:
        i[1] = rfmap[fmap[i[1]]]  # fid -> fname -> fid mapping conversion
        to_read = max(to_read, int(i[2]))
    # print("read buffer len: %d Byte" % to_read)
    _c_char_buf = create_string_buffer(to_read)


def simulate_after():
    global _open_file
    for _f in _open_file:
        _f.close()


def simulate(actions):
    # timeit.time.sleep(2)  # rest for 2 seconds for io interval
    start = timeit.time.time()
    for act in actions:
        __simulate__(act)
    finish = timeit.time.time()
    return finish - start


def __simulate__(act):
    global _glibc, _glibc_pread, _c_char_buf
    if "pread" in act[0]:
        _fno = int(act[1])
        _buf = _c_char_buf
        _count = c_ulong(int(act[2]))
        _off = c_longlong(int(act[3]))
        _glibc_pread(_fno, _buf, _count, _off)
        # print(_glibc.time(None))
    else:
        pass


def loadlibc():
    global _glibc, _glibc_pread
    _glibc = CDLL("libc.so.6")
    _glibc_pread = _glibc.pread64


if __name__ == "__main__":
    _strace, _fileno = parsecmdline()  # parse command-line arguments
    loadlibc()  # load the dynamic library
    _action = parse_strace(_strace)  # parse the action file
    _fmap = parse_fileno(_fileno)  # parse the filename map file
    simulate_before(_action, _fmap)  # preprocessing
    # print('total IO operate: %d' % len(_action))
    # for act in _action: print(" ".join(act))
    print("%f" % simulate(_action))
