Linux Performance Optimization (III): Disk Optimization



This chapter covers a lot of ground: file systems, disks, CPUs, and related topics, as well as how to investigate the corresponding performance problems.


The file system alleviates the impact of disk latency on applications by means of caching, buffering, and asynchronous I/O. For a more detailed understanding of file systems, here are some related terms:

    • File system: a way of organizing data into files and directories, providing a file-based access interface and controlling access through file permissions. It also includes special file types representing devices, sockets, and pipes, as well as metadata such as file access timestamps.
    • File system cache: an area of main memory (usually DRAM) used to cache file system contents, which may include both data and metadata.
    • Operation: a request to the file system, such as read, write, open, close, or create.
    • I/O: input/output. Of the file system operations, I/O refers only to those that directly read and write data (perform I/O), including read, write, stat, and create. I/O does not include opening and closing files.
    • Logical I/O: I/O issued by the application to the file system.
    • Physical I/O: I/O issued by the file system directly to the disk.
    • Throughput: the current data transfer rate between the application and the file system, in B/s.
    • Inode: an index node, a data structure containing metadata for a file system object, such as access permissions, timestamps, and data pointers.
    • VFS: the virtual file system, a kernel interface that abstracts and supports different file system types.
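The inode metadata mentioned above can be inspected directly; a minimal sketch using GNU coreutils `stat` (the temporary file is only an illustration):

```shell
# Create a scratch file and show the inode metadata the file system keeps for it.
tmpfile=$(mktemp)

stat "$tmpfile"                                  # full dump: inode, mode, timestamps
stat -c 'inode=%i mode=%a links=%h' "$tmpfile"   # selected fields only

rm -f "$tmpfile"
```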

Disk-related terminology:

    • Virtual disk: an emulation of a storage device. From the system's point of view it is a single physical disk, but it may be built from several disks.
    • Transport bus: the physical bus used for communication, carrying data transfers and other disk commands.
    • Sector: a block of storage on disk, usually 512 bytes in size.
    • I/O: for disks, strictly speaking, only reads and writes; other disk commands are not included. An I/O consists of at least a direction (read or write), a disk address (location), and a size (in bytes).
    • Disk command: besides reads and writes, a disk may be asked to perform other commands that do not transfer data (such as a cache flush).
    • Bandwidth: the maximum data transfer rate the storage transport or controller can achieve.
    • I/O latency: the execution time of an I/O operation. As used in the operating system context, this term covers more than just the device layer.

Related concepts

File system delay
File system latency is a primary indicator of file system performance: the time from the start to the end of a logical file system request. It includes time spent in the file system, in the kernel disk I/O subsystem, and waiting for physical I/O on the disk device. Application threads usually block during the request, waiting for it to complete, in which case file system latency directly affects application performance. In some cases the application is not directly affected, for example with non-blocking I/O, or when the I/O is issued from an asynchronous thread.
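One way to feel the difference between a logical request that is absorbed by the cache and one that must wait for the device is to compare a buffered write with a synchronous one; a rough sketch using `dd` (file paths and sizes are arbitrary):

```shell
# Buffered write: dd normally returns as soon as the data is in the page cache.
time dd if=/dev/zero of=/tmp/lat_buffered bs=4k count=256 2>/dev/null

# Synchronous write: oflag=sync makes each write wait for the device,
# so the elapsed time now includes physical I/O latency.
time dd if=/dev/zero of=/tmp/lat_sync bs=4k count=256 oflag=sync 2>/dev/null

rm -f /tmp/lat_buffered /tmp/lat_sync
```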


The file system uses main memory (RAM) as a cache to improve performance. The cache grows over time while the operating system's free memory shrinks; when applications need more memory, the kernel should quickly reclaim space from the file system cache. The file system uses caching to improve read performance and buffering to improve write performance. File systems and the block device subsystem typically use multiple types of cache.
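The cache and buffer usage described above can be observed at any time from /proc/meminfo, or summarized with `free`; for example:

```shell
# Page cache, buffer, and dirty-page figures maintained by the kernel.
grep -E '^(Cached|Buffers|Dirty):' /proc/meminfo

# The same information summarized in the buff/cache column.
free -h
```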

Random I/O and sequential I/O
A series of file system logical I/O can be classified as random or sequential according to the file offset of each I/O. In sequential I/O, each I/O begins at the address where the previous one ended. In random I/O, there is no such relationship between I/Os; the offset changes randomly. Random file system workloads also include accessing many files at random. Because of the performance characteristics of storage devices, file systems have always tried to store file data sequentially and contiguously on disk in an effort to reduce random I/O. When a file system fails to achieve this, file placement becomes disorganized and sequential logical I/O is broken up into random physical I/O, a condition known as fragmentation.
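The distinction can be sketched with `dd`: a plain copy issues sequential writes, while seeking to scattered offsets produces a random access pattern (the file name and offsets are arbitrary illustrations):

```shell
# Sequential: each 1 MiB block begins where the previous one ended.
dd if=/dev/zero of=/tmp/io_pattern bs=1M count=8 2>/dev/null

# Random (illustrative): overwrite 1 MiB blocks at scattered offsets.
for off in 5 1 7 2; do
    dd if=/dev/zero of=/tmp/io_pattern bs=1M count=1 seek=$off conv=notrunc 2>/dev/null
done

stat -c '%s' /tmp/io_pattern   # still 8 MiB: the random writes landed in-place
rm -f /tmp/io_pattern
```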

Tip: For more information about file systems, consult the relevant theory yourself. For example, you should also understand read-ahead, prefetch, write-back caching, synchronous writes, raw I/O, direct I/O, memory-mapped files, metadata, and other related topics.

Performance analysis

Background knowledge is what you need when analyzing performance problems: hardware caches, the operating system kernel, and so on. The detailed behavior of an application is often intertwined with these layers, and these low-level details affect application performance in unexpected ways. For example, some programs fail to take full advantage of caches, degrading performance; others make excessive, unnecessary system calls, causing frequent kernel/user mode switches. If you want to learn more about Linux systems, buying related books for systematic study is recommended. Below we describe tools for analyzing disk performance (in fact, not just disks):


iostat summarizes statistics for individual disks, providing metrics for disk load, utilization, and saturation. By default it displays a single line of system summary information, including the kernel version, hostname, date, architecture, and CPU count, followed by one line per disk device.

[root@localhost ~]# iostat
Linux 3.10.0-514.el7.x86_64 (localhost.localdomain)  September 18, 2017  _x86_64_  (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.74    0.00    1.24    1.35    0.00   96.67

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              14.43       456.85        60.82     218580      29098
scd0              0.02         0.09         0.00          ,          0
dm-0             13.65       404.58        56.50     193571      27030
dm-1              0.27         2.23         0.00       1068          0

Parameter description

    • tps: transfers (I/O requests) per second, i.e. IOPS.
    • kB_read/s, kB_wrtn/s: kilobytes read per second and kilobytes written per second.
    • kB_read, kB_wrtn: total kilobytes read and total kilobytes written.

To output more detailed content, try the following combination of commands:

[root@localhost ~]# iostat -xkdz 1
Linux 3.10.0-514.el7.x86_64 (localhost.localdomain)  September 18, 2017  _x86_64_  (1 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     2.43   13.81    2.32   510.51    67.96    71.74     0.22   13.94    8.72   44.95   2.37   3.82
scd0              0.00     0.00    0.03    0.00     0.10     0.00     8.00     0.00    0.27    0.27    0.00   0.27   0.00
dm-0              0.00     0.00   10.52    4.73   452.10    63.13    67.56     0.44   28.56   10.41   68.93   2.47   3.76
dm-1              0.00     0.00    0.30    0.00     2.49     0.00    16.69     0.00    1.50    1.50    0.00   1.38   0.04

Parameter description

    • rrqm/s: read requests merged into the driver request queue per second (when a system call needs to read data, VFS sends the request to the file system; if the file system finds different read requests reading the same block, it merges them).
    • wrqm/s: write requests merged into the driver request queue per second.
    • r/s: read requests issued to the disk device per second.
    • w/s: write requests issued to the disk device per second.
    • rkB/s: kilobytes read from the disk device per second.
    • wkB/s: kilobytes written to the disk device per second.
    • avgrq-sz: average request size, in sectors (512B).
    • avgqu-sz: average number of requests waiting in the driver request queue or active on the device.
    • await: average I/O response time, including time waiting in the driver request queue plus the device's I/O response time (ms). In general, system I/O response time should be below 5 ms; above 10 ms is considered high. Since await includes both queue time and service time, await is normally greater than svctm: the smaller their difference, the shorter the queue time; the larger the difference, the longer the queue time, which indicates a problem in the system.
    • svctm: average I/O response time of the disk device (ms). If svctm is close to await, there is little I/O waiting and disk performance is good; if await is much higher than svctm, the I/O queue wait is too long and applications running on the system become slower.
    • %util: percentage of time the device was busy processing I/O requests (utilization), i.e. total time spent processing I/O divided by the measurement interval. For example, with a 1-second interval in which the device spends 0.8 s processing I/O and 0.2 s idle, %util = 0.8/1 = 80%. This parameter reflects how busy the device is. Generally, 100% means the device is running close to full capacity (though for a virtual device backed by multiple disks, %util can be 100% while, thanks to disk concurrency, the disks themselves may not be the bottleneck).
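The %util arithmetic in the example above can be checked directly; a one-liner reproducing the 0.8 s busy / 1 s interval case:

```shell
# %util = (time spent servicing I/O) / (measurement interval)
awk 'BEGIN { busy = 0.8; interval = 1.0; printf "%.0f%%\n", busy / interval * 100 }'
# prints 80%
```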

Since avgrq-sz reflects merged requests, a small size (16 sectors or less) can be taken as a sign of a workload that cannot be merged. Large sizes may indicate large I/O, or a merged sequential workload. The most important metric in the output is await. If the application and file system use techniques that reduce write latency, w_await may matter less and you should focus on r_await.
%util is still important for resource usage and capacity planning, but remember it only measures busyness (non-idle time) and is less meaningful for virtual devices backed by multiple disks. Such devices are better understood via the load applied to them: IOPS (r/s + w/s) and throughput (rkB/s + wkB/s).


iotop: a top-like tool that includes per-process disk I/O.

Batch mode (-b) provides rolling output. The following example shows only processes performing I/O (-o), printing every 5 seconds (-d 5):

[root@localhost ~]# iotop -bod5
Total DISK READ: 0.00 B/s | Total DISK WRITE: 8.76 K/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 24.49 K/s
  TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN     IO    COMMAND
21203  be/3  root      0.00 B/s   815.58 B/s  0.00 %  0.01 %  [jbd2/dm-2-8]
22069  be/3  root      0.00 B/s     0.00 B/s  0.00 %  0.01 %  [jbd2/dm-1-8]
 1531  be/0  root      0.00 B/s     6.37 K/s  0.00 %  0.01 %  [loop0]
 3142  be/4  root      0.00 B/s     0.00 B/s  0.00 %  0.01 %  [kworker/7:0]
21246  be/4  root      0.00 B/s  1631.15 B/s  0.00 %  0.00 %  java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start


The output shows that the Java process is applying a disk write load at a rate of approximately 1631.15 B/s. Other useful options include -a, which outputs accumulated I/O instead of rates averaged over time, and -o, which prints only those processes that are performing disk I/O.

Of course, there are other commands that display disk statistics, such as sar, iosnoop, perf, and blktrace; only the common ones are listed here.

Performance tuning

File system optimization

There is not much to be said about file system optimization. At present, the Red Hat Enterprise Linux 7 series uses XFS as the default file system, precisely because XFS performs well. In practice, it is recommended to do some simple tuning for XFS, such as passing extra parameters when formatting and extra mount options when mounting the partition, which can improve file system performance.

Parameters when formatting:

mkfs.xfs -d agcount=256 -l size=128m,lazy-count=1,version=2 /dev/diska1

Parameters for Mount:
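The original mount line is missing from the source at this point; below is a commonly cited set of XFS mount options as an illustrative sketch (the device name, mount point, and exact option set are assumptions, not the author's original values):

```shell
# noatime: skip access-time updates on reads (fewer metadata writes)
# inode64: allow inodes to be placed anywhere on large file systems
# logbufs: number of in-memory log buffers
mount -o noatime,inode64,logbufs=8 /dev/diska1 /data   # paths are placeholders
```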


Disk-related optimizations

    • Operating system tunable parameters

Includes Ionice, resource control, and kernel tunable parameters.


The ionice command on Linux sets a process's I/O scheduling class and priority. The scheduling class is an integer: 0 means none (no class specified; the kernel picks a default, with priority based on the process's nice value); 1 is real time, the highest class of disk access, which can starve other processes if misused; 2 is best effort, the default class, with priorities 0 through 7 (0 the highest); 3 is idle, where I/O is allowed only after the disk has been idle for some time. For example:

ionice -c 3 -p 65552
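To see the classes in action, you can also launch a command directly under the idle class and query a process's current class (the PID above, 65552, is whatever process you want to re-prioritize):

```shell
# Run a command in the idle class: it gets disk time only when no one else wants it.
ionice -c 3 echo "runs with idle I/O priority"

# Query the I/O scheduling class and priority of the current shell.
ionice -p $$
```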


Cgroups provide a storage device resource control mechanism for a process or process group. This is rarely used in general setups and need not be considered here.

Kernel tunable parameters

/sys/block/sda/queue/scheduler: selects the I/O scheduler policy: noop, deadline, anticipatory, or cfq.
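Reading this file shows the available policies with the active one in brackets; writing a policy name switches it at run time (root is required, and `sda` is just an example device):

```shell
cat /sys/block/sda/queue/scheduler
# e.g.: noop [deadline] cfq

echo cfq > /sys/block/sda/queue/scheduler   # switch the active policy
```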

    • Disk device tunable parameters

The hdparm tool on Linux can get and set tunable parameters for a variety of disk devices.

    • Disk controller tunable parameters


