Disk I/O Performance Analysis

I. Concept

Disk I/O, as the name implies, is the input and output of the disk: writing data to the disk and reading data from it.

In general, I/O can be classified into several types: read/write I/O, large/small I/O, continuous/random I/O, and sequential/concurrent I/O. Of these, we mainly discuss large/small I/O, continuous/random I/O, and sequential/concurrent I/O.

1. Read/write I/O

A disk exists to store and retrieve data, so there are two corresponding I/O operations: storing data corresponds to a write I/O, and retrieving data corresponds to a read I/O.

When the controller that manages the disk receives a read I/O instruction from the operating system, it sends a read command to the disk together with the address of the data block to be read. The disk then transfers the requested data back to the controller, and the controller returns it to the operating system, completing one read I/O. A write I/O works similarly: the controller receives the write command and the data to be written and passes them to the disk; after the data has been written, the disk reports the result back to the controller, which returns it to the operating system, completing one write I/O. A single I/O operation is the completion of one such read or write.

2. Large/small I/O

This refers to the number of consecutive sectors specified in the controller's command. If the number is large, such as 64 or 128 sectors, it can be considered a large-block I/O; if the number is small, it is considered a small-block I/O. In practice there is no sharp boundary between large-block and small-block I/O.

3. Continuous/random I/O

Continuous I/O means that the starting sector address of the current I/O directly follows, or is very close to, the ending sector address of the previous I/O. If the gap between the two addresses is large, the I/O is counted as random.

Continuous I/O is more efficient than random I/O because during continuous I/O the head hardly needs to seek to another track, or the seek is very short; with heavy random I/O, the head keeps seeking back and forth, which greatly reduces efficiency.

4. Sequential/concurrent I/O

Conceptually, concurrent I/O means issuing an I/O command to one disk and, without waiting for its response, issuing an I/O command to another disk. On a striped RAID LUN, such as RAID 0+1 (1+0) or RAID 5, I/O operations are concurrent; otherwise the I/O is sequential.

II. Factors Affecting Disk Performance

A traditional disk is essentially a mechanical device; FC, SAS, and SATA disks all fall into this category, with rotational speeds of 5400/7200/10K/15K RPM. The key performance factor is the disk service time, i.e. the time it takes the disk to complete one I/O request, which consists of three parts: seek time, rotational delay, and data transfer time.

1. Seek time

Tseek is the time required to move the read/write head to the correct track. The shorter the seek time, the faster the I/O operation. The average seek time of current disks is generally 3-15 ms.

2. Rotational Delay

Trotation is the time required for the platter to rotate until the sector containing the requested data is under the read/write head. The rotational delay depends on the rotational speed and is usually expressed as 1/2 of the time of one full revolution. For example, the average rotational delay of a 7200 RPM disk is about 60*1000/7200/2 = 4.17 ms, while that of a 15000 RPM disk is about 2 ms.

3. Data Transfer Time

Ttransfer is the time required to transfer the requested data. It depends on the data transfer rate and equals the data size divided by the transfer rate. Currently, IDE/ATA interfaces reach 133 MB/s and SATA II reaches an interface transfer rate of 300 MB/s. The data transfer time is usually far smaller than the other two components and can be ignored in simple calculations.

The average physical seek times of common disks are as follows:

The average physical seek time of a 7200 RPM SATA hard disk is 10.5 ms.

The average physical seek time of a 10000 RPM SATA hard disk is 7 ms.

The average physical seek time of a 15000 RPM SAS hard disk is 5 ms.

The rotational delays of common hard disks are as follows:

The average rotational delay of a 7200 RPM disk is about 60*1000/7200/2 = 4.17 ms.

The average rotational delay of a 10000 RPM disk is about 60*1000/10000/2 = 3 ms.

The average rotational delay of a 15000 RPM disk is about 60*1000/15000/2 = 2 ms.

Theoretical calculation of maximum IOPS:

IOPS = 1000 ms / (seek time + rotational delay); the data transfer time is ignored.

A 7200 RPM disk: IOPS = 1000/(10.5 + 4.17) ≈ 68 IOPS

A 10000 RPM disk: IOPS = 1000/(7 + 3) = 100 IOPS

A 15000 RPM disk: IOPS = 1000/(5 + 2) ≈ 142 IOPS
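
The same arithmetic can be reproduced with a quick shell calculation; a minimal sketch using bc (the seek times and rotational speeds are the averages quoted above):

# echo "1000/(10.5 + 60000/7200/2)"  | bc -l    # 7200 RPM SATA, prints roughly 68
# echo "1000/(7 + 60000/10000/2)"    | bc -l    # 10000 RPM SATA, prints roughly 100
# echo "1000/(5 + 60000/15000/2)"    | bc -l    # 15000 RPM SAS, prints roughly 142

These are per-spindle ceilings for small random I/O; caching, larger blocks, and sequential access change the picture considerably.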

Factors affecting measured results:

In actual measurement, IOPS is affected by many factors, including the I/O workload characteristics (read/write ratio, sequential vs. random access, number of worker threads, queue depth, and record size), the system configuration, the operating system, the disk driver, and so on. Therefore, IOPS figures should only be compared when they were measured under the same test conditions, and even then some random variation is to be expected.

III. Important Performance Indicators

Common disk types include ATA, SATA, FC, SCSI, and SAS. Of these, SAS and FC disks are commonly used in servers, and SSDs are also used in some high-end storage. Each type of disk performs differently. A mechanical hard disk has very good continuous (sequential) read/write performance but very poor random read/write performance, mainly because moving the head to the correct track takes time: during random reads and writes the head has to move constantly, and the time wasted on head positioning keeps performance low. When storing small files, random read/write IOPS is the key indicator; when storing large files such as video, continuous read/write performance (throughput) is the key indicator.

There are two main indicators:

1. IOPS

IOPS is the number of read/write I/O operations the disk performs in one second. It mainly depends on the array algorithm, the cache hit rate, and the number of disks. Array algorithms differ from array to array; for example, we recently encountered an HDS USP on which an LDEV (LUN) has queue and resource restrictions, so the IOPS of a single LDEV is limited. The cache hit rate depends on the data distribution, the cache size, the data access pattern, and the cache algorithm; here I only emphasize one point about the cache: on an array, the higher the read cache hit rate, the better, since it generally means more IOPS can be supported. Finally, there is the hard disk limit: each physical hard disk can handle only a limited number of IOPS. If an array has 120 fibre channel disks at 15K RPM, the maximum IOPS it can support is 120 * 150 = 18000. This is the theoretical hardware ceiling; if it is exceeded, the disks may respond very slowly and fail to provide service normally.
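
As a rough illustration, that hardware ceiling can be computed in the shell from the spindle count and the per-disk rule of thumb (both values are the ones quoted above):

# disks=120; per_disk_iops=150      # ~150 IOPS per 15K RPM spindle
# echo $((disks * per_disk_iops))   # prints 18000, the raw ceiling of the array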

2. Throughput

Throughput, also called disk bandwidth, is the amount of I/O traffic per second, i.e. the total size of data written to and read from the disk. It mainly depends on the architecture of the disk array, the bandwidth of the data channel, and the number of disks. Different disk arrays have different architectures, but each has its own internal bandwidth (for example a bus or star-style interconnect); in general the internal bandwidth is well provisioned and is not the bottleneck. The data channel between the disk array and the server, on the other hand, has a significant impact on throughput: as a rule of thumb, an I/O bottleneck appears when the actual throughput exceeds about 85% of the channel bandwidth. The bandwidths of common channels are listed below:

2 Gbit/s Fibre Channel provides 250 MB/s, 4 Gbit/s Fibre Channel provides 500 MB/s, SCSI tops out at 320 MB/s, SATA at 150 MB/s, and IDE at 133 MB/s. Finally, there is the hard disk itself: currently the data transfer rate of a SCSI hard disk is at most about 80 MB/s, and that of a SAS hard disk is at most about 80-100 MB/s. For the scattered small writes typical of a database, the achieved transfer rate falls far short of these values, mainly because disk seeking and other overheads waste too much time.

The following is an example:

If writing a single 10 MB file takes 0.1 s, the disk bandwidth works out to 100 MB/s. If writing 10,000 files of 1 KB each takes 10 s, the disk bandwidth is only 1 MB/s.

Relationship between IOPS and throughput:

Throughput per second = IOPS * average I/O size. The formula shows that the larger the average I/O size and the higher the IOPS, the higher the throughput per second. One might therefore assume that the higher both values are, the better; in reality, for a given disk each of the two parameters has its own maximum, and the two constrain each other.
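
A quick shell calculation makes the trade-off concrete; a minimal sketch (the 68 IOPS figure is the theoretical 7200 RPM value from section II, and the 4 KB / 1 MB block sizes are purely illustrative):

# iops=68                               # theoretical 7200 RPM figure from section II
# echo "$iops * 4 / 1024"    | bc -l    # 4 KB I/Os  -> roughly 0.27 MB/s
# echo "$iops * 1024 / 1024" | bc -l    # 1 MB I/Os  -> roughly 68 MB/s

In practice a disk cannot sustain its small-block IOPS with 1 MB I/Os, because the transfer time is no longer negligible; this is the same effect as the 10 MB file versus 10,000 small files example above.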

IV. Tools Used to Test Disk Performance

1. Download and install the FIO tool:

# git clone git://git.kernel.dk/fio.git

# yum install libaio-devel
# cd fio

# ./configure
# make
# make install

2. Asynchronous I/O performance test:

Different applications perform I/O in different ways, so each application should be tested with the I/O engine that matches its access pattern.

In asynchronous mode, Linux-native AIO (libaio) is used to submit a batch of requests at once and then wait for the whole batch to complete, which reduces the number of interactions and improves efficiency.

# cat nvdisk-test
[global]
bs=512
ioengine=libaio
userspace_reap
rw=randrw
rwmixwrite=20
time_based
runtime=180
direct=1
group_reporting
randrepeat=0
norandommap
ramp_time=6
iodepth=16
iodepth_batch=8
iodepth_low=8
iodepth_batch_complete=8
exitall

size=5g
[test]
filename=/data/test.data
numjobs=1

The parameters and options used are explained below; for more details, see man fio:

bs: the size of a single I/O block; for example, bs=16k means each I/O is 16 KB (the job file above uses 512 bytes).
ioengine: the I/O engine; here the asynchronous libaio engine is used.
userspace_reap: a libaio-specific option. By default FIO reaps completed events with the io_getevents system call; with this option enabled, completions are reaped directly in user space.
rw: the I/O pattern; randwrite means pure random writes, while randrw (used above) is mixed random read/write.
rwmixwrite: in mixed read/write mode, the percentage of writes.
time_based: if the test workload finishes before the specified runtime has elapsed, keep repeating it until the runtime is reached.
runtime: the run time in seconds.
direct=1: bypass the operating system's buffer cache during the test so that the results reflect the device itself; it should be enabled when testing the asynchronous I/O model.
group_reporting: when displaying results, summarize the statistics of all processes or threads in the group rather than reporting each one separately.
randrepeat: make the random I/O generator predictable, so that the same sequence is produced on every repeated run (randrepeat=0 disables this).
norandommap: normally, when FIO performs random I/O, it covers every block of the file; with this option set, FIO simply picks a new random offset each time without consulting its history, so some blocks may never be touched while others may be read or written several times.
ramp_time: the time to run the load before any performance data is recorded, so that logging starts only after performance has stabilized.
iodepth=16, iodepth_batch=8, iodepth_low=8, iodepth_batch_complete=8: the libaio engine calls io_setup with the iodepth value to prepare a context that can hold up to iodepth in-flight I/Os, and an I/O request queue of the same size is maintained. During the test, generated I/O requests are placed in this queue; when the number queued reaches iodepth_batch, io_submit is called to submit them as a batch, and io_getevents is then called to reap completed I/Os. Because the reap timeout is set to 0, however many have completed are reaped, up to iodepth_batch_complete per call. As I/Os are reaped, the queue drains and needs to be refilled; this happens when the number of in-flight I/Os drops to iodepth_low, so that the OS I/O scheduler always sees at least iodepth_low queued requests.
size: the size of the data file used for this test.
exitall: when one job finishes, terminate all jobs.
filename: by default, FIO generates a file name from the job name, thread number, and file number; if several jobs should share the same file, this option sets an explicit file name to replace the default. The file must be located on the disk (or partition) being tested.
numjobs: the number of test processes.

Note the following points in the FIO job configuration:
1. libaio requires the file to be opened with direct I/O.
2. The block size must be a multiple of the sector size (512 bytes).
3. userspace_reap speeds up the reaping of asynchronous I/O completions.
4. ramp_time reduces the impact of logging on high-speed I/O.
5. fsync does not take effect when direct I/O is enabled.
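
Once the job file is saved (here as nvdisk-test, matching the cat above), it can be run by passing it to fio directly; the optional --output flag just writes the report to a file:

# fio nvdisk-test
# fio nvdisk-test --output=nvdisk-test.log

The per-group summary that fio prints includes the measured IOPS, bandwidth, and latency, which correspond to the indicators discussed in section III.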

3. Synchronous I/O performance test cases:

Sequential read:
fio -filename=/data/test.data -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=5G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Random write:
fio -filename=/data/test.data -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=5G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Sequential write:
fio -filename=/data/test.data -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=5G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Mixed random read/write:
fio -filename=/data/test.data -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=5G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop



This article is from the "jia" blog, please be sure to keep this source http://leejia.blog.51cto.com/4356849/1552807
