How do we measure the performance of a storage system? IOPS (Input/Output Operations Per Second), the number of read/write (I/O) operations the system can complete per second, is the most common storage performance metric. The higher the IOPS, the more user requests the system can handle at the same time under heavy access load, and the less hardware is needed to meet a given performance requirement, which translates into higher productivity and value for the customer.
Storage vendors have two common ways to inflate the IOPS value of a product under test: first, use small-capacity (such as 36GB or 73GB), high-speed (15k rpm) disks, and as many of them as possible, because the smaller and faster a single disk is, the higher its IOPS, and the more disks there are, the higher the aggregate IOPS obtained through RAID 0 striping; second, configure RAID 10 wherever possible, because RAID 10 delivers the highest IOPS of all common RAID levels.
-------------------------------------------------------------
Fio is a great tool for testing IOPS and for stress testing and validating hardware. It supports 13 different I/O engines,
including: sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and others. Fio project page: http://freshmeat.net/projects/fio/
First, FIO installation
wget http://brick.kernel.dk/snaps/fio-2.1.7.tar.bz2
yum install libaio-devel
tar -jxvf fio-2.1.7.tar.bz2
cd fio-2.1.7
./configure
make && make install
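If the build completes without errors, a quick sanity check (an added step, not part of the original walkthrough) is to confirm the binary is installed and runnable:

# confirm fio is on the PATH and report its version
fio --version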
The fio tool supports many types of tests and has a large number of parameters; usage information for all of them can be obtained from the built-in help. The following is a brief description of how to view the help documentation.
Main parameter description:
--help: get general help information.
--cmdhelp=cmd: get the help text for a specific command/option.
--enghelp=engine: get the help text for an I/O engine.
--debug=options: view details of the test in debug mode. (process, file, io, mem, blktrace, verify, random, parse, diskutil, job, mutex, profile, time, net, rate)
--output=filename: write the test results to a file.
--output-format=format: set the output format. (terse, json, normal)
--crctest=type: test the performance of checksum functions. (md5, crc64, crc32, crc32c, crc16, crc7, sha1, sha256, sha512, xxhash)
--cpuclock-test: perform a test/validation of the CPU clock.
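For example (illustrative invocations of the options listed above):

fio --cmdhelp=iodepth     # help text for a single option
fio --enghelp             # list the I/O engines compiled into this build
fio --enghelp=libaio      # help text for the libaio engine
fio --crctest=crc32c      # benchmark one checksum routine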
Second, random read test

Random read:
fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=200G -numjobs=10 -runtime=1000 -group_reporting -name=mytest

Parameter description:
filename=/dev/sdb1: the test file or device, usually the data directory of the disk to be tested.
direct=1: bypass the machine's own buffer cache during the test, so the results are more realistic.
iodepth 1: set the depth of the I/O queue.
thread: fio uses threads instead of processes.
rw=randread: the test performs random read I/O.
ioengine=psync: the I/O engine uses psync mode.
bs=16k: the block size of a single I/O is 16k.
bsrange=512-2048: same purpose as bs, but specifies a range of block sizes.
size=200G: the total amount of data to test is 200G, issued one bs-sized I/O at a time.
numjobs=10: use 10 test threads for this run.
runtime=1000: the test runs for 1000 seconds; if omitted, fio keeps going until the whole size has been transferred in bs-sized I/Os.
group_reporting: when displaying results, summarize over all jobs in the group instead of reporting each thread separately.
Other useful options:
rwmixwrite=30: in mixed read/write mode, writes account for 30%.
lockmem=1G: use only 1G of memory for the test.
zero_buffers: initialize the I/O buffers with zeros.
nrfiles=8: the number of files generated per job.

Sequential read:
fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Random write:
fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Sequential write:
fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=1000 -group_reporting -name=mytest

Mixed random read and write:
fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=100 -group_reporting -name=mytest -ioscheduler=noop

Third, an actual test sample

The test results show detailed system information, including I/O, latency, bandwidth, CPU usage and more:

# fio -filename=/dev/sdb1 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=100 -group_reporting -name=mytest1
mytest1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
...
mytest1: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
fio 2.0.7
Starting 30 threads
Jobs: 1 (f=1): [________________m_____________] [3.5% done] [6935K/3116K /s] [423/190 iops] [eta 48m:20s]
mytest1: (groupid=0, jobs=30): err= 0: pid=23802
  read : io=1853.4MB (total data read), bw=18967KB/s (bandwidth), iops=1185, runt=100058msec (total run time)
    clat (usec): min=60, max=871116, avg=25227.91, stdev=31653.46
     lat (usec): min=60, max=871117, avg=25228.08, stdev=31653.46
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    5], 10.00th=[    6], 20.00th=[    8],
     | 30.00th=[     ], 40.00th=[     ], 50.00th=[     ], 60.00th=[   19],
     | 70.00th=[     ], 80.00th=[   37], 90.00th=[     ], 95.00th=[   79],
     | 99.00th=[  151], 99.50th=[  202], 99.90th=[  338], 99.95th=[  383],
     | 99.99th=[  523]
    bw (KB/s) : min=     , max= 1944, per=3.36%, avg=636.84, stdev=189.15
  write: io=803600KB, bw=8031.4KB/s, iops=501, runt=100058msec
    clat (usec): min=52, max=9302, avg=146.25, stdev=299.17
     lat (usec): min=52, max=9303, avg=147.19, stdev=299.17
    clat percentiles (usec):
     |  1.00th=[     ],  5.00th=[     ], 10.00th=[     ], 20.00th=[   74],
     | 30.00th=[     ], 40.00th=[     ], 50.00th=[     ], 60.00th=[   90],
     | 70.00th=[     ], 80.00th=[     ], 90.00th=[     ], 95.00th=[  370],
     | 99.00th=[ 1688], 99.50th=[ 2128], 99.90th=[ 3088], 99.95th=[ 3696],
     | 99.99th=[ 5216]
    bw (KB/s) : min=     , max= 1117, per=3.37%, avg=270.27, stdev=133.27
    lat (usec) : 100=24.32%, 250=3.83%, 500=0.33%, 750=0.28%, 1000=0.27%
    lat (msec) : 2=0.64%, 4=3.08%, 10=20.67%, 20=19.90%, 50=17.91%
    lat (msec) : 100=6.87%, 250=1.70%, 500=0.19%, 750=0.01%, 1000=0.01%
  cpu          : usr=1.70%, sys=2.41%, ctx=5237835, majf=0, minf=6344162
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=118612/w=50225/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=1853.4MB, aggrb=18966KB/s, minb=18966KB/s, maxb=18966KB/s, mint=100058msec, maxt=100058msec
  WRITE: io=803600KB, aggrb=8031KB/s, minb=8031KB/s, maxb=8031KB/s, mint=100058msec, maxt=100058msec

Disk stats (read/write):
  sdb: ios=118610/50224, merge=0/0, ticks=2991317/6860, in_queue=2998169, util=99.77%
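The same mixed random read/write test can also be written as a fio job file instead of command-line flags. The sketch below is an illustrative equivalent of the mixed test above (the file name mixed.fio is arbitrary); run it with "fio mixed.fio":

# mixed.fio - job-file form of the mixed random read/write command above
[global]
# same engine, direct I/O and threading as the CLI examples
ioengine=psync
direct=1
thread
bs=16k
size=200g
numjobs=30
runtime=100
group_reporting

[mytest1]
filename=/dev/sdb1
rw=randrw
# 70% reads, 30% writes
rwmixread=70

Job files make it easier to keep a fixed set of test definitions and re-run them consistently.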
**Disk array throughput and IOPS analysis**

1. Throughput
Throughput depends mainly on the architecture of the array, the bandwidth of its Fibre Channel connections (most arrays today are Fibre Channel arrays; SCSI and SAS arrays are not discussed here), and the number of hard disks. The architecture differs from array to array, but they all have an internal bandwidth (similar to a PC's system bus); in general the internal bandwidth is well designed and is not the bottleneck. The influence of Fibre Channel is relatively large: in a data warehouse environment, for example, the demand for data traffic is very high, and a single 2Gb Fibre Channel card can sustain at most 2Gb/8 (bits) = 250MB/s (bytes) of actual throughput, so it takes 4 such cards to reach about 1GB/s of actual traffic; in a data warehouse environment it is therefore worth considering 4Gb Fibre Channel cards. Finally, the hard disks themselves are the most important limit. Once the earlier bottlenecks are removed, it comes down to the number of disks. Roughly, the sustained throughput each type of disk can support is:

10K rpm: 10MB/s    15K rpm: 13MB/s    ATA: 8MB/s

So, if an array has 120 15K rpm Fibre Channel disks, the maximum throughput the disks can support is 120*13 = 1560MB/s; with 2Gb Fibre Channel cards it may take 6 cards to carry that, while with 4Gb cards 3 to 4 are enough.

2. IOPS
The IOPS an array can deliver is determined by the array's algorithms, its cache hit rate, and the number of disks. The algorithms differ from array to array; for example, we recently ran into a case on an HDS USP where, possibly because an LDEV (LUN) has queue or resource limits, the IOPS of a single LDEV would not go up, so it is necessary to understand the algorithms and limitations of a given array before putting it to use. The cache hit rate depends on the data distribution, the cache size, the data access pattern, and the cache algorithm; a full discussion would become very complex and could fill a whole day. I will only stress one point about the cache hit rate: if an array has a good read cache hit rate, it can generally support more IOPS. Why? This is related to the per-disk IOPS limit discussed next. Each physical hard disk can only handle a limited number of IOPS, roughly:

10K rpm: 100 IOPS    15K rpm: 150 IOPS    ATA: 50 IOPS

Again, if an array has 120 15K rpm Fibre Channel disks, the maximum IOPS it can support is 120*150 = 18000. This is the theoretical limit of the hardware; beyond this value the disks respond very slowly and cannot serve the business normally. Between RAID 5 and RAID 10 there is no difference in read IOPS, but the same amount of business write IOPS ultimately produces different IOPS on the disks, and it is the disk IOPS we are evaluating; once the disk limit is reached, performance certainly cannot go up any further.
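As a quick check of the ceiling arithmetic above, the two limits for the 120-disk, 15K rpm example can be computed with a small shell snippet (per-disk figures taken from the tables above):

# throughput and IOPS ceilings for the 120 x 15K rpm disk example
DISKS=120
MBPS_PER_DISK=13      # sustained MB/s per 15K rpm disk (table above)
IOPS_PER_DISK=150     # IOPS limit per 15K rpm disk (table above)
echo "max throughput: $((DISKS * MBPS_PER_DISK)) MB/s"   # 1560 MB/s
echo "max IOPS:       $((DISKS * IOPS_PER_DISK))"        # 18000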
Now assume a case: the business generates 10000 IOPS, the read cache hit rate is 30%, reads are 60% of the I/O, writes are 40%, and there are 120 disks. What is the per-disk IOPS under RAID 5 and under RAID 10?

RAID 5: per-disk IOPS = (10000*(1-0.3)*0.6 + 4*(10000*0.4)) / 120 = (4200 + 16000) / 120 = 168
Here 10000*(1-0.3)*0.6 is the read IOPS: the read ratio is 0.6, and after removing the cache hits only 4200 IOPS actually reach the disks; 4*(10000*0.4) represents the write IOPS, because in RAID 5 each write actually causes 4 I/Os, so the writes generate 16000 IOPS. Taking into account that the 2 read operations inside a RAID 5 write can also hit the cache, a more accurate calculation is:
per-disk IOPS = (10000*(1-0.3)*0.6 + 2*(10000*0.4)*(1-0.3) + 2*(10000*0.4)) / 120 = (4200 + 5600 + 8000) / 120 = 148
That works out to 148 IOPS per disk, which essentially reaches the disk's limit.

RAID 10: per-disk IOPS = (10000*(1-0.3)*0.6 + 2*(10000*0.4)) / 120 = (4200 + 8000) / 120 = 102
As you can see, because a write operation in RAID 10 causes only 2 I/Os, under the same load and with the same disks each disk sees only 102 IOPS, far below the disk's IOPS limit.

In a real case, a backup/recovery workload (mostly writes, and small writes at that) was deployed on a RAID 5 scheme and performance turned out to be poor; analysis showed that per-disk IOPS quickly reached 200 at peak times, leading to very long response times. After switching to RAID 10 the performance problem went away, and per-disk IOPS dropped to about 100.
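The same per-disk estimates can be scripted; this sketch uses awk with the assumed figures from the example above (10000 business IOPS, 30% read cache hit rate, 60% reads, 120 disks) and reproduces the simplified RAID 5 and RAID 10 numbers:

# per-disk IOPS estimate for RAID 5 vs RAID 10 (figures from the example above)
awk -v total=10000 -v hit=0.3 -v rd=0.6 -v wr=0.4 -v disks=120 'BEGIN {
    reads  = total * (1 - hit) * rd             # reads that miss the cache: 4200
    raid5  = (reads + 4 * total * wr) / disks   # 4 back-end I/Os per RAID 5 write
    raid10 = (reads + 2 * total * wr) / disks   # 2 back-end I/Os per RAID 10 write
    printf "RAID 5:  about %.0f IOPS per disk\n", raid5    # ~168
    printf "RAID 10: about %.0f IOPS per disk\n", raid10   # ~102
}'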