Troubleshooting high I/O latency on Linux servers
0. First, check the system status with top.
Two values are abnormal: the load average is high, and the CPU %wa value stays above 50%.
Look up the meaning of the %wa value:
- wa -- iowait
- Amount of time the CPU has been waiting for I/O to complete.
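The same figure can be read without top. As a rough sketch (the helper name iowait_pct is made up for illustration), the share of CPU time spent in iowait can be computed from two samples of the "cpu" summary line in /proc/stat, where the fifth counter after the "cpu" label is iowait:

```shell
# iowait_pct BEFORE AFTER   (hypothetical helper, not part of any tool)
# BEFORE and AFTER are two "cpu ..." summary lines taken from /proc/stat.
# Prints the percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6          # field 6 is the iowait counter
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system (2-second window):
#   before=$(head -n 1 /proc/stat); sleep 2; after=$(head -n 1 /proc/stat)
#   iowait_pct "$before" "$after"
```

A value that stays close to the %wa shown by top confirms that the CPUs are mostly idle and waiting on I/O rather than doing work.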
1. View disk read/write data
View disk status with iostat
- $ iostat -x 2 5
- avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
-            3.66   0.00    47.64    48.69    0.00   0.00
-
- Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
- sda       44.50   39.27  117.28  29.32  11220.94  13126.70    332.17     65.77  462.79     9.80  2274.71   7.60  111.41
- dm-0       0.00    0.00   83.25   9.95  10515.18   4295.29    317.84     57.01  648.54    16.73  5935.79  11.48  107.02
- dm-1       0.00    0.00   57.07  40.84    228.27    163.35      8.00     93.84  979.61    13.94  2329.08  10.93  107.02
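In this output every device reports %util above 100%, and w_await (the average wait of a write request, in milliseconds) is far higher than r_await, so it is the writes that are queuing up. A small filter can pull out such devices automatically. This is only a sketch: busy_devices is a made-up name, and it assumes the iostat output contains w_await and %util columns, which it looks up by header name because column order differs between sysstat versions.

```shell
# busy_devices [UTIL_MIN]   (hypothetical helper)
# Reads `iostat -x` output on stdin and prints "device w_await %util" for
# every device whose %util is at or above UTIL_MIN (default 90).
busy_devices() {
    awk -v util_min="${1:-90}" '
        $1 ~ /^Device/ {
            # Find the columns by name in the header row.
            wait_col = 0; util_col = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "w_await") wait_col = i
                if ($i == "%util")   util_col = i
            }
            in_table = 1
            next
        }
        NF == 0 { in_table = 0; next }      # a blank line ends the device table
        in_table && wait_col && util_col && ($util_col + 0 >= util_min + 0) {
            print $1, $wait_col, $util_col
        }'
}

# Usage:
#   iostat -x 2 5 | busy_devices 90
```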
2. View the process status
Process state codes (from the ps man page)
- PROCESS STATE CODES
- D uninterruptible sleep (usually IO)
- R running or runnable (on run queue)
- S interruptible sleep (waiting for an event to complete)
- T stopped, either by a job control signal or because it is being traced.
- W paging (not valid since the 2.6.xx kernel)
- X dead (should never be seen)
- Z defunct ("zombie") process, terminated but not reaped by its parent.
List the processes that are in state D (uninterruptible disk sleep).
- # for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
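With many samples it can be easier to count how often each process shows up in state D than to read the raw output. A possible sketch, assuming the samples come from ps -eo state,pid,comm (the function name d_state_summary is invented for this example):

```shell
# d_state_summary   (hypothetical helper)
# Reads repeated `ps -eo state,pid,comm` samples on stdin and prints, for each
# process seen in state D, the number of samples it appeared in,
# most frequent first. Only the first word of the command name is kept.
d_state_summary() {
    awk '$1 ~ /^D/ { seen[$2 " " $3]++ }
         END { for (k in seen) print seen[k], k }' | sort -k1,1nr -k2,2n
}

# Usage: 10 samples, 5 seconds apart
#   for i in $(seq 1 10); do ps -eo state,pid,comm; sleep 5; done | d_state_summary
```

A process that is in state D in nearly every sample is a stronger suspect than one that appears only once.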
The output shows the kjournald process in state D.
What does this process do?
kjournald is the kernel thread that handles journaling for the ext3 file system.
View the I/O statistics of the process (PID 487)
- # cat /proc/487/io
- rchar: 48752567
- wchar: 549961789
- syscr: 5967
- syscw: 67138
- read_bytes: 49020928
- write_bytes: 549961728
- cancelled_write_bytes: 0
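These counters are cumulative, so a single reading does not show how fast the process is writing right now. One way to estimate the current rate is to read the file twice and divide the difference by the interval. The helper below is a minimal sketch with an invented name (io_rate); reading /proc/<pid>/io for a process owned by another user generally requires root.

```shell
# io_rate FIELD SECONDS BEFORE AFTER   (hypothetical helper)
# BEFORE and AFTER are the contents of /proc/<pid>/io at two points in time.
# Prints the change in FIELD (for example write_bytes) per second.
io_rate() {
    printf '%s\n---\n%s\n' "$3" "$4" | awk -v f="$1:" -v secs="$2" '
        $1 == "---" { second = 1; next }    # separator between the two samples
        $1 == f     { if (second) after = $2; else before = $2 }
        END {
            if (secs <= 0) exit 1           # the interval must be positive
            printf "%.0f\n", (after - before) / secs
        }'
}

# Usage (5-second window, PID 487 as in the example above):
#   b=$(cat /proc/487/io); sleep 5; a=$(cat /proc/487/io)
#   io_rate write_bytes 5 "$b" "$a"
```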
Check which files the process has open to see where the data is being written.
- # lsof -p 487
The reason kjournald is stuck in state D is still under investigation ....