Background information:
Recently, disk IO on many of our DataNodes has been quite high, mainly because both the number of jobs and the volume of data they process have increased.
Even so, any way to reduce disk IO consumption is worth trying.
For example, we can often see the hdfs user executing the "du -sk" command:
The code is as follows:

[root@idc1-server2 ~]# ps -ef | grep "du -sk"
hdfs 17119 10336  1 00:57 ?  00:00:04 du -sk /data1/dfs/dn/current/BP-1281416642-10.100.1.2-1407274717062
hdfs 17142 10336  1 00:57 ?  00:00:03 du -sk /data5/dfs/dn/current/BP-1281416642-10.100.1.2-1407274717062
hdfs 17151 10336  1 00:57 ?  00:00:05 du -sk /data6/dfs/dn/current/BP-1281416642-10.100.1.2-1407274717062
...
As the data on each DataNode keeps growing, these frequent du scans take longer and longer: even when CPU and disk IO are idle, each scan takes about 5 seconds, and when the server is under heavy load it can take much longer.
Instead, we considered replacing the original du command with a new du wrapper built on the df command.
The code is as follows:

[root@idc1-server2 ~]# mv /usr/bin/du /usr/bin/du.orig
[root@idc1-server2 ~]# vim /usr/bin/du
#!/bin/sh
mydf=$(df -Pk $2 | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $3 }')
echo -e "$mydf\t$2"
[root@idc1-server2 ~]# chmod +x /usr/bin/du
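Before overwriting /usr/bin/du system-wide, the wrapper can be sanity-checked from a temporary location. The sketch below is illustrative: the temp directory and the use of / as the test path are assumptions, and it uses #!/bin/bash rather than #!/bin/sh so that "echo -e" expands \t consistently (on systems where /bin/sh is dash, echo may print the -e literally).

```shell
# Sanity-check sketch for the df-based du wrapper (paths are illustrative).
tmpbin=$(mktemp -d)

# Recreate the wrapper in a temp dir instead of /usr/bin.
cat > "$tmpbin/du" <<'EOF'
#!/bin/bash
# Hadoop invokes "du -sk <dir>", so $1 is "-sk" (ignored) and $2 is the dir.
mydf=$(df -Pk $2 | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $3 }')
echo -e "$mydf\t$2"    # mimic du -sk output: "<used-KB><TAB><path>"
EOF
chmod +x "$tmpbin/du"

# A single cheap df lookup replaces a full directory walk:
"$tmpbin/du" -sk /
```

The output should have exactly du's "size, tab, path" shape, which is what the DataNode expects to parse.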
But won't the statistics be inaccurate this way?
It depends on the deployment, but in general a Hadoop DataNode stores its data on dedicated disks and partitions, so when df's whole-filesystem usage is substituted for du's per-directory total, the error should be very small.
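To make the error source concrete, here is a small sketch (all paths are temporary and illustrative): on a single filesystem, df reports the same whole-partition usage no matter which directory you ask about, while the real du still sees per-directory differences. That is exactly why the substitution is only accurate when each DataNode data directory sits on its own partition.

```shell
#!/bin/bash
# Sketch: df cannot distinguish directories on the same filesystem.
base=$(mktemp -d)
mkdir "$base/a" "$base/b"

# Put ~100 KB into a/ only.
dd if=/dev/zero of="$base/a/blk" bs=1024 count=100 2>/dev/null

# The real du sees different per-directory sizes:
du -sk "$base/a" "$base/b"

# df reports (nearly) the same whole-filesystem figure for both:
df -Pk "$base/a" | awk 'NR==2 { print $3 }'
df -Pk "$base/b" | awk 'NR==2 { print $3 }'
```

On a partition dedicated to one DataNode directory, that whole-filesystem figure is the directory's usage, which is why the trick works in practice.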