http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
For this problem Hadoop does not yet provide an automatic solution; it is on the roadmap and tracked in a JIRA issue.
The Hadoop wiki describes a manual workaround, shown at the link above.
Problem description: dfs.datanode.data.dir on a DataNode is configured with multiple disks or directories, and for some reason, such as replacing a bad disk or the volume-choosing policy,
some disks become heavily used while one or more others sit almost empty. In that situation you can rebalance the blocks by hand.
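For illustration, such a multi-disk setup is configured in hdfs-site.xml roughly as follows (the paths are examples; on Hadoop 2.x the property is dfs.datanode.data.dir):

    <!-- each comma-separated path is a separate disk/volume -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data1/dfs/data,/data2/dfs/data</value>
    </property>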
1. First stop the datanode process on this node, so that no new data is written while you carry out the next steps.
2. Move blocks from one disk to another, for example from /data1 to /data2, keeping the directory structure as consistent as possible; the wiki notes that on newer releases an inconsistent subdirectory layout will not work.
3. Do the mv with care for permissions: do not run it as root and end up changing ownership, which will likewise make the blocks unreadable. A sketch of the whole procedure follows this list.
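A minimal sketch of steps 1-3, assuming the stock sbin scripts and an hdfs service user; the block-pool directory name (BP-...) and all paths are placeholders for illustration:

    # 1. stop the DataNode so nothing is written during the move
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode

    # 2. move a whole subdir between disks, keeping the same relative path
    #    under current/<block-pool>/current/finalized on both volumes
    BP=BP-1234567890-10.0.0.1-1400000000000   # placeholder block-pool id
    sudo -u hdfs mkdir -p /data2/dfs/data/current/$BP/current/finalized
    sudo -u hdfs mv /data1/dfs/data/current/$BP/current/finalized/subdir10 \
                    /data2/dfs/data/current/$BP/current/finalized/

    # 3. doing the mv as the hdfs user (not root) keeps ownership intact;
    #    then bring the DataNode back up
    $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

Moving a subdir wholesale, rather than individual block files, is what worked in the test described below.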
I tested this myself on CDH 5.0.2 (Hadoop 2.3).
dfs.datanode.data.dir was /hdp2/dfs/data; I added a new path, /hdp2/dfs/data2.
Because the data2 directory was empty, starting the datanode would have initialized it, for example by creating the VERSION file.
Instead I created the block-pool directories down to finalized by hand inside data2, then first moved over the block and meta files sitting under data, without moving the subdir directories.
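For reference, a Hadoop 2.x data directory looks roughly like this (the IDs are illustrative); finalized blocks are blk_<id> plus blk_<id>_<genstamp>.meta pairs, either directly under finalized or inside subdirNN directories:

    /hdp2/dfs/data
    └── current
        ├── VERSION
        └── BP-1234567890-10.0.0.1-1400000000000
            └── current
                ├── VERSION
                ├── finalized
                │   ├── blk_1073741825
                │   ├── blk_1073741825_1001.meta
                │   └── subdir10
                │       ├── blk_1073741900
                │       └── blk_1073741900_1076.meta
                └── rbw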
After I started the datanode process, I found that the data I had moved was deleted (why?).
Fortunately the replication factor was 2, so the data was synced back over. I then stopped the datanode process and moved subdir10 (which held the data) wholesale into the corresponding directory under /data2,
and started the datanode process again; checking the WebUI showed that the blocks were found.
To be safe when working in a production environment, make sure to back up the data first, or upload a large file and then verify the blocks of that file, before doing this at scale; for example:
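A simple way to run that trial, using standard HDFS commands (the file and path names are placeholders):

    # upload a large test file so its blocks land on this DataNode
    hdfs dfs -put ./bigfile.bin /tmp/bigfile.bin

    # after moving blocks and restarting the DataNode, verify every
    # block of the file is still present and healthy
    hdfs fsck /tmp/bigfile.bin -files -blocks -locations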