Unbalanced HDFS file uploading and the Balancer is too slow
If a file is uploaded to HDFS from a node that is itself a datanode, HDFS places the first replica of every block on that local datanode, so its disk fills up far faster than the others. This skew is very unfavorable for running distributed programs.
Solution:
1. Upload data from a node that is not a datanode
You can copy the Hadoop installation directory to a machine that is not part of the cluster and use it purely as a client: do not start any Hadoop processes on it, just use it to upload files to the cluster. (You could also upload directly from the namenode, since it is usually not a datanode, but this is a bad idea: it adds load to the namenode, and over time all kinds of stray files accumulate on it.)
You can also write a program that uploads files and run it on a machine outside the cluster. In the program you must set the necessary configuration explicitly, such as the namenode URI and the replication factor. If you do not, the defaults baked into the Hadoop jar bundled with your program will be used instead of your cluster's configuration.
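As a sketch of the configuration such a client needs, the fragment below sets the namenode URI and replication factor; the hostname, port, and replica count are placeholder values, not taken from any particular cluster:

```xml
<!-- Client-side core-site.xml / hdfs-site.xml fragment; host and values are examples -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

The same two properties can also be set programmatically on the Hadoop `Configuration` object before opening the `FileSystem`, which avoids shipping config files with the program.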
2. Use balancer
You can run
hdfs balancer -threshold <N>
where <N> is a percentage: a datanode is considered over- or under-utilized when its disk usage deviates from the cluster-wide average by more than this threshold.
However, by default this balancing is very slow, because Hadoop does not allow the balancer to occupy much network bandwidth.
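To illustrate what the threshold means, here is a minimal sketch (plain Python, not actual Hadoop code) of how the balancer classifies datanodes: a node whose utilization differs from the cluster average by more than the threshold is a candidate for balancing.

```python
# Sketch of the balancer's threshold test; datanode names and numbers are made up.
def classify_datanodes(usage_percent, threshold):
    """usage_percent: {datanode: disk usage in %}; threshold: allowed deviation in %.

    Returns (over_utilized, under_utilized) lists of datanode names.
    """
    avg = sum(usage_percent.values()) / len(usage_percent)
    over = [dn for dn, u in usage_percent.items() if u > avg + threshold]
    under = [dn for dn, u in usage_percent.items() if u < avg - threshold]
    return over, under

# Example: the node that received all the uploads sits far above the average.
usage = {"dn1": 95.0, "dn2": 40.0, "dn3": 45.0, "dn4": 40.0}
over, under = classify_datanodes(usage, 10)
print(over, under)  # → ['dn1'] ['dn2', 'dn4']
```

The balancer then moves blocks from over-utilized to under-utilized nodes until every node falls within the threshold of the average.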
You can run
hdfs dfsadmin -setBalancerBandwidth <bandwidth>
to raise that limit; the bandwidth is given in bytes per second.