Detailed description of the Hadoop cluster balancer tool

Source: Internet
Author: User

When operating and maintaining (O&M) a production Hadoop cluster, the Hadoop balancer tool is commonly used to even out the distribution of file blocks across the DataNodes. This prevents the disks of some DataNodes from becoming much fuller than others (an imbalance that can also push the CPU usage of those nodes above that of the other servers).

1) Usage of the Hadoop balancer tool:

To start:

    bin/start-balancer.sh [-threshold <threshold>]

Examples:

    bin/start-balancer.sh                  # start the balancer with the default threshold of 10%
    bin/start-balancer.sh -threshold 5     # start the balancer with a threshold of 5%

To stop:

    bin/stop-balancer.sh

2) Parameters that affect the Hadoop balancer tool:

-threshold: default 10; valid range 0-100. This is the target used to decide whether the cluster is balanced: for every DataNode, the difference between its storage utilization and the utilization of the cluster as a whole should be smaller than this value (both expressed as percentages). In theory, the smaller the value, the more evenly balanced the cluster becomes. In a production environment, however, the cluster keeps writing and deleting data while the balancer runs, so the configured threshold may never actually be reached.
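To make the threshold test concrete, here is a minimal Python sketch of the check described above. It assumes each DataNode is described only by its used and total bytes; the `DataNode` tuple and the `is_balanced` and `out_of_balance` functions are hypothetical names for illustration, not part of Hadoop, and the real balancer's bookkeeping is more involved.

```python
from typing import List, NamedTuple, Tuple


class DataNode(NamedTuple):
    """Hypothetical, simplified view of a DataNode: only storage figures."""
    name: str
    used_bytes: int
    capacity_bytes: int


def utilization_pct(used: int, capacity: int) -> float:
    """Storage utilization as a percentage of capacity."""
    return 100.0 * used / capacity


def cluster_utilization_pct(nodes: List[DataNode]) -> float:
    """Utilization of the cluster as a whole (total used / total capacity)."""
    return utilization_pct(
        sum(n.used_bytes for n in nodes),
        sum(n.capacity_bytes for n in nodes),
    )


def is_balanced(nodes: List[DataNode], threshold_pct: float = 10.0) -> bool:
    """True if every node is within threshold_pct percentage points
    of the cluster-wide utilization."""
    avg = cluster_utilization_pct(nodes)
    return all(
        abs(utilization_pct(n.used_bytes, n.capacity_bytes) - avg) < threshold_pct
        for n in nodes
    )


def out_of_balance(nodes: List[DataNode], threshold_pct: float = 10.0) -> Tuple[List[str], List[str]]:
    """Names of over-utilized nodes (candidate sources) and
    under-utilized nodes (candidate targets)."""
    avg = cluster_utilization_pct(nodes)
    over, under = [], []
    for n in nodes:
        diff = utilization_pct(n.used_bytes, n.capacity_bytes) - avg
        if diff >= threshold_pct:
            over.append(n.name)
        elif diff <= -threshold_pct:
            under.append(n.name)
    return over, under


if __name__ == "__main__":
    TB = 1024 ** 4
    nodes = [
        DataNode("dn1", 8 * TB, 10 * TB),  # 80% used
        DataNode("dn2", 5 * TB, 10 * TB),  # 50% used
        DataNode("dn3", 2 * TB, 10 * TB),  # 20% used
    ]
    # Cluster utilization is 15/30 = 50%, so dn1 and dn3 are 30 points away.
    print(is_balanced(nodes, threshold_pct=10))  # False
    print(is_balanced(nodes, threshold_pct=35))  # True
    print(out_of_balance(nodes, threshold_pct=10))  # (['dn1'], ['dn3'])
```

With a threshold of 10, blocks would flow from dn1 toward dn3 until both are within 10 points of the cluster average.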

dfs.balance.bandwidthPerSec: default 1048576 (1 MB/s). This sets the maximum bandwidth, in bytes per second, that each DataNode may use for balancing while the balancer runs. Setting it too high can slow down running MapReduce jobs.
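A quick back-of-the-envelope calculation shows how strongly the default bandwidth limits the balancer. The sketch below simply divides bytes by bytes per second; it assumes the limit is fully used for the whole transfer and ignores all other overhead, so real transfers will be slower. The function names are hypothetical.

```python
DEFAULT_BANDWIDTH = 1048576  # dfs.balance.bandwidthPerSec default, bytes/second (1 MB/s)
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_sec: int = DEFAULT_BANDWIDTH) -> float:
    """Best-case time to move num_bytes at a fixed bandwidth cap."""
    return num_bytes / bandwidth_bytes_per_sec


def bytes_in_window(seconds: int, bandwidth_bytes_per_sec: int = DEFAULT_BANDWIDTH) -> int:
    """Most bytes that can be moved in a time window at a fixed bandwidth cap."""
    return seconds * bandwidth_bytes_per_sec


if __name__ == "__main__":
    # Moving 10 GiB at 1 MiB/s takes 10240 s, i.e. roughly 2.8 hours.
    print(transfer_seconds(10 * GIB) / 3600)
    # In a 20-minute (1200 s) window, at most 1200 MiB can be moved.
    print(bytes_in_window(20 * 60) // MIB)
```

This is why operators often raise the value during quiet periods, accepting the extra load on MapReduce jobs in exchange for faster balancing.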

3) Other characteristics of the Hadoop balancer tool:

While it runs, the balancer iteratively moves file blocks from highly utilized DataNodes to under-utilized ones. In each iteration, a DataNode moves or receives no more than the lesser of 10 GB and the threshold fraction of its capacity (threshold × capacity), and each iteration runs for at most 20 minutes. At the end of each iteration, the balancer obtains updated DataNode information from the NameNode. The official documentation describes this as follows:

The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10G bytes or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanodes information from the namenode.
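The per-iteration rule in the quotation can be written as a one-line formula. The sketch below is a simplified illustration, not the balancer's source code: it assumes "10G bytes" means 10 GiB and interprets "the threshold fraction of its capacity" as threshold/100 × capacity. The function name `iteration_move_limit` is hypothetical.

```python
MAX_BYTES_PER_ITERATION = 10 * 1024 ** 3  # "10G bytes", assumed here to mean 10 GiB


def iteration_move_limit(capacity_bytes: int, threshold_pct: float = 10.0) -> int:
    """Most bytes a single DataNode may move or receive in one iteration:
    the lesser of 10 GiB and threshold_pct percent of its capacity."""
    threshold_share = int(capacity_bytes * threshold_pct / 100)
    return min(MAX_BYTES_PER_ITERATION, threshold_share)


if __name__ == "__main__":
    GIB = 1024 ** 3
    # 4 TiB disk, 10% threshold: 10% is about 409.6 GiB, so the 10 GiB cap applies.
    print(iteration_move_limit(4096 * GIB) // GIB)  # 10
    # 50 GiB disk, 10% threshold: 5 GiB is below the 10 GiB cap.
    print(iteration_move_limit(50 * GIB) // GIB)  # 5
```

For typical multi-terabyte DataNodes the 10 GB cap is the binding limit, so lowering the threshold mainly increases the number of iterations rather than the amount moved in each one.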
