Hadoop 2.0 Parameter Optimization Summary

Source: Internet
Author: User

1. io.file.buffer.size

This parameter sets the size of the buffer used for I/O operations, in bytes. The default is 4 KB (4096). We recommend setting it to 64 KB, that is, 65536 (bytes, not KB).
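Because the value is given in bytes, it is easy to write 65536 KB when 64 KB is meant. The Python sketch below converts 64 KB to bytes and renders the `<property>` entry that would go inside `<configuration>` in core-site.xml. It assumes the standard Hadoop XML configuration layout; the helper name `hadoop_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name: str, value: int) -> str:
    """Hypothetical helper: render one Hadoop <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


# 64 KB expressed in bytes, the unit io.file.buffer.size expects.
BUFFER_BYTES = 64 * 1024

snippet = hadoop_property("io.file.buffer.size", BUFFER_BYTES)
print(snippet)
```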

2. dfs.balance.bandwidthPerSec

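The maximum bandwidth, in bytes per second, that each DataNode (dn) may use while the cluster is being balanced; the default is 1048576 (1 MB/s). In Hadoop 2 the preferred name is dfs.datanode.balance.bandwidthPerSec; the old name is deprecated but still accepted. The balancer's -threshold option sets how close to balanced the cluster must become, but the balancing bandwidth itself is fixed statically by this parameter. When configuring a cluster, therefore, weigh the bandwidth of the network the cluster sits on against how busy that network is: changing the configured value later requires a restart, and a value that is too low makes balancing take a long time. (hdfs dfsadmin -setBalancerBandwidth can change the bandwidth at runtime, but the new value is lost when the DataNodes restart.)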
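To see why a low value hurts, the rough estimate below treats the per-DataNode balancer bandwidth as the only bottleneck and assumes the data to move is spread evenly across the DataNodes doing the work. Real balancing also depends on disk I/O, network topology and the threshold, so the result is an optimistic lower bound. The function `estimate_balance_hours` is hypothetical, not part of Hadoop.

```python
def estimate_balance_hours(bytes_to_move: int, bandwidth_per_sec: int,
                           parallel_nodes: int = 1) -> float:
    """Hypothetical helper: optimistic balancing time in hours.

    Assumes balancer bandwidth is the only limit and the work is split
    evenly across `parallel_nodes` DataNodes.
    """
    if bandwidth_per_sec <= 0 or parallel_nodes <= 0:
        raise ValueError("bandwidth and node count must be positive")
    seconds = bytes_to_move / (bandwidth_per_sec * parallel_nodes)
    return seconds / 3600


TIB = 1024 ** 4
MIB = 1024 ** 2

# Moving 1 TiB through a single DataNode at 1 MiB/s vs 10 MiB/s.
slow = estimate_balance_hours(TIB, 1 * MIB)
fast = estimate_balance_hours(TIB, 10 * MIB)
print(f"1 MiB/s: {slow:.1f} h, 10 MiB/s: {fast:.1f} h")
```

With these assumptions the run prints 291.3 hours at 1 MiB/s versus 29.1 hours at 10 MiB/s.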

3. dfs.datanode.du.reserved

This parameter reserves space on each disk for non-HDFS use, such as the local intermediate data of MapReduce (mapred) tasks. Without it, once the DataNode has filled every mounted disk, it effectively becomes read-only and tasks have no local space left to run. We recommend reserving 10 GB per disk for MapReduce intermediate data. The value is in bytes, so 10 GB is 10737418240.
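Since the reservation applies to each volume rather than to the node as a whole, it is worth checking how much capacity HDFS is left with. The Python sketch below computes the byte value for 10 GB and the approximate DFS capacity of a hypothetical DataNode with four 2 TiB disks; the helper names and the disk layout are illustrative assumptions.

```python
GIB = 1024 ** 3


def reserved_bytes(gib_per_volume: int) -> int:
    """Hypothetical helper: value for dfs.datanode.du.reserved (bytes per volume)."""
    return gib_per_volume * GIB


def dfs_capacity_bytes(volume_sizes, reserved_per_volume):
    """Approximate space HDFS may use: each volume minus its reservation."""
    return sum(max(size - reserved_per_volume, 0) for size in volume_sizes)


reserved = reserved_bytes(10)
print(reserved)  # 10737418240

# Hypothetical DataNode with four 2 TiB disks.
disks = [2 * 1024 * GIB] * 4
print(dfs_capacity_bytes(disks, reserved) / GIB)  # 8152.0
```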

4. dfs.namenode.handler.count

The number of worker threads the NameNode (nn) uses to handle RPC requests, including those from the DataNode (dn) daemons. If the pool is too small, connections to the nn time out or are refused; if it is too large, RPC latency increases. We recommend 20 × ln(N), where N is the number of nodes in the cluster.

You can compute it with Python:

import math; N = 100; print(int(math.log(N) * 20))  # N = 100 is an example; use your cluster size
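Note that math.log(N) with a single argument is the natural logarithm, so the formula is 20 × ln(N); using log10 instead would give much smaller values (40 rather than 92 for 100 nodes). The sketch below wraps the same formula in a function and prints it for a few cluster sizes; the function name is hypothetical.

```python
import math


def namenode_handler_count(cluster_size: int) -> int:
    """Hypothetical helper: 20 * ln(N), truncated to an int.

    math.log with one argument is the natural log, not log10.
    """
    if cluster_size < 1:
        raise ValueError("cluster size must be at least 1")
    return int(math.log(cluster_size) * 20)


for n in (10, 100, 1000):
    print(n, namenode_handler_count(n))  # 46, 92, 138
```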

5. dfs.datanode.failed.volumes.tolerated

By default (a value of 0), the failure of any single disk mounted on a DataNode takes the entire DataNode out of service. The NameNode then has to re-replicate all of that node's blocks, which is expensive. We recommend setting this value to 1, so that the DataNode keeps serving as long as at most one disk has failed. In that case, monitor the DataNode logs so that failed disks are detected and replaced as soon as possible.
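The rule can be modeled in a few lines of Python. This is a simplified sketch, not Hadoop code: the function names are hypothetical, and the range check assumes the Hadoop 2 behavior of rejecting a tolerated count that is negative or not smaller than the number of configured data directories.

```python
def validate_tolerated(tolerated: int, total_volumes: int) -> None:
    """Hypothetical check: value must be >= 0 and below the volume count."""
    if not 0 <= tolerated < total_volumes:
        raise ValueError(f"tolerated={tolerated} must be in [0, {total_volumes})")


def datanode_keeps_serving(failed_volumes: int, tolerated: int) -> bool:
    """The DataNode stays up while failed volumes do not exceed the tolerated count."""
    return failed_volumes <= tolerated


validate_tolerated(1, total_volumes=12)
print(datanode_keeps_serving(1, tolerated=1))  # True
print(datanode_keeps_serving(2, tolerated=1))  # False
print(datanode_keeps_serving(1, tolerated=0))  # False (the default)
```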

6. fs.trash.interval

Sets how long deleted files are kept in the trash (recycle bin), in minutes; 0, the default, disables the trash. This parameter is listed mainly to show that, even when a retention time is set, you can bypass the trash and delete data immediately, similar to Shift+Delete in Windows:

hdfs dfs -rm -skipTrash yourfilename
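Because fs.trash.interval is expressed in minutes, a retention period given in days has to be converted before it goes into core-site.xml. The Python sketch below does the conversion; the helper name is hypothetical.

```python
from datetime import timedelta


def trash_interval_minutes(retention: timedelta) -> int:
    """Hypothetical helper: retention period -> fs.trash.interval (minutes)."""
    minutes = retention.total_seconds() / 60
    if minutes < 0 or minutes != int(minutes):
        raise ValueError("retention must be a non-negative whole number of minutes")
    return int(minutes)


print(trash_interval_minutes(timedelta(days=1)))  # 1440
print(trash_interval_minutes(timedelta(days=7)))  # 10080
```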
