1. io.file.buffer.size
This parameter sets the buffer size used for I/O operations, such as reading and writing sequence files. The unit is bytes, and the default value is 4096 (4 KB). We recommend setting it to 64 KB, that is, 65536.
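As a minimal sketch, the following Python snippet builds the `<property>` entry you would add to core-site.xml for the recommended value. The helper name `buffer_size_property` is hypothetical. The check that the size is a multiple of 4096 is an assumption: it keeps the buffer aligned to a 4 KB hardware page size, which is typical on x86.

```python
import xml.etree.ElementTree as ET

PAGE_SIZE = 4096  # assumed hardware page size in bytes (typical on x86)


def buffer_size_property(size_bytes: int) -> str:
    """Return a core-site.xml <property> element for io.file.buffer.size."""
    # Reject sizes that are not aligned to the assumed page size.
    if size_bytes <= 0 or size_bytes % PAGE_SIZE != 0:
        raise ValueError(f"{size_bytes} is not a positive multiple of {PAGE_SIZE}")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "io.file.buffer.size"
    ET.SubElement(prop, "value").text = str(size_bytes)
    return ET.tostring(prop, encoding="unicode")


# 64 KB expressed in bytes, as recommended above.
print(buffer_size_property(64 * 1024))
```

Paste the printed element inside the `<configuration>` element of core-site.xml.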
2. dfs.balance.bandwidthPerSec
This parameter sets the maximum bandwidth, in bytes per second, that each dn may use while the cluster is being balanced. You can pass -threshold to the balancer to set the balance threshold for a run, but the bandwidth used during balancing is fixed statically by this parameter. When configuring a cluster, therefore, weigh the available bandwidth and how busy the cluster's network is, because changing the value later requires restarting the cluster. Setting it too low makes balancing take a long time.
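To see why a low setting makes balancing slow, here is a back-of-the-envelope Python sketch. It assumes the data to be moved is spread evenly over the DataNodes and that each dn transfers at the full configured rate in parallel, ignoring protocol overhead and competing job traffic, so the result is only an optimistic lower bound; real runs will typically take longer. The function names, the 20-node cluster, and the 10 TB figure are hypothetical.

```python
TB = 1024 ** 4  # bytes in one (binary) terabyte


def mb_per_sec_to_config(mb_per_sec: float) -> int:
    """Convert MB/s (1 MB = 1024 * 1024 bytes) to the bytes/s value the parameter expects."""
    return int(mb_per_sec * 1024 * 1024)


def min_balance_hours(bytes_to_move: int, bandwidth_bytes_per_sec: int, num_datanodes: int) -> float:
    """Optimistic lower bound on balance time, assuming every dn moves data in parallel at full bandwidth."""
    aggregate = bandwidth_bytes_per_sec * num_datanodes
    return bytes_to_move / aggregate / 3600


# Compare a few candidate settings for a hypothetical 20-node cluster moving 10 TB.
for mb in (1, 10, 50):
    bw = mb_per_sec_to_config(mb)
    hours = min_balance_hours(10 * TB, bw, num_datanodes=20)
    print(f"{mb:>3} MB/s -> {bw} bytes/s, at least {hours:.1f} h to move 10 TB")
```

Because the lower bound scales inversely with the bandwidth, cutting the setting by 10× stretches even the best-case balance time by 10×.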
3. dfs.datanode.du.reserved
This parameter reserves space on each disk for non-HDFS use, such as the intermediate data produced by local mapred computation. Without it, once the DataNode finds that all of its mounted disks are full, it goes into read-only mode and tasks can no longer run. We recommend reserving 10 GB per disk for mapred intermediate data. The unit is bytes, so 10 GB is 10737418240.
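The byte value follows from 10 × 1024³. A minimal Python sketch of the arithmetic, assuming the reservation applies separately to each disk as described above (the helper names and the 12-disk node are hypothetical):

```python
GIB = 1024 ** 3  # bytes in 1 GiB; the "10 GB" above is 10 * 1024**3 bytes


def reserved_bytes(per_disk_gib: int) -> int:
    """Value for dfs.datanode.du.reserved: bytes reserved on each disk."""
    return per_disk_gib * GIB


def total_reserved_gib(per_disk_gib: int, num_disks: int) -> int:
    """Space kept out of HDFS on one node, since the reservation applies to every disk."""
    return per_disk_gib * num_disks


print(reserved_bytes(10))          # 10737418240, the value for hdfs-site.xml
print(total_reserved_gib(10, 12))  # 120 GiB on a hypothetical node with 12 data disks
```

Keep the per-node total in mind when sizing disks: the more disks a node has, the more raw capacity the reservation withholds from HDFS.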
4. dfs.namenode.handler.count
This parameter sets the size of the worker thread pool that the nn uses to process RPC calls, including those from the dn daemons. If the pool is too small, connections to the nn time out or are refused; if it is too large, RPC latency increases. We recommend 20 × ln(N), where N is the number of nodes in the cluster and ln is the natural logarithm.
You can compute this value with Python (math.log is the natural logarithm; replace 100 with your cluster size):
import math; N = 100; print(int(math.log(N) * 20))
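Wrapping the same 20 × ln(N) formula in a function makes it easy to compare several cluster sizes at once. This is a sketch; the function name is hypothetical, and the guard for clusters smaller than 2 nodes is our own addition, since ln(1) = 0 would yield zero handler threads.

```python
import math


def handler_count(num_nodes: int) -> int:
    """Recommended dfs.namenode.handler.count: 20 * ln(N), truncated to an integer."""
    if num_nodes < 2:
        # ln(1) == 0, so the formula is not meaningful for a single node.
        raise ValueError("formula is meant for clusters with at least 2 nodes")
    return int(math.log(num_nodes) * 20)


for n in (10, 50, 100, 500, 1000):
    print(f"{n:>5} nodes -> dfs.namenode.handler.count = {handler_count(n)}")
```

Because the formula is logarithmic, growing a cluster from 100 to 1000 nodes raises the recommendation only from 92 to 138 threads.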
5. dfs.datanode.failed.volumes.tolerated
By default (a value of 0), if any disk mounted on a dn fails, the entire dn goes out of service. The nn then has to re-replicate every block that dn held, which is expensive. We recommend setting this value to 1, so that the dn keeps serving as long as at most one disk has failed. In that case, monitor the logs on the dn so that you can detect disk faults and replace the failed disk as soon as possible.
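The behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not the DataNode's actual volume-checking code: it assumes the dn stays in service exactly while the number of failed disks is at most the tolerated value. It also assumes the tolerated value is smaller than the number of disks, since tolerating every disk failing would leave the dn with no storage. The function name and the 12-disk example are hypothetical.

```python
def dn_in_service(failed_volumes: int, tolerated: int, total_volumes: int) -> bool:
    """Simplified model: the dn keeps serving while failed volumes <= tolerated."""
    # Assumption of this sketch: tolerating every volume would leave no storage.
    if not 0 <= tolerated < total_volumes:
        raise ValueError("tolerated must be in [0, total_volumes)")
    return failed_volumes <= tolerated


# A hypothetical dn with 12 disks: the default (0) versus the recommended value (1).
for tolerated in (0, 1):
    states = [dn_in_service(failed, tolerated, 12) for failed in range(3)]
    print(f"tolerated={tolerated}: in service with 0/1/2 failed disks -> {states}")
```

With the default, a single failed disk takes the whole dn out of service; with the recommended value, only a second failure does.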
6. fs.trash.interval
This parameter sets how long deleted files are kept in the trash (recycle bin), in minutes; a value of 0 disables the trash. It is listed here mainly to show how to bypass the trash and delete data immediately even when a retention time is set, similar to Shift+Delete in Windows:
hdfs dfs -rm -skipTrash yourfilename
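Because fs.trash.interval is expressed in minutes, a retention period in days has to be converted first. A minimal Python sketch (the helper name and the example retention periods are hypothetical):

```python
MINUTES_PER_DAY = 24 * 60


def trash_interval_minutes(days: float) -> int:
    """Value for fs.trash.interval, which is expressed in minutes (0 disables the trash)."""
    return int(days * MINUTES_PER_DAY)


print(trash_interval_minutes(1))  # 1440
print(trash_interval_minutes(7))  # 10080
```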