Map-side tuning parameters

| Property name | Type | Default value | Description |
|---|---|---|---|
| io.sort.mb | int | 100 | The size, in MB, of the memory buffer used when sorting the map output. On nodes with ample memory, increasing this value reduces the number of spills to disk. |
| io.sort.record.percent | float | 0.05 | The proportion of io.sort.mb reserved for storing metadata about map output records. The remaining space stores the map output records themselves. |
| io.sort.spill.percent | float | 0.80 | The usage threshold of the map output buffer at which a background spill to disk begins. |
| io.sort.factor | int | 10 | The maximum number of streams merged at once when sorting map output. This property is also used on the reduce side. |
| min.num.spills.for.combine | int | 3 | The minimum number of spill files required to run the combiner (when a combiner is specified). With the default of 3, once a map task has produced three or more spill files, the combiner runs on each spill before the map-side merge, reducing the amount of data written to disk. |
| mapred.compress.map.output | boolean | false | Whether to compress the map output. |
| mapred.map.output.compression.codec | class name | org.apache.hadoop.io.compress.DefaultCodec | The compression codec used for the map output. |
| tasktracker.http.threads | int | 40 | The number of worker threads per tasktracker used to serve map output to reducers. This is a cluster-wide setting and cannot be set per job. |
| mapred.map.max.attempts | int | 4 | The number of times a failed map task is retried. With the default of 4, if a map task fails more than four times, the whole job fails. |
| mapred.max.map.failures.percent | int | 0 | The maximum percentage of map tasks that may fail without triggering job failure. |
| mapred.map.tasks.speculative.execution | boolean | true | Whether to enable speculative execution of map tasks. By default, Hadoop launches a duplicate map instance when a map task runs noticeably longer than the average map time. |
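With the defaults above, a background spill begins once the 100 MB sort buffer is 80% full, i.e. at 80 MB. As an illustration (the values here are examples for a memory-rich node, not recommendations), the map-side parameters can be set in mapred-site.xml or per job:

```xml
<!-- Illustrative mapred-site.xml fragment: map-side spill and compression tuning -->
<configuration>
  <!-- Enlarge the sort buffer to 200 MB to reduce map-side spills -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- Start the background spill when the buffer is 85% full -->
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.85</value>
  </property>
  <!-- Compress map output to cut shuffle traffic -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>
```

The same properties can be set programmatically on a job's Configuration object before submission.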
Reduce-side tuning parameters

| Property name | Type | Default value | Description |
|---|---|---|---|
| mapred.reduce.parallel.copies | int | 5 | The number of threads used to copy map outputs to the reducer. |
| mapred.reduce.copy.backoff | int | 300 | The maximum time, in seconds, a reducer spends fetching a map output before declaring the attempt failed. Within this window the reducer may retry the transfer. |
| io.sort.factor | int | 10 | The maximum number of streams merged at once when sorting files. This property is also used on the map side. |
| mapred.job.shuffle.input.buffer.percent | float | 0.70 | During the copy phase of the shuffle, the proportion of the reduce task's heap space allocated to buffering map outputs. |
| mapred.job.shuffle.merge.percent | float | 0.66 | The usage threshold of the map output buffer (sized by mapred.job.shuffle.input.buffer.percent) at which merging and spilling to disk begin. |
| mapred.inmem.merge.threshold | int | 1000 | The number of buffered map outputs at which merging and spilling to disk begin. A value of 0 or less means no count threshold, and spill behavior is governed by mapred.job.shuffle.merge.percent alone. |
| mapred.job.reduce.input.buffer.percent | float | 0.0 | The proportion of total heap space that may hold map outputs in memory during the reduce phase; when the reduce phase begins, the in-memory map outputs may not exceed this fraction. By default, all map outputs are merged to disk before the reduce task starts, giving the reducer as much memory as possible. If the reducer needs little memory, increasing this value minimizes disk accesses and improves performance. |
| mapred.reduce.max.attempts | int | 4 | The number of times a failed reduce task is retried. With the default of 4, if a reduce task fails more than four times, the whole job fails. |
| mapred.max.reduce.failures.percent | int | 0 | The maximum percentage of reduce tasks that may fail without triggering job failure. |
| mapred.reduce.tasks.speculative.execution | boolean | true | Whether to enable speculative execution of reduce tasks. |
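As a sketch of how the reduce-side knobs fit together (values chosen only for illustration), a job that keeps map outputs in memory through the reduce phase might set:

```xml
<!-- Illustrative fragment: faster shuffle copies and in-memory reduce input -->
<configuration>
  <!-- Fetch map outputs with 10 copier threads instead of the default 5 -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <!-- Allow map outputs to occupy up to 70% of the heap during the reduce
       phase, avoiding the default merge-everything-to-disk behavior -->
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.70</value>
  </property>
</configuration>
```

Raising mapred.job.reduce.input.buffer.percent only helps when the reduce function itself needs little heap, since the buffered map outputs and the reducer compete for the same JVM memory.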
Hadoop global tuning

| Property name | Type | Default value | Description |
|---|---|---|---|
| mapred.child.java.opts | string | -Xmx200m | The JVM options, including heap size, for map and reduce child tasks. If the heap is set too small, tasks fail with "Java heap space" errors. |
| mapred.job.reuse.jvm.num.tasks | int | 1 | On a tasktracker, the maximum number of tasks of the same job that a single JVM may run. The default of 1 disables reuse; -1 means no limit, so the same JVM can be reused by all of the job's tasks. Sharing a JVM lets tasks share state: by storing relevant data in static fields, later tasks can access the shared data quickly. |
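For example (illustrative values, assuming MRv1 property names), a job whose tasks need a larger heap and benefit from JVM reuse might set:

```xml
<!-- Illustrative fragment: larger task heap plus unlimited JVM reuse -->
<configuration>
  <!-- Raise the child task heap from the 200 MB default to 512 MB -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <!-- -1: let one JVM run any number of the job's tasks in sequence -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
</configuration>
```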
Summary of Hadoop tuning parameters