Summary of Hadoop tuning parameters

Source: Internet
Author: User

Map-side Tuning parameters

Property name Type Default value Description
io.sort.mb Int 100 The size, in MB, of the memory buffer used to sort map output. On nodes with plenty of memory, increase this value to reduce the number of spills to disk.
io.sort.record.percent Float 0.05 The fraction of the io.sort.mb buffer reserved for storing the boundaries of map output records. The remaining space holds the map output records themselves.
io.sort.spill.percent Float 0.80 The buffer usage threshold at which map output starts spilling to disk.
io.sort.factor Int 10 The maximum number of streams merged at once when sorting map output. This property is also used on the reduce side.
min.num.spills.for.combine Int 3 The minimum number of spill files required to run the combiner (when a combiner is specified). With the default of 3, when a map task has produced 3 or more spill files, the combiner is run again while the spills are merged, which reduces the amount of data written to disk.
mapred.compress.map.output Boolean false Whether to compress the map output.
mapred.map.output.compression.codec Class name org.apache.hadoop.io.compress.DefaultCodec The compression codec used for map output.
tasktracker.http.threads Int 40 The number of worker threads per tasktracker used to serve map output to reducers. This is a cluster-wide setting and cannot be set for a single job.
mapred.map.max.attempts Int 4 The maximum number of attempts for each map task. With the default of 4, the whole job fails once a single map task has failed 4 times.
mapred.max.map.failures.percent Int 0 The maximum percentage of map tasks that may fail without causing the job to fail.
mapred.map.tasks.speculative.execution Boolean true Whether to enable speculative execution of map tasks. When enabled (the default), Hadoop launches a duplicate attempt of a map task that is running noticeably slower than the average map task.
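
To see how io.sort.mb, io.sort.record.percent and io.sort.spill.percent interact, the buffer split can be worked out by hand. The sketch below is a simplified model, not Hadoop code: the function name and the returned keys are made up for illustration, and the cost of 16 bytes of bookkeeping per record is an assumption based on the Hadoop 1.x map output buffer.

```python
# Simplified model of the map-side sort buffer (illustrative only).
# Assumption: each record costs 16 bytes of bookkeeping, as in Hadoop 1.x.
ACCOUNTING_BYTES_PER_RECORD = 16


def map_buffer_layout(io_sort_mb=100, record_percent=0.05, spill_percent=0.80):
    """Estimate how the sort buffer is divided and when a spill starts."""
    total_bytes = io_sort_mb * 1024 * 1024

    # Part of the buffer reserved for record boundaries (io.sort.record.percent),
    # rounded down to a whole number of records.
    accounting_bytes = int(total_bytes * record_percent)
    accounting_bytes -= accounting_bytes % ACCOUNTING_BYTES_PER_RECORD

    # The rest of the buffer holds the serialized records themselves.
    data_bytes = total_bytes - accounting_bytes
    max_records = accounting_bytes // ACCOUNTING_BYTES_PER_RECORD

    return {
        "data_bytes": data_bytes,
        "max_records": max_records,
        # A spill starts when either region reaches io.sort.spill.percent.
        "spill_at_records": int(max_records * spill_percent),
        "spill_at_bytes": int(data_bytes * spill_percent),
        # Average record size at which both regions fill at the same rate.
        "break_even_record_size": data_bytes / max_records,
    }


layout = map_buffer_layout()
for name, value in layout.items():
    print(name, value)
```

Under these assumptions, records that are on average smaller than the break-even size fill the bookkeeping region first, so a job with many small records may spill less often with a larger io.sort.record.percent, while a job with large records may benefit more from a larger io.sort.mb.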

Reduce-side Tuning parameters
Property name Type Default value Description
mapred.reduce.parallel.copies Int 5 The number of threads a reducer uses to copy map outputs in parallel.
mapred.reduce.copy.backoff Int 300 The maximum time, in seconds, that a reducer spends trying to fetch a map output before declaring the fetch failed. If a fetch fails, the reducer can retry the transfer within this time.
io.sort.factor Int 10 The maximum number of streams merged at once when sorting files. This property is also used on the map side.
mapred.job.shuffle.input.buffer.percent Float 0.70 The fraction of the reduce task's heap allocated to the buffer that holds map outputs during the copy phase of the shuffle.
mapred.job.shuffle.merge.percent Float 0.66 The usage threshold of the map output buffer (sized by mapred.job.shuffle.input.buffer.percent) at which merging the outputs and spilling them to disk begins.
mapred.inmem.merge.threshold Int 1000 The number of map outputs held in memory at which merging the outputs and spilling them to disk begins. A value of 0 or less means there is no threshold, and spilling is controlled by mapred.job.shuffle.merge.percent alone.
mapred.job.reduce.input.buffer.percent Float 0.0 The fraction of the total heap that may be used to retain map outputs in memory during the reduce phase. When the reduce phase begins, the map outputs held in memory must not exceed this size. By default, all map outputs are merged to disk before the reduce starts, leaving as much memory as possible for the reducer. If the reducer needs little memory, increase this value to reduce disk access and improve efficiency.
mapred.reduce.max.attempts Int 4 The maximum number of attempts for each reduce task. With the default of 4, the whole job fails once a single reduce task has failed 4 times.
mapred.max.reduce.failures.percent Int 0 The maximum percentage of reduce tasks that may fail without causing the job to fail.
mapred.reduce.tasks.speculative.execution Boolean true Whether to enable speculative execution of reduce tasks.
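
The two shuffle thresholds above can be combined into a rough rule for when the reducer starts merging in-memory map outputs to disk. The sketch below is a simplified model under stated assumptions, not Hadoop code: the function names are hypothetical, and the heap size is passed in directly, whereas a real reduce task sees somewhat less memory than its -Xmx value.

```python
# Simplified model of the reduce-side in-memory merge trigger (illustrative only).
MB = 1024 * 1024


def merge_trigger_bytes(heap_bytes, input_buffer_percent=0.70, merge_percent=0.66):
    """Bytes of buffered map output at which the in-memory merge starts."""
    # mapred.job.shuffle.input.buffer.percent sizes the shuffle buffer.
    shuffle_buffer_bytes = heap_bytes * input_buffer_percent
    # mapred.job.shuffle.merge.percent is the fill threshold of that buffer.
    return shuffle_buffer_bytes * merge_percent


def should_start_merge(buffered_bytes, buffered_outputs, heap_bytes,
                       input_buffer_percent=0.70, merge_percent=0.66,
                       inmem_merge_threshold=1000):
    """Return True when either threshold is reached."""
    over_bytes = buffered_bytes >= merge_trigger_bytes(
        heap_bytes, input_buffer_percent, merge_percent)
    # A value of 0 or less disables the count-based threshold
    # (mapred.inmem.merge.threshold).
    over_count = (inmem_merge_threshold > 0
                  and buffered_outputs >= inmem_merge_threshold)
    return over_bytes or over_count


heap = 200 * MB  # hypothetical reduce task heap
print(merge_trigger_bytes(heap) / MB)           # about 92.4 MB for this heap
print(should_start_merge(100 * MB, 10, heap))   # byte threshold reached
print(should_start_merge(1 * MB, 1000, heap))   # count threshold reached
```

With many small map outputs the count threshold tends to be reached first, and with a few large ones the byte threshold does, which is one reason the two parameters are usually tuned together.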

Hadoop Global Tuning
Property name Type Default value Description
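mapred.child.java.opts String -Xmx200m The JVM options, including the heap size, for the child JVM that runs a map or reduce task. If the heap is set too small, tasks fail with the error "java.lang.OutOfMemoryError: Java heap space".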
mapred.job.reuse.jvm.num.tasks Int 1 On a tasktracker, the maximum number of tasks of a given job that can run in each JVM. The default of 1 means a JVM is not reused; -1 means no limit, that is, the same JVM can be used by all of the job's tasks. The benefit of sharing a JVM is that tasks avoid JVM startup costs and can share state: data that one task stores in a static field can be accessed quickly by the tasks that run after it.
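
The effect of mapred.job.reuse.jvm.num.tasks can be estimated with a small calculation. The sketch below is a simplified model that assumes the tasks of one job run one after another in a single task slot; the function name is hypothetical and is not part of Hadoop.

```python
# Simplified model of JVM reuse within one task slot (illustrative only).
def jvm_launches(tasks_in_slot, reuse_num_tasks=1):
    """Estimate how many JVMs are started for the tasks run in one slot."""
    if tasks_in_slot <= 0:
        return 0
    if reuse_num_tasks == -1:
        # -1 means no limit: one JVM serves every task in the slot.
        return 1
    if reuse_num_tasks < 1:
        raise ValueError("reuse_num_tasks must be -1 or a positive integer")
    # Integer ceiling division: each JVM runs at most reuse_num_tasks tasks.
    return (tasks_in_slot + reuse_num_tasks - 1) // reuse_num_tasks


print(jvm_launches(20))       # default of 1: a new JVM for every task
print(jvm_launches(20, 5))    # each JVM runs up to 5 tasks
print(jvm_launches(20, -1))   # one JVM for all tasks
```

Reuse tends to matter most for jobs with many short tasks, where JVM startup time is a noticeable share of each task's run time.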

