Map-side tuning parameters

| Property name | Type | Default value | Description |
|---|---|---|---|
| io.sort.mb | int | 100 | The size, in MB, of the memory buffer used when sorting the map output. On nodes with ample memory, increasing this value reduces the number of spills to disk. |
| io.sort.record.percent | float | 0.05 | The proportion of io.sort.mb reserved for storing metadata about map output records. The remaining space stores the map output records themselves. |
| io.sort.spill.percent | float | 0.80 | The usage threshold of the map output buffer at which a background spill to disk begins. |
| io.sort.factor | int | 10 | The maximum number of streams merged at once when sorting map output. This property is also used on the reduce side. |
| min.num.spills.for.combine | int | 3 | The minimum number of spill files required to run the combiner (when a combiner is specified). With the default of 3, once a map task has produced three or more spill files, the combiner runs on each spill before the map-side merge, reducing the amount of data written to disk. |
| mapred.compress.map.output | boolean | false | Whether to compress the map output. |
| mapred.map.output.compression.codec | class name | org.apache.hadoop.io.compress.DefaultCodec | The compression codec used for the map output. |
| tasktracker.http.threads | int | 40 | The number of worker threads per tasktracker used to serve map output to reducers. This is a cluster-wide setting and cannot be set per job. |
| mapred.map.max.attempts | int | 4 | The number of times a failed map task is retried. With the default of 4, if a map task fails more than four times, the whole job fails. |
| mapred.max.map.failures.percent | int | 0 | The maximum percentage of map tasks that may fail without triggering job failure. |
| mapred.map.tasks.speculative.execution | boolean | true | Whether to enable speculative execution of map tasks. By default, Hadoop launches a duplicate map instance when a map task runs noticeably longer than the average map time. |
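With the defaults above, a background spill begins once the 100 MB sort buffer is 80% full, i.e. at 80 MB. As an illustration (the values here are examples for a memory-rich node, not recommendations), the map-side parameters can be set in mapred-site.xml or per job:

```xml
<!-- Illustrative mapred-site.xml fragment: map-side spill and compression tuning -->
<configuration>
  <!-- Enlarge the sort buffer to 200 MB to reduce map-side spills -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- Start the background spill when the buffer is 85% full -->
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.85</value>
  </property>
  <!-- Compress map output to cut shuffle traffic -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
</configuration>
```

The same properties can be set programmatically on a job's Configuration object before submission.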
Reduce-side tuning parameters

| Property name | Type | Default value | Description |
|---|---|---|---|
| mapred.reduce.parallel.copies | int | 5 | The number of threads used to copy map outputs to the reducer. |
| mapred.reduce.copy.backoff | int | 300 | The maximum time, in seconds, a reducer spends fetching a map output before declaring the attempt failed. Within this window the reducer may retry the transfer. |
| io.sort.factor | int | 10 | The maximum number of streams merged at once when sorting files. This property is also used on the map side. |
| mapred.job.shuffle.input.buffer.percent | float | 0.70 | During the copy phase of the shuffle, the proportion of the reduce task's heap space allocated to buffering map outputs. |
| mapred.job.shuffle.merge.percent | float | 0.66 | The usage threshold of the map output buffer (sized by mapred.job.shuffle.input.buffer.percent) at which merging and spilling to disk begin. |
| mapred.inmem.merge.threshold | int | 1000 | The number of buffered map outputs at which merging and spilling to disk begin. A value of 0 or less means no count threshold, and spill behavior is governed by mapred.job.shuffle.merge.percent alone. |
| mapred.job.reduce.input.buffer.percent | float | 0.0 | The proportion of total heap space that may hold map outputs in memory during the reduce phase; when the reduce phase begins, the in-memory map outputs may not exceed this fraction. By default, all map outputs are merged to disk before the reduce task starts, giving the reducer as much memory as possible. If the reducer needs little memory, increasing this value minimizes disk accesses and improves performance. |
| mapred.reduce.max.attempts | int | 4 | The number of times a failed reduce task is retried. With the default of 4, if a reduce task fails more than four times, the whole job fails. |
| mapred.max.reduce.failures.percent | int | 0 | The maximum percentage of reduce tasks that may fail without triggering job failure. |
| mapred.reduce.tasks.speculative.execution | boolean | true | Whether to enable speculative execution of reduce tasks. |
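As a sketch of how the reduce-side knobs fit together (values chosen only for illustration), a job that keeps map outputs in memory through the reduce phase might set:

```xml
<!-- Illustrative fragment: faster shuffle copies and in-memory reduce input -->
<configuration>
  <!-- Fetch map outputs with 10 copier threads instead of the default 5 -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
  <!-- Allow map outputs to occupy up to 70% of the heap during the reduce
       phase, avoiding the default merge-everything-to-disk behavior -->
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.70</value>
  </property>
</configuration>
```

Raising mapred.job.reduce.input.buffer.percent only helps when the reduce function itself needs little heap, since the buffered map outputs and the reducer compete for the same JVM memory.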
Hadoop global tuning

| Property name | Type | Default value | Description |
|---|---|---|---|
| mapred.child.java.opts | string | -Xmx200m | The JVM options, including heap size, for map and reduce child tasks. If the heap is set too small, tasks fail with "Java heap space" errors. |
| mapred.job.reuse.jvm.num.tasks | int | 1 | On a tasktracker, the maximum number of tasks of the same job that a single JVM may run. The default of 1 disables reuse; -1 means no limit, so the same JVM can be reused by all of the job's tasks. Sharing a JVM lets tasks share state: by storing relevant data in static fields, later tasks can access the shared data quickly. |
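For example (illustrative values, assuming MRv1 property names), a job whose tasks need a larger heap and benefit from JVM reuse might set:

```xml
<!-- Illustrative fragment: larger task heap plus unlimited JVM reuse -->
<configuration>
  <!-- Raise the child task heap from the 200 MB default to 512 MB -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
  <!-- -1: let one JVM run any number of the job's tasks in sequence -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
</configuration>
```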
Summary of Hadoop tuning parameters