Hadoop parameter optimization

Last Update:2015-05-11 Source: Internet

Author: User

dfs.block.size

Determines how many blocks an HDFS file is split into (and so the number of block files). It indirectly affects JobTracker scheduling and memory consumption; the effect on memory use is the larger of the two.
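
As a rough illustration of how the block size drives the block count, here is a minimal sketch. The function name and the 10 GB file are hypothetical, chosen only for the example.

```python
MB = 1024 * 1024
GB = 1024 * MB

def hdfs_block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks one file occupies; the last block may be partial."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-file_size_bytes // block_size_bytes)

if __name__ == "__main__":
    # A hypothetical 10 GB file under three candidate block sizes.
    for block_mb in (64, 128, 256):
        print(block_mb, "MB blocks ->", hdfs_block_count(10 * GB, block_mb * MB))
```

Doubling the block size halves the number of blocks that have to be tracked and scheduled for the same file.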

mapred.map.tasks.speculative.execution=true

mapred.reduce.tasks.speculative.execution=true

These are the two speculative-execution settings; both default to true.

Speculative execution works as follows. Once all of a job's tasks are running, the JobTracker tracks the average progress of all tasks. If one task runs slower than the overall average, for example because the machine it runs on has a weaker configuration or a high CPU load (there are many possible reasons), the JobTracker launches a new, duplicate task. Whichever of the original and the duplicate finishes first kills the other. This is why the JobTracker page often shows a job that completed successfully but still lists some killed tasks.
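
The idea can be sketched in a few lines. This is a toy model, not the JobTracker's actual selection logic: the `gap` threshold of 0.2 and both function names are assumptions made for the illustration.

```python
def speculation_candidates(progress: dict, gap: float = 0.2) -> list:
    """Return ids of unfinished tasks whose progress trails the mean by more than `gap`.

    `progress` maps task id -> fraction complete (0.0 to 1.0).
    `gap` is a hypothetical threshold, not a Hadoop setting.
    """
    if not progress:
        return []
    mean = sum(progress.values()) / len(progress)
    return sorted(t for t, p in progress.items() if p < 1.0 and mean - p > gap)

def resolve_race(original_finish: float, duplicate_finish: float) -> tuple:
    """Whichever attempt finishes first wins; the other attempt is killed."""
    if original_finish <= duplicate_finish:
        return ("original", "duplicate")
    return ("duplicate", "original")

if __name__ == "__main__":
    progress = {"m0": 0.90, "m1": 0.85, "m2": 0.30, "m3": 0.95}
    print(speculation_candidates(progress))  # the slow task m2 is the only candidate
    print(resolve_race(original_finish=120.0, duplicate_finish=45.0))
```

The second element of the tuple returned by `resolve_race` corresponds to the killed attempts that show up on the JobTracker page.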

mapred.child.java.opts

In general it is the reduce side that consumes the most memory. This option sets the maximum heap available to the task JVM. Do not set it too large: if a task needs more than 2 GB (this figure remains to be verified), consider optimizing the program instead.

The size of the input split determines how many map tasks a job has. The default is 64 MB per split. If the input is huge, the default 64 MB block produces a very large number of map tasks; cluster network traffic becomes heavy, and the JobTracker's scheduling, queues, and memory come under great pressure.

mapred.min.split.size

This setting determines the minimum size of each input split, and so indirectly determines the number of map tasks in a job.

The HDFS block size is fixed when the data is written. The split size is determined by three factors (the largest of the three wins):

(1) the number of input blocks (2) mapred.min.split.size (3) job.setNumMapTasks()
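
How the three factors interact can be sketched as follows. This assumes the rule used by the old `mapred` API's FileInputFormat, `max(minSize, min(goalSize, blockSize))`, where the goal size is the total input divided by the requested number of maps; the function name and sample sizes are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB

def compute_split_size(total_size: int, requested_maps: int,
                       min_split_size: int, block_size: int) -> int:
    """Split size under the assumed rule max(minSize, min(goalSize, blockSize))."""
    goal_size = total_size // max(requested_maps, 1)
    return max(min_split_size, min(goal_size, block_size))

if __name__ == "__main__":
    total = 10 * GB
    # Defaults: the block size caps the split at 64 MB.
    print(compute_split_size(total, 2, 1, 64 * MB) // MB)
    # Raising mapred.min.split.size above the block size forces larger splits.
    print(compute_split_size(total, 2, 256 * MB, 64 * MB) // MB)
    # Requesting many maps shrinks the goal size below one block.
    print(compute_split_size(total, 1000, 1, 64 * MB) // MB)
```

Under this rule, raising mapred.min.split.size is the lever that reduces the number of maps, while requesting more map tasks can only make splits smaller than a block.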

mapred.compress.map.output

Compressing the map output has two benefits:
a) Compression is done in memory, so less data is written to the map node's local disk, greatly reducing local I/O.
b) Reduce spends less time copying data from each map node, and network traffic drops significantly.
Note: serializing the data works even better; both disk I/O and data size can be reduced significantly.

io.sort.mb
Measured in megabytes. The default is 100 MB, which is on the small side.
While a map task is still running, if too much data accumulates in memory, the contents of memory must be written to local disk. This setting is the size of that memory buffer, which is used before the shuffle.
This option defines how much memory the map output buffer may occupy. When the buffer reaches a certain threshold (io.sort.spill.percent, described below), a background thread is started to sort the buffer contents and then write them to local disk (as a spill file).

Adjust the buffer size to suit the size of the map output, but adjust it appropriately: bigger is not always better. Even assuming unlimited memory, io.sort.mb=1024 (1 GB) may not be faster than io.sort.mb=300 (300 MB):
(1) Sort 1 GB of data once
(2) Sort 3 times, 300 MB each time
The latter is the faster of the two (merge sort).

io.sort.spill.percent
This is the buffer threshold mentioned above. The default is 0.8, i.e. 80%. When the data in the buffer reaches this threshold, a background thread starts sorting the data already in the buffer and writing it to disk. Meanwhile the map output continues to be written into the remaining 20% of the buffer. If that remaining 20% fills up before the sort has finished, the map task blocks and waits.
If you are sure the map output is already mostly ordered, so that sorting takes very little time, you can raise the threshold. Ideally, if the map output is fully ordered, you can set the threshold to 1.
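
The combined effect of io.sort.mb and io.sort.spill.percent on the number of spill files can be estimated with a back-of-the-envelope model. This sketch assumes the whole buffer holds record data and ignores record metadata, combiners, and compression, so real spill counts will differ; the function name is hypothetical.

```python
import math

def estimated_spills(map_output_mb: float, io_sort_mb: int = 100,
                     spill_percent: float = 0.8) -> int:
    """Rough spill count: one spill each time the buffer reaches the threshold."""
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))

if __name__ == "__main__":
    # A hypothetical map task that emits 1000 MB of output.
    print(estimated_spills(1000))                  # defaults: 100 MB buffer, 0.8 threshold
    print(estimated_spills(1000, io_sort_mb=300))  # larger buffer, fewer spills
```

Fewer spills mean fewer files to merge at the end of the map task, which is where io.sort.factor comes in.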

io.sort.factor
The number of files (file handles) opened at the same time during a merge. The default is 10.
When a map task finishes executing, there are several spill files on the local disk (mapred.local.dir). The last thing the map task does is run a merge sort that combines these spill files into one file (the partition and combine phase).
During the merge sort, io.sort.factor determines how many spill files are opened at once. Opening more files does not necessarily make the merge sort faster; adjust the value to suit the data.
Note: The result of the merge sort is two files, an index file and a data file. The index file records the offset of each partition in the data file.
On a map node, if you find that the map child processes put a heavy I/O load on the machine, the cause may be that io.sort.factor is set too low. With a small io.sort.factor and many spill files, merging them into one file takes many read operations, which increases the I/O load. A small io.sort.mb also increases the I/O load.
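
Why a small io.sort.factor causes extra reads can be seen from a simplified model that counts merge rounds. It assumes each round merges groups of up to `factor` files into one file until a single file remains; Hadoop's real merge planning is more refined, so treat the numbers as indicative only.

```python
def merge_rounds(num_spills: int, factor: int = 10) -> int:
    """Rounds needed to reduce `num_spills` files to one, merging `factor` at a time."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    rounds = 0
    files = num_spills
    while files > 1:
        files = -(-files // factor)  # ceiling division: files left after this round
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(merge_rounds(8))        # fits in a single round with the default factor
    print(merge_rounds(50))       # needs a second round, so data is re-read
    print(merge_rounds(50, 100))  # a larger factor brings it back to one round
```

Every additional round reads and rewrites intermediate data, which is the extra I/O load described above.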
If a combiner is set, it runs as an extra step during the merge and does not change the merge process itself, so the combiner neither reduces nor increases the number of files. There is also a min.num.spills.for.combine parameter: when a merge is performed, if the number of input files is less than this number, the combiner is not called. If a combiner is set, it is also called when each spill file is written, so by the time it runs during the merge, the combiner has been executed twice.

Beyond tuning the Hadoop framework, improving reduce efficiency is mainly a matter of optimizing the code logic. For example, the values passed to reduce may contain many duplicates. If you deduplicate them with a Java Set or an STL set, the program does not scale: it is limited by the data volume, and when the data grows, memory is bound to overflow.
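
One scalable alternative, sketched here under the assumption that the values reach the reducer already sorted (for example through a secondary sort, which the article does not describe), is to deduplicate in a streaming pass that only remembers the previous value.

```python
from itertools import groupby

def distinct_sorted(values):
    """Yield each distinct value once from an already sorted iterable.

    Memory use is constant because only the current run of equal values
    is tracked, unlike a set that holds every distinct value.
    """
    for key, _run in groupby(values):
        yield key

if __name__ == "__main__":
    sorted_values = iter(["a", "a", "b", "c", "c", "c"])
    print(list(distinct_sorted(sorted_values)))
```

If the values are not sorted, this only removes adjacent duplicates, so the sorting assumption matters.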

mapred.reduce.parallel.copies
The number of threads reduce uses to copy data. The default is 5.
Reduce copies data from each completed map task (fetched over HTTP), by default starting 5 threads at a time to fetch data from map nodes. This setting matters. If your map output is large, you may see the maps already at 100% while reduce progress creeps forward, which means the copy is too slow: 5 threads copying 10 GB of data will be very slow, and the parameter needs to be raised. Raising it too far, however, easily causes congestion in the cluster. Job tuning is therefore a process of trade-offs; know the data you are working with!
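
The bounded-parallelism idea behind this setting can be illustrated with a thread pool. This is a toy stand-in for the reduce-side copy, not Hadoop code: `fetch_map_output` is a hypothetical placeholder that returns a string instead of downloading a partition.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_map_output(map_id: str) -> tuple:
    """Hypothetical placeholder for fetching one map task's output partition."""
    return (map_id, f"partition-data-from-{map_id}")

def copy_phase(map_ids, parallel_copies: int = 5) -> dict:
    """Fetch all map outputs using at most `parallel_copies` concurrent threads."""
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        return dict(pool.map(fetch_map_output, map_ids))

if __name__ == "__main__":
    outputs = copy_phase([f"map_{i}" for i in range(20)], parallel_copies=5)
    print(len(outputs), "map outputs copied")
```

With 20 map outputs and 5 workers, fetches proceed in waves of at most 5; a larger pool shortens the copy phase but places more simultaneous load on the map nodes and the network.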
mapred.job.shuffle.input.buffer.percent
When a maximum heap size is specified for the JVM, this setting is the fraction of that heap that reduce uses to hold data fetched from map nodes. The default is 0.7, i.e. 70%. This is usually enough, but for large data volumes the ratio is on the small side and 0.8 to 0.9 is more appropriate (provided your reduce function does not itself devour memory).
mapred.job.shuffle.merge.percent (default value 0.66)
mapred.inmem.merge.threshold (default value 1000)
The first is a memory threshold: as data is fetched from map nodes into memory, once this threshold is reached a background thread (usually a native Linux thread) merge-sorts the in-memory data and writes it to the local disk of the reduce node.
The second is the number of files fetched from map nodes: once this number is reached, a merge sort is performed and the result is written to the local disk of the reduce node.

Of the two settings, the first is checked first, followed by the second threshold.
From practical experience, the default of mapred.job.shuffle.merge.percent is on the small side and can be set to about 0.8. The right value for the second, which defaults to 1000, depends entirely on the size of the map output. If the map output is very large, the default of 1000 is too high and should be lowered; if the map output is not large (lightweight), it can be set to 2000 or more.
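
The way these reduce-side thresholds fit together can be sketched numerically. This assumes that mapred.job.shuffle.merge.percent is a fraction of the shuffle buffer, which is itself a fraction of the task heap; the function names and the 1024 MB heap are hypothetical.

```python
def shuffle_thresholds(heap_mb: float, input_buffer_percent: float = 0.70,
                       merge_percent: float = 0.66) -> tuple:
    """Return (shuffle buffer size, in-memory merge trigger), both in MB."""
    buffer_mb = heap_mb * input_buffer_percent
    merge_trigger_mb = buffer_mb * merge_percent
    return buffer_mb, merge_trigger_mb

def should_merge(in_memory_mb: float, in_memory_files: int,
                 merge_trigger_mb: float, inmem_threshold: int = 1000) -> bool:
    """Merge to disk when either the memory threshold or the file count is reached."""
    return in_memory_mb >= merge_trigger_mb or in_memory_files >= inmem_threshold

if __name__ == "__main__":
    buffer_mb, trigger_mb = shuffle_thresholds(1024)
    print(round(buffer_mb, 1), round(trigger_mb, 1))
    print(should_merge(500, 10, trigger_mb))    # memory threshold reached
    print(should_merge(100, 1000, trigger_mb))  # file-count threshold reached
    print(should_merge(100, 10, trigger_mb))    # neither reached
```

With many small map outputs the file count trips first, which is why lightweight outputs can tolerate a higher mapred.inmem.merge.threshold.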
mapred.reduce.slowstart.completed.maps (the fraction of maps that must complete before shuffle starts)

When the maps run slowly and the reduces run quickly, leaving mapred.reduce.slowstart.completed.maps unset makes the job's shuffle time very long: the reduces start early while the maps are still running, so their reduce slots stay occupied. The value of mapred.reduce.slowstart.completed.maps is compared with "number of completed maps divided by total number of maps"; when the latter is greater than or equal to the configured value, the reduce shuffle starts. So when the map phase takes much longer than the reduce phase, you can raise this value (0.75, 0.80, 0.85 or higher; the default is 0.05).
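
The comparison described above amounts to a one-line check. A minimal sketch, with a hypothetical function name:

```python
def reduces_may_start(completed_maps: int, total_maps: int,
                      slowstart: float = 0.05) -> bool:
    """True once completed/total maps reaches the slowstart fraction."""
    if total_maps == 0:
        return True  # nothing to wait for
    return completed_maps / total_maps >= slowstart

if __name__ == "__main__":
    # With the default 0.05, reduces grab slots after only 10 of 100 maps.
    print(reduces_may_start(10, 100))
    # With 0.80, the same job holds its reduces back until most maps are done.
    print(reduces_may_start(50, 100, slowstart=0.80))
    print(reduces_may_start(85, 100, slowstart=0.80))
```

A higher value keeps reduce slots free for other jobs while a slow map phase is still running.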

The following describes what each parameter does, following the processing flow:
When a map task starts computing and produces intermediate data, the intermediate results are not simply written straight to disk. The process is more involved: a memory buffer caches some of the results already produced, and some pre-sorting is done in that buffer to optimize the performance of the whole map. Each map has a memory buffer (MapOutputBuffer) into which it writes the results it has produced. The buffer is 100 MB by default, but its size can be adjusted through a parameter set when the job is submitted: io.sort.mb. When a map produces a very large amount of data and io.sort.mb is increased, the number of spills over the whole computation goes down and the map task does less disk work. If the map tasks are bottlenecked on disk, this adjustment greatly improves map performance.
While it runs, a map keeps writing its computed results to the buffer, but the buffer cannot necessarily hold the entire map output. When the map output exceeds a certain threshold (such as 100 MB), the map must write the buffered data to disk, a process MapReduce calls a spill. The map does not wait until the buffer is completely full before spilling, because if it did, the map's computation would have to stall while waiting for buffer space to be freed. So the map actually starts to spill once the buffer is filled to a certain level (such as 80%). This threshold is also controlled by a job configuration parameter, io.sort.spill.percent, which defaults to 0.80, i.e. 80%. This parameter also affects how often spills occur, and therefore how often the map task reads and writes disk over its run. Except in special cases it usually needs no manual adjustment; adjusting io.sort.mb is more convenient for users.
When the compute portion of a map task is complete, if the map has output, it will have generated one or more spill files, which hold the map's output. Before exiting normally, the map must merge these spill files into one, so there is a merge step before the map finishes. One parameter adjusts the behavior of this step: io.sort.factor. It defaults to 10 and is the maximum number of streams that can be written to the merge file in parallel when merging spill files. For example, if a map produces so much data that there are more than 10 spill files and io.sort.factor is left at its default of 10, then when the map finishes computing and merges, it cannot merge all the spill files in one pass; it merges in several passes, with at most 10 streams per pass. This means that when the map's intermediate results are very large, increasing io.sort.factor helps reduce the number of merge passes and thus how often the map reads and writes disk, which may optimize the job.
When a job specifies a combiner, the map results are combined on the map side using the function the combiner defines. The combiner may run before or after the merge completes; the timing is controlled by a parameter, min.num.spills.for.combine (default 3). When a combiner is set in the job and there are at least 3 spill files, the combiner runs before the merge produces its result file. In this way, when there are many spills and much of the data can be combined, less data is written to the disk file, which reduces the frequency of disk reads and writes and may optimize the job.
Compression is another way to reduce the disk reads and writes of intermediate results. The map's intermediate output, whether spill files or the final merged result file, can be compressed. The benefit is that less data is written to and read from disk. This is especially useful for jobs whose intermediate results are very large and whose map execution is bottlenecked on disk speed. The parameter that controls whether map intermediate results are compressed is mapred.compress.map.output (true/false). When it is set to true, the map compresses the data before writing intermediate results to disk and decompresses it first when reading the results back. The consequence is that less intermediate data is written to disk, but some CPU is spent on compression and decompression. This approach therefore usually suits jobs with very large intermediate results, where the bottleneck is disk reads and writes rather than CPU. Put bluntly, it trades CPU for I/O. From observation, CPU is usually not the bottleneck for most jobs unless the computation logic is unusually complex, so compressing intermediate results usually pays off.
When compressing map intermediate results, the user can also choose the compression format. The formats Hadoop currently supports include GzipCodec, LzoCodec, BZip2Codec, and LzmaCodec. In general, LzoCodec is the better fit for a balance between CPU cost and compression ratio, but the choice also depends on the specifics of the job. To choose a compression algorithm for intermediate results, set the configuration parameter:

mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, or another compression codec of your choice.
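
The CPU-for-I/O trade can be observed on a small scale with Python's standard zlib module. This is only an illustration: zlib stands in for a Hadoop codec here, and the repetitive sample records are hypothetical, so real compression ratios will vary with the data.

```python
import zlib

def compress_map_output(records: bytes, level: int = 6) -> bytes:
    """Compress a buffer of serialized records before it is written to disk."""
    return zlib.compress(records, level)

def read_map_output(compressed: bytes) -> bytes:
    """Decompress first when the intermediate result is read back."""
    return zlib.decompress(compressed)

if __name__ == "__main__":
    # Hypothetical, highly repetitive intermediate records.
    data = b"user42\tclick\n" * 10000
    compressed = compress_map_output(data)
    print(len(data), "bytes raw,", len(compressed), "bytes compressed")
    print(read_map_output(compressed) == data)
```

The bytes saved are disk and network I/O avoided; the time spent inside `compress` and `decompress` is the CPU cost paid for it.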

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
