Hadoop MapReduce Tuning from the Job, Task, and Administrator Perspectives




Hadoop exposes a wide range of configurable parameters to user jobs, allowing users to adjust these values according to the characteristics of their jobs and thereby improve execution efficiency.

One: Application Writing Guidelines
1. Set Combiner
For a large number of MapReduce programs, setting a combiner is very helpful for improving job performance. The combiner aggregates the intermediate output of each map task locally, which reduces the amount of data each reduce task has to copy remotely and therefore shortens the execution time of both the map tasks and the reduce tasks.
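As a minimal sketch, assuming the classic word-count job and the old mapred API (WordCountMapper and WordCountReducer are hypothetical classes standing in for your own), the driver would register the reducer as a combiner like this:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setMapperClass(WordCountMapper.class);     // hypothetical mapper
    conf.setReducerClass(WordCountReducer.class);   // hypothetical reducer
    // Reusing the reducer as a combiner performs partial aggregation on the map
    // side, so far less intermediate data is shuffled to the reducers.
    conf.setCombinerClass(WordCountReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);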

2. Choose appropriate Writable types
In the MapReduce model, the input and output types of both map tasks and reduce tasks are Writable. Hadoop itself provides many Writable implementations, including IntWritable and FloatWritable. Choosing the right Writable type for the data your application processes can greatly improve performance. For example, when handling integer data, reading the value directly as an IntWritable is more efficient than reading it as Text and then converting it to an integer. If most of the output integers fit in one or two bytes, use VIntWritable or VLongWritable directly: they use a variable-length integer encoding that can greatly reduce the amount of output data.
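For illustration, a mapper that emits a count of 1 per input line could declare VIntWritable as its output value type; this is only a sketch using the old mapred API, with the raw line serving as the key:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.VIntWritable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LineCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, VIntWritable> {
      // VIntWritable encodes small integers in one or two bytes instead of a
      // fixed four, shrinking the map task's intermediate output.
      private final VIntWritable one = new VIntWritable(1);

      public void map(LongWritable offset, Text line,
                      OutputCollector<Text, VIntWritable> output, Reporter reporter)
          throws IOException {
        output.collect(line, one);
      }
    }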

Two: Job-Level Parameter Tuning

1. Planning a reasonable number of tasks
In Hadoop, each map task processes one input split. How input splits are generated is determined by the user-specified InputFormat; by default it depends on the following parameters:
mapred.min.split.size: the minimum input split size, default 1.
mapred.max.split.size: the maximum input split size.
dfs.block.size: the HDFS block size, default 64 MB.
goalSize: the split size the user expects, goalSize = totalSize / numSplits, where totalSize is the total size of the input files and numSplits is the number of map tasks requested by the user (default 1).
The split size is then computed as splitSize = max{minSize, min{goalSize, blockSize}}. If you want the input split size to be larger than the block size, simply increase the configuration parameter mapred.min.split.size.
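For example, on a cluster with a 64 MB block size, a job that wants splits of roughly 256 MB could simply raise the minimum split size in its driver; a minimal sketch:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // splitSize = max(minSize, min(goalSize, blockSize)); raising the minimum
    // above the 64 MB block size forces larger, and therefore fewer, splits.
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);   // 256 MB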

2. Increase the replication factor of the input files
If a job runs a large number of tasks in parallel, the input files shared by these tasks can become a bottleneck. To prevent many tasks from contending over the same replicas of a file, the user can increase the replication factor of the input files as needed.
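One way to do this programmatically is FileSystem.setReplication(); the sketch below raises the replication of a hypothetical shared input file to 10 before the job is submitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // More replicas of a heavily shared input file means concurrent map tasks
    // are spread across more nodes instead of all hitting the same few copies.
    fs.setReplication(new Path("/data/shared/lookup.txt"), (short) 10);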

3. Start the speculative execution mechanism
Speculative execution is Hadoop's optimization mechanism for "straggler" tasks: when some tasks of a job run significantly slower than the other tasks of the same job, Hadoop launches a backup task for the slow task on another node, so that the two tasks process the same data at the same time. Hadoop keeps the result of whichever task finishes first and kills the other.
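In Hadoop 1.x speculative execution is switched per job for map and reduce tasks separately; a minimal sketch of enabling both:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Corresponds to mapred.map.tasks.speculative.execution and
    // mapred.reduce.tasks.speculative.execution.
    conf.setMapSpeculativeExecution(true);
    conf.setReduceSpeculativeExecution(true);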

4. Set the failure tolerance
Hadoop provides failure tolerance at both the task level and the job level. Job-level failure tolerance means that Hadoop allows a certain percentage of each job's tasks to fail; the input data of those failed tasks is simply ignored.
Task-level failure tolerance means that when a task fails, Hadoop retries it on another node; only if the task still fails after a configured number of attempts does Hadoop finally consider it to have failed.
Users should set a reasonable failure tolerance according to the characteristics of their application, so that jobs complete quickly and unnecessary waste of resources is avoided.
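A sketch of setting both levels with the JobConf API (the 5% and 8-attempt values are arbitrary examples, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Job-level tolerance: the job still succeeds if up to 5% of its map or
    // reduce tasks fail; the failed tasks' input data is simply ignored.
    conf.setMaxMapTaskFailuresPercent(5);
    conf.setMaxReduceTaskFailuresPercent(5);
    // Task-level tolerance: retry a failed task up to 8 times before Hadoop
    // finally considers it to have failed.
    conf.setMaxMapAttempts(8);
    conf.setMaxReduceAttempts(8);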

5. Enable JVM reuse where appropriate
To achieve task isolation, Hadoop runs each task in a separate JVM. For tasks that run only briefly, JVM startup and shutdown account for a significant share of the total time, so the user can enable JVM reuse to let one JVM run multiple tasks of the same type in succession.
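A minimal sketch using the JobConf API (backed by the mapred.job.reuse.jvm.num.tasks parameter); the value -1 means a JVM may run an unlimited number of tasks of the same type:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Let one JVM run any number of (same-type) tasks in succession, avoiding
    // repeated JVM start-up cost for jobs made of many short tasks.
    conf.setNumTasksToExecutePerJvm(-1);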

6. Set the task time-out
If a task does not report progress within a certain amount of time, the TaskTracker actively kills it so that it can be restarted on another node. The user can configure this task timeout based on actual needs.
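For instance, a job whose tasks legitimately go quiet for long stretches could raise the timeout from the default (10 minutes in Hadoop 1.x) to 20 minutes; a minimal sketch:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // mapred.task.timeout is in milliseconds; a task reporting no progress for
    // this long is killed by the TaskTracker and re-scheduled elsewhere.
    conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);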

7. Use the DistributedCache sensibly
In general there are two ways to make an external file available to tasks. The first is to ship the external file to the client together with the application jar; when the job is submitted, the client uploads it to a directory in HDFS, from which it is distributed to each node through the DistributedCache. The second is to put the external file on HDFS in advance. The second approach is more efficient: it not only saves the client the time of uploading the file, but also implicitly tells the DistributedCache to "download the file into the public shared directory of each node", so that all subsequent jobs can reuse the already downloaded file without downloading it again.
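A sketch of the second approach using the old DistributedCache API, with a hypothetical HDFS path for a file that was uploaded in advance:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // The file already lives on HDFS, so the client uploads nothing; each node
    // downloads it once into its local cache, and later jobs can reuse it.
    DistributedCache.addCacheFile(
        URI.create("hdfs://namenode:9000/shared/dictionary.txt"), conf);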

8. Skipping Bad Records
Hadoop provides users with the ability to skip bad records: when one or a few bad input records cause a task to fail, Hadoop can automatically identify and skip those bad records.
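This feature is exposed through the SkipBadRecords helper class of the old mapred API; a minimal sketch that starts skip mode after two failed attempts and tolerates at most one bad record around each failure point on the map side:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SkipBadRecords;

    JobConf conf = new JobConf();
    // Switch a task attempt into skipping mode after it has failed twice.
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
    // Allow at most one bad record to be skipped near each failure point.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1);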

9. Improve Job Priority
All Hadoop job schedulers take job priority into account when scheduling tasks: the higher a job's priority, the more resources (that is, slots) it can obtain. Hadoop offers five job priorities: VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW.
Note: In a production environment the administrator usually grades jobs by importance, allowing jobs of different importance to be configured with different priorities; users should not adjust priorities on their own without authorization.
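Priority can be set with the mapred.job.priority parameter or, equivalently, through the JobConf API; a minimal sketch (subject to whatever policy the administrator enforces):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobPriority;

    JobConf conf = new JobConf();
    // Equivalent to mapred.job.priority=HIGH; the scheduler will offer this job
    // slots ahead of NORMAL and LOW jobs in the same queue.
    conf.setJobPriority(JobPriority.HIGH);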

10. Control the start time of reduce tasks reasonably
If reduce tasks are started too early, they may occupy reduce slots for a long time without doing useful work (the "slot hoarding" problem), which lowers resource utilization; conversely, if reduce tasks are started too late, they obtain their resources with a delay, which increases the job's running time.
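In Hadoop 1.x the start time is governed by mapred.reduce.slowstart.completed.maps, the fraction of map tasks that must finish before reduce tasks are launched; a sketch that delays the reducers until 80% of the maps are done:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    // Do not launch reduce tasks until 80% of the map tasks have completed, so
    // reduce slots are not hoarded while there is still little to shuffle.
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);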

Three: Task-Level Parameter Tuning

Hadoop task-level parameter tuning covers two areas: map task tuning and reduce task tuning.

1. Map Task Tuning
A map task's execution is divided into five phases: read, map, collect, spill, and merge.
A map task produces intermediate data, but these intermediate results are not written straight to disk; they are first stored in a memory buffer and pre-sorted there to optimize the performance of the whole map phase. The default size of the buffer that holds the map's intermediate data is 100 MB, specified by the io.sort.mb parameter, and it can be adjusted as needed. When a map task produces very large intermediate output, this parameter can be increased so that the buffer holds more intermediate data and disk I/O happens less often; when the system's performance bottleneck is disk I/O speed, raising this parameter reduces the penalty of frequent I/O.
Because the intermediate results of a running map task are first stored in the buffer, they are by default written to disk when buffer usage reaches 80% (0.8). This process is called spill (or overflow). The buffer-usage threshold that triggers a spill can be adjusted with the io.sort.spill.percent parameter, which therefore affects how often spills, and hence disk I/O operations, occur.

When the map task finishes, it will have produced multiple spill files if it had any output. The map task must then merge these spill files; this process is called merge. The merge processes spill files in batches, and the number of spill files merged in one pass is specified by io.sort.factor, whose default is 10. When the number of spill files is very large, merging only 10 files at a time still causes frequent I/O, so raising the number of spill files processed per merge pass helps reduce the number of merge rounds and thus improves map performance.

Compression of the map task's intermediate output can also be enabled to further reduce the amount of data spilled to disk and shuffled over the network.
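Putting the map-side knobs above together, a hedged sketch for a job with very large intermediate output might look like the following; the numbers are examples only, and the codec choice depends on what is installed on the cluster:

    import org.apache.hadoop.io.compress.DefaultCodec;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setInt("io.sort.mb", 200);                  // larger in-memory sort buffer (default 100 MB)
    conf.setFloat("io.sort.spill.percent", 0.85f);   // spill at 85% buffer usage (default 0.80)
    conf.setInt("io.sort.factor", 50);               // merge up to 50 spill files per pass (default 10)
    // Compress the map output before it is spilled and shuffled; a faster codec
    // such as LZO or Snappy is preferable when available on the cluster.
    conf.setCompressMapOutput(true);
    conf.setMapOutputCompressorClass(DefaultCodec.class);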






2. Reduce Task Tuning


A reduce task's execution is divided into five phases: shuffle (copy), merge, sort, reduce, and write.

The shuffle phase is where the reduce task copies its share of the intermediate results from the map tasks that have completed successfully. If the map output was compressed as described above, the reduce task first decompresses the copied data; this is done in the reduce task's buffer and also consumes some CPU. To optimize overall execution time, reduce tasks do not wait until all map output has been produced: as soon as the job's first map tasks finish, the reduce tasks start running and begin downloading their own data from the different completed map tasks. Because there may be many map tasks, the whole copy process is parallelized with multiple threads; the number of parallel copier threads is specified by the mapred.reduce.parallel.copies parameter and defaults to 5, meaning that by default each reduce task copies the output of at most 5 map tasks at a time, no matter how many map tasks there are. When the number of map tasks is large, this parameter can be increased appropriately so that the reduce task obtains its input data quickly and completes sooner.

The copier threads may also fail to download map output for various reasons (network problems, system problems, and so on), for example when the node where the map output is stored fails; in that case the reduce task cannot fetch the data from that node, and the download thread tries to download from another node. How long a download thread waits before giving up can be adjusted with mapred.reduce.copy.backoff (30 seconds by default); on clusters with a poor network, increasing this value gives the download threads more time, preventing them from declaring a download failed just because it took too long.

When the copier threads have downloaded the map output locally, the data fetched by these parallel threads also needs to be merged, so the io.sort.factor setting described for the map phase affects the reduce side as well.

As on the map side, the shuffle buffer is not held until it is completely full before being written to disk; by default the write to disk starts when the buffer is 66% full (0.66), as specified by mapred.job.shuffle.merge.percent.

When the reduce computation starts, mapred.job.reduce.input.buffer.percent specifies what percentage of memory may be kept as a buffer holding the sorted map output that reduce reads; its default is 0 because Hadoop assumes the user's reduce() function needs all of the JVM's memory, so the entire buffer is released before reduce() executes. If this value is set higher, some of the files can be kept in memory rather than written to disk.
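A corresponding reduce-side sketch, again with example values only:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf();
    conf.setInt("mapred.reduce.parallel.copies", 20);           // more parallel copier threads (default 5)
    conf.setInt("mapred.reduce.copy.backoff", 60);              // give slow downloads more time before failing them
    conf.setFloat("mapred.job.shuffle.merge.percent", 0.70f);   // shuffle-buffer merge threshold (default 0.66)
    // Keep part of the sorted map output in memory while reduce() runs instead
    // of releasing the whole buffer (default 0.0).
    conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.2f);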






In summary, the guiding principles of map task and reduce task tuning are to reduce the amount of data transferred, use memory as much as possible, reduce the number of disk I/O operations, and increase task parallelism; beyond that, tune according to the actual situation of your own cluster and network.


Four: Administrator-Level Tuning

Administrators are responsible for providing an efficient operating environment for user jobs. The administrator needs to look at the system as a whole and improve its throughput and performance by adjusting a number of key parameters. Overall, administrators should approach this from four angles: hardware selection, operating system parameter tuning, JVM parameter tuning, and Hadoop parameter tuning, in order to provide Hadoop users with an efficient operating environment.

Hardware Selection
The basic characteristics of Hadoop's architecture determine how its hardware should be selected. Hadoop uses a master/slave architecture in which the master maintains the global metadata and is far more important than any individual slave. In older Hadoop versions the master is also a single point of failure, so the master should be given a much better hardware configuration than the individual slaves.

Operating System Parameter Tuning

1. Increase the limits on simultaneously open file descriptors and network connections
Use the ulimit command to raise the maximum number of file descriptors that may be open at the same time to an appropriate value, and raise the kernel parameter net.core.somaxconn, which limits the number of pending network connections, to a sufficiently large value.

Supplement: the role of net.core.somaxconn
net.core.somaxconn is a Linux kernel parameter that limits the backlog of a listening socket. What is the backlog? It is the listen queue of the socket: a connection request that has not yet been processed or established sits in the backlog. The socket server can process all requests in the backlog at once, and processed requests no longer sit in the listen queue. When the server processes requests so slowly that the listen queue fills up, new requests are rejected. In Hadoop 1.0, the parameter ipc.server.listen.queue.size controls the length of the server socket's listen queue, i.e. the backlog length, and its default value is 128; the default value of the Linux parameter net.core.somaxconn is also 128. When the server is busy, as a NameNode or JobTracker often is, 128 is far from enough. You then need to increase the backlog; for example, on our 3,000-node cluster ipc.server.listen.queue.size is set to 32768, and for this setting to take full effect the kernel parameter net.core.somaxconn must be set to a value greater than or equal to 32768.

2. Disable the swap partition
Avoid using swap partitions so that programs execute efficiently.
In addition, set a reasonable read-ahead buffer size, choose and configure the file system appropriately, select a suitable I/O scheduler, and so on.

JVM Parameter Tuning
Because every service and task in Hadoop runs in a separate JVM, some of the key parameters of the JVM can also affect Hadoop performance. Administrators can improve Hadoop performance by adjusting the JVM flags and JVM garbage collection mechanisms.

Hadoop Parameter Tuning
1. Rational planning of resources

Set a reasonable number of slots
In Hadoop, compute resources are represented by slots. Slots come in two kinds: map slots and reduce slots. Each slot represents a certain amount of resources, and slots of the same kind are homogeneous, that is, every slot of a given type represents the same amount of resources. The administrator needs to configure an appropriate number of map slots and reduce slots for each TaskTracker, which limits the number of map tasks and reduce tasks that can execute concurrently on that TaskTracker.
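The slot counts correspond to the two parameters below. They are cluster-side settings that normally live in mapred-site.xml on every TaskTracker rather than in job code; the Configuration-based sketch here is only a way of showing the names and example values:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    conf.setInt("mapred.tasktracker.map.tasks.maximum", 8);     // map slots per TaskTracker
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 4);  // reduce slots per TaskTracker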

Write a health monitoring script
Hadoop allows administrators to configure a node-health monitoring script for each TaskTracker. The TaskTracker runs a dedicated thread that periodically executes the script and reports the result to the JobTracker through the heartbeat mechanism. Once the JobTracker finds that a TaskTracker's current state is unhealthy, it puts the TaskTracker on a blacklist and no longer assigns tasks to it.
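As a rough sketch, the health-check script is wired up on each TaskTracker with the parameters below (names as found in Hadoop 1.x's mapred-default.xml; verify against your version). The script path is hypothetical, and again these settings normally belong in mapred-site.xml:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Administrator-provided health script, present on every TaskTracker node.
    conf.set("mapred.healthChecker.script.path", "/usr/local/hadoop/bin/node_health.sh");
    conf.setLong("mapred.healthChecker.interval", 60 * 1000L);         // run every 60 seconds
    conf.setLong("mapred.healthChecker.script.timeout", 120 * 1000L);  // kill the script if it hangs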

2. Adjust Heartbeat configuration
Adjust the heartbeat interval according to the size of the cluster.
Enable the out-of-band heartbeat. To reduce task-assignment latency, Hadoop introduced an out-of-band heartbeat. Unlike the regular periodic heartbeat, it is triggered when a task finishes or fails, so that the JobTracker learns about newly idle resources at the earliest possible moment and can quickly assign new tasks to them.
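The out-of-band heartbeat is controlled by a single boolean switch (in Hadoop 1.x the parameter is mapreduce.tasktracker.outofband.heartbeat, disabled by default); a sketch, normally placed in mapred-site.xml:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Send an extra heartbeat as soon as a task finishes or fails, instead of
    // waiting for the next regular heartbeat interval.
    conf.setBoolean("mapreduce.tasktracker.outofband.heartbeat", true);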

Other administrator-side tuning covers disk block configuration, setting a reasonable number of RPC handlers and HTTP threads, using the blacklist mechanism with care, enabling batch task scheduling, selecting an appropriate compression algorithm, enabling the read-ahead mechanism, and so on.
Note: When a cluster is small, frequently adding nodes to the system blacklist greatly reduces the cluster's throughput and computing capacity.


Five: Summary
Hadoop performance tuning is an engineering effort that involves not only tuning Hadoop itself but also tuning lower-level systems such as the hardware, the operating system, and the Java virtual machine.
Overall, improving job efficiency requires effort from both the Hadoop administrator and the job owner: the administrator is responsible for providing users with an efficient execution environment, while users tune their jobs according to their characteristics so that they run as fast as possible.

