A MapReduce job is subdivided into map tasks and reduce tasks, and MRAppMaster tracks each map task and reduce task through four states:
1. pending: just created; no resource request has been sent to ResourceManager yet;
2. scheduled: a resource request has been sent to ResourceManager, but no resources have been allocated;
3. assigned: resources have been allocated and the task is running;
4. completed: the task has finished running.
The life cycle of a map task is: scheduled, assigned, completed.
The life cycle of a reduce task is: pending, scheduled, assigned, completed.
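The two lifecycles can be sketched as follows. This is an illustrative snippet, not the actual MRAppMaster state machine; the class and field names are mine:

```java
import java.util.Arrays;
import java.util.List;

public class TaskLifecycle {
    // The four task states tracked by MRAppMaster.
    enum State { PENDING, SCHEDULED, ASSIGNED, COMPLETED }

    // Map tasks request resources immediately, so they skip PENDING.
    static final List<State> MAP_LIFECYCLE =
        Arrays.asList(State.SCHEDULED, State.ASSIGNED, State.COMPLETED);

    // Reduce tasks are held in PENDING until enough map tasks finish.
    static final List<State> REDUCE_LIFECYCLE =
        Arrays.asList(State.PENDING, State.SCHEDULED,
                      State.ASSIGNED, State.COMPLETED);

    public static void main(String[] args) {
        System.out.println("map:    " + MAP_LIFECYCLE);
        System.out.println("reduce: " + REDUCE_LIFECYCLE);
    }
}
```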
Because a reduce task depends on the output of the map tasks, starting it too early wastes resources: it would hold a container while idly waiting for map output. MRAppMaster therefore keeps newly created reduce tasks in the pending state, and decides, based on the progress of the map tasks, whether to move them to scheduled.
So how is the reduce task start time determined? YARN has no notion of the map slots and reduce slots of Hadoop 1.x, and ResourceManager knows nothing about the dependency between map tasks and reduce tasks, so MRAppMaster must implement its own resource-request policy. It has to avoid two failure modes: low resource utilization from starting reduce tasks too early, and map tasks starving because they cannot obtain resources. On top of the original MRv1 policy (start reduce tasks once a certain fraction of map tasks have completed), MRAppMaster adds stricter resource-control and preemption policies, governed mainly by the following three parameters:
mapreduce.job.reduce.slowstart.completedmaps
: The official description reads: "Fraction of the number of maps in the job which should be complete before reduces are scheduled for the job." When the fraction of completed map tasks reaches this value, MRAppMaster starts requesting resources for reduce tasks. Default: 0.05.
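The slowstart check amounts to a simple fraction comparison. A minimal sketch, assuming the default of 0.05; the method name `shouldScheduleReduces` is hypothetical, not a Hadoop API:

```java
public class SlowStart {
    static final double DEFAULT_SLOWSTART = 0.05;

    // Returns true once the completed-map fraction reaches the
    // mapreduce.job.reduce.slowstart.completedmaps threshold.
    static boolean shouldScheduleReduces(int completedMaps, int totalMaps,
                                         double slowstart) {
        if (totalMaps == 0) {
            return true; // edge case: no maps, nothing to wait for
        }
        double completedFraction = (double) completedMaps / totalMaps;
        return completedFraction >= slowstart;
    }

    public static void main(String[] args) {
        // With the 0.05 default, 5 of 100 completed maps is enough
        // to begin requesting reduce containers; 4 of 100 is not.
        System.out.println(shouldScheduleReduces(5, 100, DEFAULT_SLOWSTART));
        System.out.println(shouldScheduleReduces(4, 100, DEFAULT_SLOWSTART));
    }
}
```

For jobs with very large or skewed map phases, this value is often raised (e.g. to 0.8) so reduce containers are not held idle for long.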
yarn.app.mapreduce.am.job.reduce.rampup.limit
: The maximum fraction of resources that reduce tasks may occupy before all map tasks have completed. Default: 0.5.
yarn.app.mapreduce.am.job.reduce.preemption.limit
: When a map task needs resources but temporarily cannot obtain them (for example, map tasks must be re-run because their output was lost while reduce tasks are already running), MRAppMaster may preempt running reduce tasks, up to this fraction of the reduce resources, to guarantee that at least one map task can obtain resources. Default: 0.5.
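The arithmetic behind these two limits can be sketched as below. This is an illustration of the described semantics, not Hadoop source code; the method names and the memory-based accounting are my assumptions:

```java
public class ReduceLimits {
    // yarn.app.mapreduce.am.job.reduce.rampup.limit (default 0.5):
    // cap on the resources reduces may hold before all maps finish.
    static int maxReduceMemBeforeMapsDone(int clusterMemMB,
                                          double rampupLimit) {
        return (int) (clusterMemMB * rampupLimit);
    }

    // yarn.app.mapreduce.am.job.reduce.preemption.limit (default 0.5):
    // cap on the reduce resources that may be preempted for starving maps.
    static int maxPreemptableReduceMem(int reduceMemInUseMB,
                                       double preemptionLimit) {
        return (int) (reduceMemInUseMB * preemptionLimit);
    }

    public static void main(String[] args) {
        // On a hypothetical 100 GB (102400 MB) cluster with the defaults,
        // reduces may ramp up to at most 51200 MB before maps finish...
        System.out.println(maxReduceMemBeforeMapsDone(102400, 0.5));
        // ...and up to 25600 MB of that may be preempted back for maps.
        System.out.println(maxPreemptableReduceMem(51200, 0.5));
    }
}
```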
If these three parameters are set unreasonably, a submitted job may have a large number of its reduce tasks killed. This is again the reduce-task start-time problem: since YARN has no map or reduce slots and ResourceManager does not know the dependency between map tasks and reduce tasks, MRAppMaster's own policy must balance early reduce startup (which lowers resource utilization) against map starvation (which triggers the preemption mechanism and kills large numbers of reduce tasks). Tuning the three parameters above appropriately eliminates this situation.