Map task and reduce task scheduling parameters for MapReduce jobs

Source: Internet
Author: User

A MapReduce job can be subdivided into map tasks and reduce tasks, and MRAppMaster tracks each map task and reduce task in one of four states:

1. pending: the task has just been created, but no resource request has been sent to ResourceManager yet;

2. scheduled: a resource request has been sent to ResourceManager, but no resources have been allocated yet;

3. assigned: resources have been allocated and the task is running;

4. completed: the task has finished running.

The life cycle of a map task is: scheduled, assigned, completed.
The life cycle of a reduce task is: pending, scheduled, assigned, completed.
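The four states and the two life cycles can be written down as a small sketch. This is only an illustrative model, not MRAppMaster's actual code: the names TaskState, MAP_LIFECYCLE, REDUCE_LIFECYCLE and next_state are hypothetical, and the reduce life cycle is assumed to run pending, scheduled, assigned, completed, following the order in which the four states are listed above.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"      # created, no resource request sent yet
    SCHEDULED = "scheduled"  # request sent, no resources allocated yet
    ASSIGNED = "assigned"    # resources allocated, task is running
    COMPLETED = "completed"  # task has finished


# Assumed life cycles: map tasks skip the pending state, reduce tasks do not.
MAP_LIFECYCLE = [TaskState.SCHEDULED, TaskState.ASSIGNED, TaskState.COMPLETED]
REDUCE_LIFECYCLE = [
    TaskState.PENDING,
    TaskState.SCHEDULED,
    TaskState.ASSIGNED,
    TaskState.COMPLETED,
]


def next_state(lifecycle, current):
    """Return the state that follows `current`, or None at the end."""
    index = lifecycle.index(current)
    return lifecycle[index + 1] if index + 1 < len(lifecycle) else None
```

In this model the only difference between the two task types is the extra pending state at the start of the reduce life cycle, which is where a reduce task waits until enough map tasks have made progress.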

Because a reduce task depends on the output of the map tasks, starting it too early wastes resources. To avoid this low resource utilization, MRAppMaster puts a newly created reduce task into the pending state, so that it can decide whether to schedule the reduce task based on how the map tasks are progressing.

So how is it decided when reduce tasks start? YARN has no concept of map slots and reduce slots as Hadoop 1.x did, and ResourceManager does not know about the dependency between map tasks and reduce tasks. MRAppMaster therefore has to implement its own resource request strategy, both to prevent low resource utilization caused by reduce tasks starting too early and to keep map tasks from starving because they cannot obtain resources. Building on the original MRv1 strategy (reduce tasks may start once a certain fraction of map tasks has completed), MRAppMaster adds stricter resource control and a preemption strategy. Three parameters are mainly involved:

  mapreduce.job.reduce.slowstart.completedmaps: the fraction of the number of maps in the job which should be complete before reduces are scheduled for the job. Once the fraction of completed map tasks reaches this value, resources are requested for reduce tasks. The default is 0.05.

  yarn.app.mapreduce.am.job.reduce.rampup.limit: the maximum share that reduce tasks may take up before the map tasks have completed. The default is 0.5.

  yarn.app.mapreduce.am.job.reduce.preemption.limit: the maximum share of reduce tasks that may be preempted when a map task needs resources but temporarily cannot obtain them (for example, when reduce tasks are already running and some map tasks must be re-run because their output was lost), so that at least one map task can get resources. The default is 0.5.
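How the three parameters interact can be illustrated with a simplified model. This is a sketch under stated assumptions, not the logic MRAppMaster actually runs: the function names are hypothetical, "resource" is treated as a single number (for example megabytes of memory), the slow-start threshold is assumed to be rounded up to a whole number of map tasks, and the preemptable amount is assumed to be at least the size of one map request so that one map task can always fit.

```python
import math


def reduces_may_be_scheduled(completed_maps, total_maps, slowstart=0.05):
    """True once enough maps have finished for reduces to request resources."""
    needed = math.ceil(slowstart * total_maps)  # assumption: round up
    return completed_maps >= needed


def reduce_resource_cap(total_resource, completed_maps, total_maps,
                        rampup_limit=0.5):
    """Upper bound on resources held by reduces while maps are unfinished."""
    if completed_maps >= total_maps:
        return total_resource  # all maps done, so no cap is needed
    return int(rampup_limit * total_resource)


def preemptable_resource(total_resource, map_request, preemption_limit=0.5):
    """Resources that may be taken back from reduces for a starving map."""
    # Assumption: always allow enough preemption for one map request.
    return max(map_request, int(preemption_limit * total_resource))
```

For example, with 10 map tasks and a slow-start value of 0.5, this model holds reduce tasks back until 5 map tasks have completed; with 1000 units of resource and the default limits, reduce tasks are capped at 500 units while maps are still running, and up to 500 units may be reclaimed from them for map tasks.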

If these three parameters are set unreasonably, a submitted job may have a large number of its reduce tasks killed. This is really a question of when reduce tasks start. Because YARN has no concept of map slots and reduce slots, and ResourceManager does not know about the dependency between map tasks and reduce tasks, MRAppMaster has to apply its own resource request policy. If reduce tasks start too early, resource utilization is low and map tasks starve because no resources can be allocated to them; the preemption mechanism then kills a large number of reduce tasks to free resources. Adjusting the three parameters above appropriately avoids this situation.
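One common way to adjust these parameters for a single job is to pass them as -D generic options on the command line. The helper below is a minimal sketch that only builds the argument list. The function name reduce_scheduling_flags, the jar name, the class name and the paths are hypothetical, the value 0.8 is purely illustrative and not a recommendation, and -D options are only picked up if the job's driver parses generic options (for example through ToolRunner).

```python
def reduce_scheduling_flags(slowstart=0.05, rampup_limit=0.5,
                            preemption_limit=0.5):
    """Build -D generic options for the three reduce scheduling parameters."""
    settings = {
        "mapreduce.job.reduce.slowstart.completedmaps": slowstart,
        "yarn.app.mapreduce.am.job.reduce.rampup.limit": rampup_limit,
        "yarn.app.mapreduce.am.job.reduce.preemption.limit": preemption_limit,
    }
    flags = []
    for name, value in settings.items():
        # All three parameters are fractions, so reject values outside [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        flags += ["-D", f"{name}={value}"]
    return flags


# Hypothetical job: jar name, class name and paths are placeholders.
flags = reduce_scheduling_flags(slowstart=0.8)
command = ["hadoop", "jar", "my-job.jar", "MyJob"] + flags + ["input", "output"]
print(" ".join(command))
```

Starting reduce tasks later (a higher slow-start value) leaves more resources for map tasks at the cost of less overlap between the map phase and the shuffle, so suitable values depend on the job and the cluster.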
