YARN source analysis: the whole process of task scheduling in a MapReduce job (I)


In the V2 version of MapReduce, when a job's JOB_SETUP_COMPLETED event occurs, that is, when the job setup phase has completed, the job is triggered to transition from the SETUP state to the RUNNING state. The processing that accompanies this state transition is done by SetupCompletedTransition, which does four main things (the state-machine registration that wires this transition up is sketched after the list):

1. Mark job setup as complete by setting the job's member variable setupProgress to 1;

2. Schedule the job's Map Tasks;

3. Schedule the job's Reduce Tasks;

4. If the job has no tasks at all, generate a JOB_COMPLETED event and hand it to the job's event handler eventHandler.
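For reference, a minimal sketch of how JobImpl's state-machine definition wires this up, abridged to the single transition of interest (the surrounding stateMachineFactory chain is omitted; see the actual JobImpl source for the full definition):

// On JOB_SETUP_COMPLETED, SetupCompletedTransition moves the job
// from the SETUP state to the RUNNING state
.addTransition(JobStateInternal.SETUP, JobStateInternal.RUNNING,
    JobEventType.JOB_SETUP_COMPLETED,
    new SetupCompletedTransition())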

In this article, we will examine how the tasks of a job are scheduled.


First, take a look at the task-scheduling part of the transition() method in SetupCompletedTransition, as follows:

// Schedule the job's Map Tasks
job.scheduleTasks(job.mapTasks, job.numReduceTasks == 0);
// Schedule the job's Reduce Tasks
job.scheduleTasks(job.reduceTasks, true);
The actual work is done by the job's, that is, JobImpl's, scheduleTasks() method, which takes two parameters: the first is taskIDs, the set of task IDs of the tasks to be scheduled; the second is the flag recoverTaskOutput, which indicates whether the task's output should be recovered. The map tasks of a map-only job and the reduce tasks of all jobs need their output recovered, so for them recoverTaskOutput is true. The code is as follows:
protected void scheduleTasks(Set<TaskId> taskIDs, boolean recoverTaskOutput) {
  // Iterate over each TaskId, taskID, in the incoming task set taskIDs
  // and process it as follows
  for (TaskId taskID : taskIDs) {
    // Remove the element corresponding to taskID from the set
    // completedTasksFromPreviousRun and obtain the removed TaskInfo instance
    TaskInfo taskInfo = completedTasksFromPreviousRun.remove(taskID);
    if (taskInfo != null) {
      // If a TaskInfo instance exists for this taskID, construct a T_RECOVER
      // task recovery event TaskRecoverEvent and hand it to eventHandler;
      // the flag recoverTaskOutput indicates whether to recover the task's output
      // (true for map tasks of map-only jobs and for all reduce tasks)
      eventHandler.handle(new TaskRecoverEvent(taskID, taskInfo,
          committer, recoverTaskOutput));
    } else {
      // Otherwise, construct a T_SCHEDULE task scheduling event TaskEvent
      // and hand it to eventHandler
      eventHandler.handle(new TaskEvent(taskID, TaskEventType.T_SCHEDULE));
    }
  }
}
The scheduleTasks() method iterates over each TaskId instance, taskID, in the incoming task set taskIDs, and processes each taskID as follows:

1. Remove the element corresponding to taskID from the set completedTasksFromPreviousRun, obtaining the removed element as a TaskInfo instance, taskInfo;

2. If a TaskInfo instance exists for this taskID, construct a T_RECOVER task recovery event TaskRecoverEvent and hand it to eventHandler for processing; the flag recoverTaskOutput indicates whether the task's output should be recovered, and since the map tasks of map-only jobs and all reduce tasks need their output recovered, for them recoverTaskOutput is true;

3. Otherwise, construct a T_SCHEDULE task scheduling event TaskEvent and hand it to eventHandler for processing.

Let's look at the handling of the T_SCHEDULE task scheduling event TaskEvent, which is processed by the job's eventHandler. This eventHandler is assigned from MRAppMaster's dispatcher when the job is created (that is, when the JobImpl instance is constructed), and in MRAppMaster, after the dispatcher is created, a TaskEventDispatcher instance is registered as the handler for task events, with the following code:

dispatcher.register(TaskEventType.class, new TaskEventDispatcher());
The handle() method of this task event handler, TaskEventDispatcher, which processes the task event TaskEvent, is defined as follows:

private class TaskEventDispatcher implements EventHandler<TaskEvent> {
  @SuppressWarnings("unchecked")
  @Override
  public void handle(TaskEvent event) {
    Task task = context.getJob(event.getTaskID().getJobId()).getTask(
        event.getTaskID());
    ((EventHandler<TaskEvent>) task).handle(event);
  }
}
The event is actually handled by the handle() method of the corresponding task in the job, and the task is implemented by TaskImpl. There, the handling of the various task events is, much like in JobImpl, driven by the task's state machine. The task state machine will be covered in a dedicated article; for now, you only need to know that in TaskImpl the state transitions, triggering events, and event handlers for the two task events above are defined as follows:
private static final StateMachineFactory
    <TaskImpl, TaskStateInternal, TaskEventType, TaskEvent>
    stateMachineFactory
    = new StateMachineFactory<TaskImpl, TaskStateInternal, TaskEventType, TaskEvent>
        (TaskStateInternal.NEW)

    // ... part of the code omitted

    .addTransition(TaskStateInternal.NEW, TaskStateInternal.SCHEDULED,
        TaskEventType.T_SCHEDULE, new InitialScheduleTransition())

    // ... part of the code omitted

    .addTransition(TaskStateInternal.NEW,
        EnumSet.of(TaskStateInternal.FAILED, TaskStateInternal.KILLED,
            TaskStateInternal.RUNNING, TaskStateInternal.SUCCEEDED),
        TaskEventType.T_RECOVER, new RecoverTransition())

    // ... part of the code omitted
Thus, for the T_RECOVER task recovery event TaskRecoverEvent, the task state machine specifies RecoverTransition as the handler, and the task's state is converted from NEW to RUNNING, FAILED, KILLED, or SUCCEEDED; for the T_SCHEDULE task scheduling event TaskEvent, the task state machine specifies InitialScheduleTransition as the handler, and the task's state is converted from NEW to SCHEDULED. Below, we analyze each of them.


I. T_SCHEDULE type task scheduling event TaskEvent

It is handled by InitialScheduleTransition, and the task's state is converted from NEW to SCHEDULED. The code of InitialScheduleTransition is as follows:

private static class InitialScheduleTransition
    implements SingleArcTransition<TaskImpl, TaskEvent> {

  @Override
  public void transition(TaskImpl task, TaskEvent event) {
    // Add and schedule a task run attempt, TaskAttempt; Avataar.VIRGIN indicates
    // the first attempt, while Avataar.SPECULATIVE indicates a speculative attempt
    // launched for a straggling task
    task.addAndScheduleAttempt(Avataar.VIRGIN);
    // Set the task's scheduled time, scheduledTime, to the current time
    task.scheduledTime = task.clock.getTime();
    // Send the task started event
    task.sendTaskStartedEvent();
  }
}
The processing logic of InitialScheduleTransition is relatively simple, roughly as follows:

1. Call the addAndScheduleAttempt() method to add and schedule a task run attempt, TaskAttempt; Avataar.VIRGIN indicates that it is the first attempt, while Avataar.SPECULATIVE indicates a speculative attempt launched for a straggling task;

2. Set the task's scheduling time scheduledTime to the current time;

3. Send the task started event.

The addAndScheduleAttempt() method in step 1 is implemented as follows:

// This is always called in the Write Lock
private void addAndScheduleAttempt(Avataar avataar) {
  // Call addAttempt() to create a task run attempt, a TaskAttempt instance,
  // add it to the attempt collection attempts, and set its Avataar property
  TaskAttempt attempt = addAttempt(avataar);
  // Add the attempt's ID to the in-progress attempt collection inProgressAttempts
  inProgressAttempts.add(attempt.getID());
  //schedule the nextAttemptNumber
  // Schedule the TaskAttempt: if failedAttempts is non-empty, a previous attempt
  // of this task has failed, so this is a reschedule and the TaskAttempt event
  // type is TA_RESCHEDULE
  if (failedAttempts.size() > 0) {
    eventHandler.handle(new TaskAttemptEvent(attempt.getID(),
        TaskAttemptEventType.TA_RESCHEDULE));
  } else {
    // Otherwise the TaskAttempt event type is TA_SCHEDULE
    eventHandler.handle(new TaskAttemptEvent(attempt.getID(),
        TaskAttemptEventType.TA_SCHEDULE));
  }
}
The processing logic of the addAndScheduleAttempt() method is as follows:

1. Call the addAttempt() method to create a task run attempt, a TaskAttempt instance named attempt, add it to the attempt collection attempts, and set the attempt's Avataar property;

2. Add the attempt's ID to the collection of in-progress attempts, inProgressAttempts;

3. Schedule the TaskAttempt: if the failedAttempts collection is non-empty, a previous TaskAttempt of this task has failed, so this is a reschedule and the TaskAttempt event type is TA_RESCHEDULE; otherwise the TaskAttempt event type is TA_SCHEDULE.

The addAttempt() method is implemented as follows:

private TaskAttemptImpl addAttempt(Avataar avataar) {
  // Call createAttempt() to create a task run attempt, a TaskAttemptImpl instance
  TaskAttemptImpl attempt = createAttempt();
  // Set the attempt's Avataar property
  attempt.setAvataar(avataar);
  // Log debug-level information: Created attempt ...
  if (LOG.isDebugEnabled()) {
    LOG.debug("Created attempt " + attempt.getID());
  }
  // Add the mapping from the attempt's ID to the new TaskAttemptImpl to the
  // TaskImpl's attempt collection attempts; attempts is initialized to
  // Collections.emptyMap()
  switch (attempts.size()) {
    case 0:
      // If attempts is empty, i.e. Collections.emptyMap(), replace it with a
      // Collections.singletonMap() containing the new TaskAttemptImpl instance
      attempts = Collections.singletonMap(attempt.getID(),
          (TaskAttempt) attempt);
      break;

    case 1:
      // If attempts has one entry, i.e. Collections.singletonMap(), replace it
      // with a LinkedHashMap and put both the previous and the new
      // TaskAttemptImpl instances into it
      Map<TaskAttemptId, TaskAttempt> newAttempts
          = new LinkedHashMap<TaskAttemptId, TaskAttempt>(maxAttempts);
      newAttempts.putAll(attempts);
      attempts = newAttempts;
      attempts.put(attempt.getID(), attempt);
      break;

    default:
      // If attempts has more than one entry it is already a LinkedHashMap,
      // so put the new attempt in directly
      attempts.put(attempt.getID(), attempt);
      break;
  }

  // Increment the TaskAttempt counter nextAttemptNumber
  ++nextAttemptNumber;
  // Return the TaskAttemptImpl instance
  return attempt;
}
The processing logic is as follows:

1. Call the createAttempt() method to create a task run attempt, a TaskAttemptImpl instance named attempt;

2. Set the attempt's Avataar property;

3. Log debug-level information: Created attempt ...;

4. Add the mapping from the attempt's ID to the newly created TaskAttemptImpl instance to the TaskImpl's attempt collection attempts, which is initially Collections.emptyMap():

4.1 If the size of attempts is 0, that is, Collections.emptyMap(), replace it with a Collections.singletonMap() containing the new TaskAttemptImpl instance;

4.2 If the size of attempts is 1, that is, Collections.singletonMap(), replace it with a LinkedHashMap and put both the previous and the new TaskAttemptImpl instances into it;

4.3 If the size of attempts is greater than 1, it is already a LinkedHashMap, so put the new attempt in directly;

5. Increment the TaskAttempt counter nextAttemptNumber;

6. Return the TaskAttemptImpl instance attempt.

Continuing into the createAttempt() method, its declaration in TaskImpl is as follows:

protected abstract TaskAttemptImpl createAttempt();
This is an abstract method implemented by its subclasses, of which there are two: MapTaskImpl, representing a map task, and ReduceTaskImpl, representing a reduce task. Their createAttempt() implementations are as follows:

1. MapTaskImpl.createAttempt()

@Override
protected TaskAttemptImpl createAttempt() {
  return new MapTaskAttemptImpl(getID(), nextAttemptNumber,
      eventHandler, jobFile,
      partition, taskSplitMetaInfo, conf, taskAttemptListener,
      jobToken, credentials, clock, appContext);
}
This generates a MapTaskAttemptImpl instance, passing in key variables such as the attempt ordinal nextAttemptNumber, the event handler eventHandler, the job file jobFile, the partition information partition, and the split metadata taskSplitMetaInfo.

2. ReduceTaskImpl.createAttempt()

@Override
protected TaskAttemptImpl createAttempt() {
  return new ReduceTaskAttemptImpl(getID(), nextAttemptNumber,
      eventHandler, jobFile,
      partition, numMapTasks, conf, taskAttemptListener, jobToken,
      credentials, clock, appContext);
}

This generates a ReduceTaskAttemptImpl instance, essentially the same as MapTaskAttemptImpl, except that it does not need the split metadata taskSplitMetaInfo and instead needs the number of map tasks, numMapTasks.

Now that the TaskAttempt has been generated, it needs to be scheduled for execution. Going back to the addAndScheduleAttempt() method, it sends a TaskAttemptEvent of type TA_SCHEDULE or TA_RESCHEDULE, which, just as in JobImpl and TaskImpl, is handled by the TaskAttempt state machine, as follows:

// Under the TaskAttemptEventType.TA_SCHEDULE event, RequestContainerTransition
// handles the transition and the TaskAttempt state moves from NEW to UNASSIGNED
.addTransition(TaskAttemptStateInternal.NEW, TaskAttemptStateInternal.UNASSIGNED,
    TaskAttemptEventType.TA_SCHEDULE, new RequestContainerTransition(false))

// Under the TaskAttemptEventType.TA_RESCHEDULE event, RequestContainerTransition
// handles the transition and the TaskAttempt state also moves from NEW to UNASSIGNED
.addTransition(TaskAttemptStateInternal.NEW, TaskAttemptStateInternal.UNASSIGNED,
    TaskAttemptEventType.TA_RESCHEDULE, new RequestContainerTransition(true))

// The difference between the two is the rescheduled flag passed to
// RequestContainerTransition: false for the former, true for the latter
Under the TaskAttemptEventType.TA_SCHEDULE event, the TaskAttempt state is converted from NEW to UNASSIGNED by RequestContainerTransition; under the TaskAttemptEventType.TA_RESCHEDULE event, the TaskAttempt state is likewise converted from NEW to UNASSIGNED by RequestContainerTransition. The difference between the two is the rescheduled flag passed to RequestContainerTransition: false for the former and true for the latter.

Let's look at the implementation of RequestContainerTransition; the code is as follows:

@SuppressWarnings("unchecked")
@Override
public void transition(TaskAttemptImpl taskAttempt, TaskAttemptEvent event) {
  // Tell any speculator that we're requesting a container:
  // the TaskAttempt's event handler processes a SpeculatorEvent announcing
  // that a container is being requested for this task
  taskAttempt.eventHandler.handle(
      new SpeculatorEvent(taskAttempt.getID().getTaskId(), +1));

  // Request for container
  if (rescheduled) {
    // The task attempt is being rescheduled: construct a ContainerRequestEvent
    // for a failed container and hand it to the TaskAttempt's event handler
    // eventHandler, which is actually the dispatcher in MRAppMaster, passed
    // down through the creation of TaskImpl and then TaskAttemptImpl
    taskAttempt.eventHandler.handle(
        ContainerRequestEvent.createContainerRequestEventForFailedContainer(
            taskAttempt.attemptId, taskAttempt.resourceCapability));
  } else {
    // The task attempt is being scheduled for the first time: construct a
    // ContainerRequestEvent and hand it to the TaskAttempt's event handler
    taskAttempt.eventHandler.handle(new ContainerRequestEvent(
        taskAttempt.attemptId, taskAttempt.resourceCapability,
        taskAttempt.dataLocalHosts.toArray(
            new String[taskAttempt.dataLocalHosts.size()]),
        taskAttempt.dataLocalRacks.toArray(
            new String[taskAttempt.dataLocalRacks.size()])));
  }
  // The difference between the two ContainerRequestEvents is that the rescheduled
  // one ignores node and rack locality, because the attempt has already failed
  // and completing it takes priority. Both events are of type
  // ContainerAllocator.EventType.CONTAINER_REQ, and the handler registered for
  // ContainerAllocator.EventType in MRAppMaster's dispatcher is either
  // LocalContainerAllocator or RMContainerAllocator
}
The processing logic of RequestContainerTransition's transition() method is as follows:

1. The TaskAttempt's event handler eventHandler processes a SpeculatorEvent, telling the speculator that a container is being requested at this time;

2. Request a container:

2.1 If the task attempt is being rescheduled, construct a container request event ContainerRequestEvent for a failed container and hand it to the TaskAttempt's event handler eventHandler; this eventHandler is actually the dispatcher in MRAppMaster, passed down through the creation of TaskImpl and then TaskAttemptImpl;

2.2 Otherwise, if the task attempt is being scheduled for the first time, construct a container request event ContainerRequestEvent and hand it to the TaskAttempt's event handler eventHandler.

The difference between the ContainerRequestEvents created in the two cases is that when rescheduling, node and rack locality are not taken into account, because the attempt has already failed once and completing it takes priority. Both events are of type ContainerAllocator.EventType.CONTAINER_REQ, and the event handler registered for ContainerAllocator.EventType in MRAppMaster's dispatcher is either LocalContainerAllocator or RMContainerAllocator.

An introduction to RMContainerAllocator and the application for and allocation of YARN container resources will come in a later article; here, you only need to understand the general flow of its implementation:

1. RMContainerAllocator indirectly inherits from AbstractService, so it is a Hadoop service, with a serviceInit() method for service initialization and a serviceStart() method for service startup;

2. For container request/allocation events, RMContainerAllocator follows a two-level producer-consumer model (a simplified sketch follows this list). The first-level producer adds the container allocation request ContainerAllocatorEvent to the internal event queue eventQueue via the handle() method. The first-level consumer, the internal event-handling thread eventHandlingThread, keeps taking events from eventQueue, and consumes them by acting as the second-level producer: depending on the task type, each event is placed into the schedule request list scheduledRequests or into pendingReduces. scheduledRequests is a composite request list that holds both map and reduce task requests ready to be scheduled immediately, while pendingReduces only holds reduce task requests waiting to be scheduled. Based on the resource situation in YARN and the completion progress of the map tasks, reduce task requests are either moved from pendingReduces into scheduledRequests (ramp-up) or moved back from scheduledRequests into pendingReduces (ramp-down). The second-level consumer is the heartbeat thread allocatorThread of RMContainerAllocator's ancestor RMCommunicator, which periodically calls the heartbeat() method, obtains available resources from YARN's ResourceManager, and then consumes the requests in scheduledRequests, performing container allocation;

3. In RMContainerAllocator, the data structures a map task passes through, that is, its life cycle, are scheduled -> assigned -> completed, while a reduce task goes through pending -> scheduled -> assigned -> completed;

4. After a good deal of complex logic, including a comprehensive assessment of the resource situation, task locality, priority scheduling of failed tasks, the map task completion ratio, and speculative execution of straggling tasks, once a container is finally allocated to a map or reduce task, a TaskAttemptContainerAssignedEvent is sent to the TaskAttemptImpl state machine and handled by ContainerAssignedTransition, whose transition() method eventually constructs a ContainerRemoteLaunchEvent for remote container launch, launching the container on a remote node, the local node, or in this process, and running the task attempt, that is, executing the task.
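As promised in step 2, here is a deliberately simplified, self-contained Java sketch of that two-level producer-consumer structure. It is not the real RMContainerAllocator: the class name SimplifiedAllocator, the RequestEvent type, the toy ramp-up rule, and the fixed one-second heartbeat are illustrative assumptions; only the overall shape (handle() feeding an event queue, an event-handling thread sorting requests into scheduledRequests and pendingReduces, and a heartbeat-style thread draining scheduledRequests) mirrors the description above.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplifiedAllocator {

  // An incoming container request; the real code uses ContainerAllocatorEvent
  static class RequestEvent {
    final boolean isMap;
    final String taskId;
    RequestEvent(boolean isMap, String taskId) {
      this.isMap = isMap;
      this.taskId = taskId;
    }
  }

  // Level 1: external producers put events here via handle()
  private final BlockingQueue<RequestEvent> eventQueue = new LinkedBlockingQueue<>();

  // Level 2: requests that may receive containers on the next heartbeat
  private final Queue<RequestEvent> scheduledRequests = new ArrayDeque<>();
  // Reduce requests parked until ramp-up moves them into scheduledRequests
  private final Queue<RequestEvent> pendingReduces = new ArrayDeque<>();

  // First-level producer: called by the dispatcher thread
  public void handle(RequestEvent event) {
    eventQueue.add(event);
  }

  // First-level consumer / second-level producer: drains eventQueue and sorts
  // requests into scheduledRequests (maps) or pendingReduces (reduces)
  private final Thread eventHandlingThread = new Thread(() -> {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        RequestEvent e = eventQueue.take();
        synchronized (this) {
          if (e.isMap) {
            scheduledRequests.add(e);
          } else {
            pendingReduces.add(e);
          }
        }
      }
    } catch (InterruptedException ignored) {
    }
  }, "event-handler");

  // Second-level consumer: a heartbeat loop that assigns "containers" to whatever
  // is in scheduledRequests, ramping reduces up under a toy condition
  private final Thread allocatorThread = new Thread(() -> {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        Thread.sleep(1000); // stand-in for the RM heartbeat interval
        synchronized (this) {
          // Toy ramp-up rule: once no map requests remain, release pending reduces
          if (scheduledRequests.isEmpty()) {
            scheduledRequests.addAll(pendingReduces);
            pendingReduces.clear();
          }
          RequestEvent next;
          while ((next = scheduledRequests.poll()) != null) {
            System.out.println("assigning a container to " + next.taskId);
          }
        }
      }
    } catch (InterruptedException ignored) {
    }
  }, "allocator-heartbeat");

  public void start() {
    eventHandlingThread.start();
    allocatorThread.start();
  }

  public static void main(String[] args) throws InterruptedException {
    SimplifiedAllocator allocator = new SimplifiedAllocator();
    allocator.start();
    allocator.handle(new RequestEvent(true, "map_000001"));
    allocator.handle(new RequestEvent(false, "reduce_000001"));
    Thread.sleep(2500); // let two heartbeats fire, then stop the demo
    allocator.eventHandlingThread.interrupt();
    allocator.allocatorThread.interrupt();
  }
}

In the real allocator the ramp-up/ramp-down decision is far more involved (headroom, map completion ratio, preemption of reduces, and so on), but the threading shape is the same.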

Because the structure and processing logic of RMContainerAllocator are fairly complex, I will analyze it in a separate article; please look forward to it!


II. T_RECOVER type task recovery event TaskRecoverEvent

To be continued! Stay tuned for the follow-up articles!
