YARN source analysis of MRAppMaster on MapReduce job processing (I)


We know that to run a MapReduce job on YARN, you only need to implement an ApplicationMaster component, and MRAppMaster is the MapReduce implementation of the ApplicationMaster on YARN: it controls the execution of the MR job on YARN. So the question that follows is: how does MRAppMaster control the MapReduce job on YARN? In other words, what is the overall flow of MapReduce job processing in MRAppMaster? That is the focus of this article.

From the definition of the MRAppMaster class we can see that MRAppMaster extends CompositeService, and CompositeService extends AbstractService; in other words, MRAppMaster is itself a Hadoop service. Let us look at how the MapReduce job is processed in the serviceStart() method invoked when the service starts. The key code is as follows:

  @SuppressWarnings("unchecked")
  @Override
  protected void serviceStart() throws Exception {
    // ...... part of the code omitted

    // /////////////////// Create the job itself.
    // Call the createJob() method to create the job instance job
    job = createJob(getConfig(), forcedState, shutDownMessage);
    // End of creating the job.

    // ...... part of the code omitted

    // Job initialization failure flag; initFailed defaults to false,
    // i.e. initialization succeeded, no error
    boolean initFailed = false;
    if (!errorHappenedShutDown) {
      // Create a job initialization event initJobEvent of type JOB_INIT
      JobEvent initJobEvent = new JobEvent(job.getID(), JobEventType.JOB_INIT);
      // Send init to the job (this does not trigger job execution).
      // This is a synchronous call, not an event through the dispatcher. We
      // want job-init to be done completely here.
      // Call the handle() method of jobEventDispatcher to process the job
      // initialization event initJobEvent, i.e. hand the event to the job
      // event dispatcher jobEventDispatcher
      jobEventDispatcher.handle(initJobEvent);

      // If the job is still not initialized, an error happened during
      // initialization. Must complete starting all of the services so
      // failure events can be processed.
      // Get the job initialization result initFailed
      initFailed = (((JobImpl) job).getInternalState() != JobStateInternal.INITED);

      // JobImpl's InitTransition is done (call above is synchronous), so the
      // "uber-decision" (MR-1220) has been made. Query job and switch to
      // ubermode if appropriate (by registering different container-allocator
      // and container-launcher services/event-handlers).

      // ...... part of the code omitted

      // Start clientService here, since it's not initialized if
      // errorHappenedShutDown is true; start the client service
      clientService.start();
    }

    // Start all the components: call the parent class's serviceStart()
    super.serviceStart();

    // Finally set the job classloader
    MRApps.setClassLoader(jobClassLoader, getConfig());

    if (initFailed) {
      // If job initialization failed, construct the JOB_INIT_FAILED event and
      // hand it to the event dispatcher jobEventDispatcher
      JobEvent initFailedEvent = new JobEvent(job.getID(), JobEventType.JOB_INIT_FAILED);
      jobEventDispatcher.handle(initFailedEvent);
    } else {
      // All components have started; call the startJobs() method to start the job
      startJobs();
    }
  }
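An aside on the service plumbing: as an AbstractService, MRAppMaster follows the standard Hadoop service lifecycle, in which the public init()/start()/stop() methods drive the protected serviceInit()/serviceStart()/serviceStop() hooks, and a CompositeService cascades each transition to its registered child services. Below is a minimal sketch of the override pattern, assuming the usual org.apache.hadoop.service.CompositeService API; MyService is a hypothetical name:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.service.CompositeService;

  // Minimal sketch of the CompositeService override pattern MRAppMaster uses
  public class MyService extends CompositeService {

    public MyService() {
      super(MyService.class.getName());
    }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
      // addService(...) child services here, then let the parent init them
      super.serviceInit(conf);
    }

    @Override
    protected void serviceStart() throws Exception {
      // custom startup work goes here ...
      super.serviceStart(); // starts all registered child services
    }
  }

This is why serviceStart() above calls super.serviceStart(): that call is what actually starts all of MRAppMaster's component services.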

From the serviceStart() method of the MRAppMaster service we can see, roughly, that a MapReduce job goes through three main phases in MRAppMaster: creation, initialization and start. Trimming the foliage and keeping the trunk, they are as follows:

1. Create: call the createJob() method to create the job instance job;

2. Initialize:

2.1. Create a job initialization event initJobEvent;

2.2. Call the handle() method of jobEventDispatcher to process the job initialization event initJobEvent, i.e. hand the event to the job event dispatcher jobEventDispatcher;

2.3. Get the job initialization result initFailed;

2.4. If job initialization failed, construct a JOB_INIT_FAILED event and hand it to the event dispatcher jobEventDispatcher.

3. Start: call the startJobs() method to start the job.

Of course, a job cannot run forever once started; MRAppMaster will eventually stop it, which is the fourth and final step of the job processing flow: the job stop. Where is it handled? Let us keep that in suspense and ignore the question for the time being; the answer will be given later.


Below, we describe each of the three main phases of MapReduce job processing in turn.

I. Creation

Look first at job creation; the createJob() method is as follows:

  /**
   * Create and initialize (but don't start) a single job.
   * @param forcedState a state to force the job into or null for normal operation.
   * @param diagnostic a diagnostic message to include with the job.
   */
  protected Job createJob(Configuration conf, JobStateInternal forcedState,
      String diagnostic) {

    // Create single job: the job instance newJob is in fact a JobImpl
    Job newJob =
        new JobImpl(jobId, appAttemptID, conf, dispatcher.getEventHandler(),
            taskAttemptListener, jobTokenSecretManager, jobCredentials, clock,
            completedTasksFromPreviousRun, metrics,
            committer, newApiCommitter,
            currentUser.getUserName(), appSubmitTime, amInfos, context,
            forcedState, diagnostic);

    // Store the mapping from the new job's jobId to newJob itself in the jobs
    // collection of the application running context information
    ((RunningAppContext) context).jobs.put(newJob.getID(), newJob);

    // On the asynchronous event dispatcher dispatcher, register the event
    // handler for the job finish event JobFinishEvent, obtained via the
    // createJobFinishEventHandler() method
    dispatcher.register(JobFinishEvent.Type.class,
        createJobFinishEventHandler());

    // Return the newly created job newJob
    return newJob;
  } // end createJob()
The main logic is as follows:

1. Create the job instance newJob, which is in fact a JobImpl, passing in key member variables such as the job ID jobId, the application attempt ID appAttemptID, the task attempt listener taskAttemptListener, the output committer committer, the user name currentUser.getUserName(), and the application running context information context;

2. Store the mapping from the new job's jobId to newJob itself in the jobs collection of the application running context information;

3. On the asynchronous event dispatcher dispatcher, register the event handler for the job finish event JobFinishEvent, obtained via the createJobFinishEventHandler() method;

4. Return the newly created job newJob.

For some of the details of job creation we will not go deeper for the time being; they are left to future articles. Here, let us focus on step 3: on the asynchronous event dispatcher dispatcher, the event handler for the job finish event JobFinishEvent is registered, obtained via the createJobFinishEventHandler() method, whose code is as follows:

  /**
   * Create an event handler that handles the job finish event.
   * @return the job finish event handler.
   */
  protected EventHandler<JobFinishEvent> createJobFinishEventHandler() {
    return new JobFinishEventHandler();
  }
That is, when the job is created, the handler for the job finish event JobFinishEvent is defined to be JobFinishEventHandler, and JobFinishEventHandler is defined as follows:

  private class JobFinishEventHandler implements EventHandler<JobFinishEvent> {
    @Override
    public void handle(JobFinishEvent event) {
      // Create a new thread to shutdown the AM. We should not do it in-line
      // to avoid blocking the dispatcher itself.
      new Thread() {
        @Override
        public void run() {
          shutDownJob();
        }
      }.start();
    }
  }
This is the fourth step we did not detail above, the job stop: it ultimately calls the shutDownJob() method, and a new thread is started to perform the shutdown. We will introduce it later.
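Stepping back for a moment, the register()/handle() pattern that keeps appearing here is the backbone of YARN's event model. The following self-contained sketch is our own illustration, not the Hadoop source: handlers are registered against an event-type enum class, and the dispatcher routes each event to the handler registered for the declaring class of its type enum, which is essentially what org.apache.hadoop.yarn.event.AsyncDispatcher does:

  import java.util.HashMap;
  import java.util.Map;

  // Minimal sketch of the per-event-type register/dispatch pattern
  class MiniDispatcher {

    interface Event {
      Enum<?> getType();
    }

    interface EventHandler<T extends Event> {
      void handle(T event);
    }

    private final Map<Class<?>, EventHandler<Event>> handlers = new HashMap<>();

    @SuppressWarnings("unchecked")
    void register(Class<?> eventTypeClass, EventHandler<? extends Event> handler) {
      handlers.put(eventTypeClass, (EventHandler<Event>) handler);
    }

    void dispatch(Event event) {
      // Route by the declaring class of the event's type enum
      EventHandler<Event> handler =
          handlers.get(event.getType().getDeclaringClass());
      if (handler != null) {
        handler.handle(event);
      }
    }
  }

The real AsyncDispatcher additionally queues events and drains them on a single dedicated thread, which is exactly why JobFinishEventHandler above must not do slow work in-line.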

II. Initialization

Now let us look at job initialization. It is done by creating a job initialization event, a JobEvent instance initJobEvent with event type JobEventType.JOB_INIT, and handing it to the event dispatcher jobEventDispatcher. First, let us look at the definition and instantiation of this jobEventDispatcher, as follows:

  // Job event dispatcher
  private JobEventDispatcher jobEventDispatcher;
jobEventDispatcher is a job event dispatcher of type JobEventDispatcher, and it is instantiated as:

  this.jobEventDispatcher = new JobEventDispatcher();
The definition of JobEventDispatcher is as follows:

  private class JobEventDispatcher implements EventHandler<JobEvent> {
    @SuppressWarnings("unchecked")
    @Override
    public void handle(JobEvent event) {
      // Get the job instance (a JobImpl object) from the application running
      // context information context according to its jobId, and call its
      // handle() method to process the event
      ((EventHandler<JobEvent>) context.getJob(event.getJobId())).handle(event);
    }
  }
Very simple: it obtains the job instance, i.e. the JobImpl object, from the application running context information context and calls its handle() method to process the event. This job instance, as you may remember from above, was added to the jobs collection of the application running context information context when the job was first created, with jobId as the key and the JobImpl object as the value. In the context implementation RunningAppContext, the code that obtains a job instance by jobId is as follows:

    @Override
    public Job getJob(JobId jobID) {
      return jobs.get(jobID);
    }
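Note the routing idiom here: a single registered JobEventDispatcher serves every job, because it looks the target JobImpl up by jobId in the shared context. Below is a self-contained sketch of the idiom, with hypothetical names rather than the Hadoop source:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Hypothetical sketch: one handler routes events to many target objects by id
  class JobRoutingSketch {

    interface JobEventLike {
      String getJobId();
    }

    interface EventHandler<T> {
      void handle(T event);
    }

    // Plays the role of RunningAppContext.jobs: jobId -> JobImpl
    static final Map<String, EventHandler<JobEventLike>> jobs =
        new ConcurrentHashMap<>();

    // Plays the role of JobEventDispatcher.handle(): look the job up by its id
    // and delegate the event to it
    static void route(JobEventLike event) {
      EventHandler<JobEventLike> job = jobs.get(event.getJobId());
      if (job != null) {
        job.handle(event);
      }
    }
  }

This is why registering the one jobEventDispatcher is enough, no matter how many jobs the context holds.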
Now let us see how the handle() method in JobImpl deals with a JobEvent of type JobEventType.JOB_INIT:
  @Override
  /**
   * The only entry point to change the Job.
   */
  public void handle(JobEvent event) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Processing " + event.getJobId() + " of type "
          + event.getType());
    }
    try {
      writeLock.lock();
      JobStateInternal oldState = getInternalState();
      try {
        getStateMachine().doTransition(event.getType(), event);
      } catch (InvalidStateTransitonException e) {
        LOG.error("Can't handle this event at current state", e);
        addDiagnostic("Invalid event " + event.getType()
            + " on Job " + this.jobId);
        eventHandler.handle(new JobEvent(this.jobId,
            JobEventType.INTERNAL_ERROR));
      }
      // Notify the event handler of state change
      if (oldState != getInternalState()) {
        LOG.info(jobId + " Job Transitioned from " + oldState + " to "
            + getInternalState());
        rememberLastNonFinalState(oldState);
      }
    } finally {
      writeLock.unlock();
    }
  }
The most important statement is getStateMachine().doTransition(event.getType(), event), which leads into the state machine of the MapReduce job in YARN. To keep this article fluent, concise and focused, we will not explain the job state machine here; that content is reserved for a future article. For now, you only need to know that job initialization is ultimately implemented by the transition() method of JobImpl's static inner class InitTransition.
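For orientation, here is an abridged sketch of the transition table behind getStateMachine().doTransition(). JobImpl builds its state machine with org.apache.hadoop.yarn.state.StateMachineFactory, roughly as below; only the JOB_INIT arc is shown and the real table is far larger, so treat this as a sketch rather than the full source:

  protected static final
      StateMachineFactory<JobImpl, JobStateInternal, JobEventType, JobEvent>
      stateMachineFactory =
        new StateMachineFactory<JobImpl, JobStateInternal, JobEventType, JobEvent>
              (JobStateInternal.NEW)
            // A JOB_INIT event in the NEW state runs InitTransition, a
            // MultipleArcTransition whose return value selects the post-state:
            // INITED on success, NEW on failure
            .addTransition(JobStateInternal.NEW,
                EnumSet.of(JobStateInternal.INITED, JobStateInternal.NEW),
                JobEventType.JOB_INIT,
                new InitTransition())
            // ... many more transitions ...
            .installTopology();

With that wiring in mind, let us look at InitTransition's transition() method, as follows: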

  /**
   * Note that this transition method is called directly (and synchronously)
   * by MRAppMaster's init() method (i.e., no RPC, no thread-switching;
   * just plain sequential call within AM context), so we can trigger
   * modifications in AM state from here (at least, if AM is written that
   * way; MR version is).
   */
  @Override
  public JobStateInternal transition(JobImpl job, JobEvent event) {
    // Call the job metrics system's submittedJob() method to record submission
    job.metrics.submittedJob(job);
    // Call the job metrics system's preparingJob() method to begin preparation
    job.metrics.preparingJob(job);

    // The new and old APIs create different JobContextImpl instances
    if (job.newApiCommitter) {
      job.jobContext = new JobContextImpl(job.conf, job.oldJobId);
    } else {
      job.jobContext = new org.apache.hadoop.mapred.JobContextImpl(
          job.conf, job.oldJobId);
    }

    try {
      // Call the setup() method to complete part of the initialization work
      // before the job starts
      setup(job);
      // Set the file system fs for the job
      job.fs = job.getFileSystem(job.conf);

      // Log to job history: create a JobSubmittedEvent instance jse
      JobSubmittedEvent jse = new JobSubmittedEvent(job.oldJobId,
          job.conf.get(MRJobConfig.JOB_NAME, "test"),
          job.conf.get(MRJobConfig.USER_NAME, "mapred"),
          job.appSubmitTime,
          job.remoteJobConfFile.toString(),
          job.jobACLs, job.queueName,
          job.conf.get(MRJobConfig.WORKFLOW_ID, ""),
          job.conf.get(MRJobConfig.WORKFLOW_NAME, ""),
          job.conf.get(MRJobConfig.WORKFLOW_NODE_NAME, ""),
          getWorkflowAdjacencies(job.conf),
          job.conf.get(MRJobConfig.WORKFLOW_TAGS, ""));
      // Wrap jse in a JobHistoryEvent and hand it to the job's event handler
      job.eventHandler.handle(new JobHistoryEvent(job.jobId, jse));
      //TODO JH Verify jobACLs, UserName via UGI?

      // Call the createSplits() method to create the splits and obtain the
      // task split meta-info array taskSplitMetaInfo
      TaskSplitMetaInfo[] taskSplitMetaInfo = createSplits(job, job.jobId);
      // Determine the number of map tasks numMapTasks: the length of the
      // split meta-info array, i.e. one map task per split
      job.numMapTasks = taskSplitMetaInfo.length;
      // Determine the number of reduce tasks numReduceTasks from the job
      // parameter mapreduce.job.reduces, defaulting to 0 when not configured
      job.numReduceTasks = job.conf.getInt(MRJobConfig.NUM_REDUCES, 0);

      // Determine the job's map and reduce weights mapWeight, reduceWeight
      if (job.numMapTasks == 0 && job.numReduceTasks == 0) {
        job.addDiagnostic("No of maps and reduces are 0 " + job.jobId);
      } else if (job.numMapTasks == 0) {
        job.reduceWeight = 0.9f;
      } else if (job.numReduceTasks == 0) {
        job.mapWeight = 0.9f;
      } else {
        job.mapWeight = job.reduceWeight = 0.45f;
      }

      checkTaskLimits();

      // Compute the input length inputLength (the job size) from the split
      // meta information
      long inputLength = 0;
      for (int i = 0; i < job.numMapTasks; ++i) {
        inputLength += taskSplitMetaInfo[i].getInputDataLength();
      }

      // Based on the job size inputLength, call the job's makeUberDecision()
      // method to decide whether the job runs in Uber mode or Non-Uber mode
      job.makeUberDecision(inputLength);

      // Initialize the task-attempt-completion-event list, sized as the sum
      // of the job's map and reduce task counts, plus 10
      job.taskAttemptCompletionEvents =
          new ArrayList<TaskAttemptCompletionEvent>(
              job.numMapTasks + job.numReduceTasks + 10);
      // Initialize the map-attempt-completion-event list, sized as the job's
      // map task count plus 10
      job.mapAttemptCompletionEvents =
          new ArrayList<TaskCompletionEvent>(job.numMapTasks + 10);
      // Initialize the list taskCompletionIdxToMapCompletionIdx, sized as the
      // sum of the job's map and reduce task counts, plus 10
      job.taskCompletionIdxToMapCompletionIdx =
          new ArrayList<Integer>(job.numMapTasks + job.numReduceTasks + 10);

      // Determine the allowed map/reduce task failure percentages from the
      // parameters mapreduce.map.failures.maxpercent and
      // mapreduce.reduce.failures.maxpercent; both default to 0, i.e. no map
      // or reduce task failures are allowed
      job.allowedMapFailuresPercent =
          job.conf.getInt(MRJobConfig.MAP_FAILURES_MAX_PERCENT, 0);
      job.allowedReduceFailuresPercent =
          job.conf.getInt(MRJobConfig.REDUCE_FAILURES_MAXPERCENT, 0);

      // Create the tasks but don't start them yet: create the map tasks
      createMapTasks(job, inputLength, taskSplitMetaInfo);
      // Create the reduce tasks
      createReduceTasks(job);

      // Call the job metrics system's endPreparingJob() method to end
      // job preparation
      job.metrics.endPreparingJob(job);
      // Return the job's internal state JobStateInternal.INITED: initialized
      return JobStateInternal.INITED;
    } catch (Exception e) {
      // Log warn-level information "Job init failed" with the exception
      LOG.warn("Job init failed", e);
      job.metrics.endPreparingJob(job);
      job.addDiagnostic("Job init failed : "
          + StringUtils.stringifyException(e));
      // Leave job in the NEW state. The MR AM will detect that the state is
      // not INITED and send a JOB_INIT_FAILED event.
      // Return the job's internal state JobStateInternal.NEW after failure
      return JobStateInternal.NEW;
    }
  }
To keep the main logic clear, we remove some of the details, keep the trunk, and summarize job initialization as follows:

1. Call the setup() method to complete part of the initialization work before the job starts; the two most important things it does are:

1.1. Obtain and set the job's remote submission path remoteJobSubmitDir;

1.2. Obtain and set the job's remote configuration file remoteJobConfFile;

2. Call the createSplits() method to create the splits and obtain the task split meta-info array taskSplitMetaInfo:

Through the static method readSplitMetaInfo() of SplitMetaInfoReader, the job's split meta information, i.e. the split meta information for each task, is read from the job's remote submission path remoteJobSubmitDir; it is used in the steps below to determine the number of map tasks, how the job will run, and so on (see the sketch after this list);

3. Determine the number of map tasks numMapTasks: the length of the split meta-info array, i.e. one map task per split;

4. Determine the number of reduce tasks numReduceTasks from the job parameter mapreduce.job.reduces, which defaults to 0 when not configured;

5. Compute the input length inputLength, i.e. the job size, from the split meta information;

6. Based on the job size inputLength, call the job's makeUberDecision() method to decide whether the job runs in Uber mode or Non-Uber mode:

Small jobs run in Uber mode; conversely, large jobs run in Non-Uber mode. For details, see "YARN source analysis of MRAppMaster: how a job runs (Local, Uber, Non-Uber)";

7. Determine the allowed map and reduce task failure percentages from the parameters mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent, both of which default to 0 when not configured, i.e. no map or reduce task failures are allowed;

8. Create the map tasks;

9. Create the reduce tasks;

10. Return the job's internal state JobStateInternal.INITED, i.e. initialized;

11. If an exception occurs:

11.1. Log warn-level information "Job init failed" and print out the specific exception;

11.2. Return the job's internal state JobStateInternal.NEW, i.e. back to NEW after the initialization failure.
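As promised in step 2, here is a short sketch of how steps 2, 3 and 5 fit together. It assumes the usual readSplitMetaInfo() signature of org.apache.hadoop.mapreduce.split.SplitMetaInfoReader; the surrounding class and method names are our own hypothetical ones, not Hadoop code:

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapreduce.JobID;
  import org.apache.hadoop.mapreduce.split.JobSplit.TaskSplitMetaInfo;
  import org.apache.hadoop.mapreduce.split.SplitMetaInfoReader;

  // Hypothetical helper illustrating steps 2, 3 and 5 of the summary above
  public class SplitInfoSketch {

    public static long[] mapCountAndInputLength(JobID jobId, FileSystem fs,
        Configuration conf, Path jobSubmitDir) throws IOException {
      // Step 2: read the per-task split meta info from the job's remote
      // submission path
      TaskSplitMetaInfo[] splits =
          SplitMetaInfoReader.readSplitMetaInfo(jobId, fs, conf, jobSubmitDir);
      // Step 3: one map task per split
      long numMapTasks = splits.length;
      // Step 5: the job size is the sum of the split input lengths
      long inputLength = 0;
      for (TaskSplitMetaInfo split : splits) {
        inputLength += split.getInputDataLength();
      }
      return new long[] { numMapTasks, inputLength };
    }
  }

The split files themselves were written to the remote submission path by the client at submit time, so createSplits() in JobImpl is essentially a thin wrapper around this read.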

To be continued. For a detailed look at the remaining parts of job initialization, as well as job start, job stop and more, please see "YARN source analysis of MRAppMaster on MapReduce job processing (II)".







