YARN Source Analysis (III): Application State Storage and Recovery for ResourceManager HA


Preface

Any system, however large or well built, will run into unexpected situations. You may have handled every failure you can think of at the software level, but a hardware or physical fault cannot be fixed on the spot with a few lines of code. All of this is to stress the importance of HA, that is, high availability. Many readers probably already understand NameNode HA in Hadoop, so this article walks through HA for the RM (ResourceManager) in YARN. It does not cover basic RM HA configuration; it covers how RM application state is stored and recovered.


Purpose of the RM app state store

What does "RM app state store" mean? RM stands for ResourceManager. It acts like a head housekeeper: it communicates with the ApplicationMasters running on the nodes and also exchanges heartbeats with every NodeManager. Many applications are registered with the RM, and each has one ApplicationMaster in charge of its whole life cycle. Because the RM plays such a central role, its state needs to be saved. Otherwise an abnormal exit of the RM process loses the application state, and after the RM restarts the earlier applications cannot be re-run.


What application information is saved

With the goal clear, what data does YARN actually store for an application? "Application state" is only a general concept. The figure in the original post breaks it down.


The state forms a hierarchical tree, similar to the hierarchical execution state diagram of a MapReduce job. In brief: at the top is one RMState, which contains multiple ApplicationState objects, and each ApplicationState in turn contains multiple ApplicationAttemptState objects, one per application attempt.
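The hierarchy described above can be sketched with plain Java maps. This is a minimal illustration, not the YARN code: every class name here (`RmStateSketch`, `AppStateSketch`, `AttemptStateSketch`, `StateTreeDemo`) and every ID string is hypothetical, and plain strings stand in for the real ID types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for ApplicationAttemptState.
class AttemptStateSketch {
    final String attemptId;

    AttemptStateSketch(String attemptId) {
        this.attemptId = attemptId;
    }
}

// Hypothetical stand-in for ApplicationState: one application, many attempts.
class AppStateSketch {
    final String appId;
    final String user;
    // attempt ID -> attempt state
    final Map<String, AttemptStateSketch> attempts = new LinkedHashMap<>();

    AppStateSketch(String appId, String user) {
        this.appId = appId;
        this.user = user;
    }

    void addAttempt(String attemptId) {
        attempts.put(attemptId, new AttemptStateSketch(attemptId));
    }
}

// Hypothetical stand-in for RMState: the root of the tree.
class RmStateSketch {
    // application ID -> application state
    final Map<String, AppStateSketch> apps = new LinkedHashMap<>();

    int totalAttempts() {
        int total = 0;
        for (AppStateSketch app : apps.values()) {
            total += app.attempts.size();
        }
        return total;
    }
}

class StateTreeDemo {
    // Builds a tree with two applications and three attempts in total.
    static RmStateSketch build() {
        RmStateSketch rm = new RmStateSketch();
        AppStateSketch first = new AppStateSketch("app_1", "alice");
        first.addAttempt("app_1_attempt_1");
        first.addAttempt("app_1_attempt_2");
        AppStateSketch second = new AppStateSketch("app_2", "bob");
        second.addAttempt("app_2_attempt_1");
        rm.apps.put(first.appId, first);
        rm.apps.put(second.appId, second);
        return rm;
    }

    public static void main(String[] args) {
        RmStateSketch rm = build();
        for (AppStateSketch app : rm.apps.values()) {
            System.out.println(app.appId + " (" + app.user + ") -> " + app.attempts.keySet());
        }
    }
}
```

Recovering an RM then amounts to rebuilding this tree from whatever medium the store wrote it to.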


How app state information is saved

RM app state can be saved in the following ways:

1. MemoryRMStateStore: the implementation class that keeps the state in memory.

2. FileSystemRMStateStore: the state is saved in the HDFS file system, in a persistent format.

3. NullRMStateStore: does nothing, that is, the application state is not saved at all.

4. ZKRMStateStore: the state is saved in ZooKeeper.

The source version I analyzed does not contain the ZKRMStateStore class, so only the first three are introduced briefly. The classes listed above are concrete implementations, so there must be a higher-level class that defines the common variables and methods. That class is RMStateStore, and the inheritance relationship is shown in the following diagram.


The blue arrows in the diagram indicate which state object each implementation class is built on. What that means becomes clear from the source code analysis below. First, the RMStateStore class:

/**
 * Base class to implement storage of ResourceManager state.
 * Takes care of asynchronous notifications and interfacing with YARN objects.
 * Real store implementations need to derive from it and implement blocking
 * store and load methods to actually store and load the state.
 * The base class for saving RM state is itself a service class.
 */
public abstract class RMStateStore extends AbstractService {
  ....

  /**
   * State of an application attempt
   */
  public static class ApplicationAttemptState {
    // application attempt ID
    final ApplicationAttemptId attemptId;
    // master container
    final Container masterContainer;
    // credentials
    final Credentials appAttemptCredentials;
    ....
  }

  /**
   * State of an application
   */
  public static class ApplicationState {
    // application submission context
    final ApplicationSubmissionContext context;
    // application submission time
    final long submitTime;
    // submitter
    final String user;
    // map from attempt ID to attempt state
    Map<ApplicationAttemptId, ApplicationAttemptState> attempts =
        new HashMap<ApplicationAttemptId, ApplicationAttemptState>();
    ....
  }

  public static class RMDTSecretManagerState {
    // DTIdentifier -> renewDate
    // map from RM delegation token identifier to renewal time
    Map<RMDelegationTokenIdentifier, Long> delegationTokenState =
        new HashMap<RMDelegationTokenIdentifier, Long>();
    Set<DelegationKey> masterKeyState = new HashSet<DelegationKey>();
    int dtSequenceNumber = 0;
    ....
  }

  /**
   * State of the ResourceManager
   */
  public static class RMState {
    // map from application ID to application state
    Map<ApplicationId, ApplicationState> appState =
        new HashMap<ApplicationId, ApplicationState>();
    RMDTSecretManagerState rmSecretManagerState = new RMDTSecretManagerState();
    ....
  }

Note the state classes defined in this class; they correspond to the tree described earlier. Next, look at the save-related methods defined in this parent class:

/**
 * Non-blocking API
 * ResourceManager services use this to store the application's state
 * This does not block the dispatcher threads
 * RMAppStoredEvent will be sent on completion to notify the RMApp
 * Saves the application state by firing a store event; this method is non-blocking
 */
@SuppressWarnings("unchecked")
public synchronized void storeApplication(RMApp app) {
  ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
  assert context instanceof ApplicationSubmissionContextPBImpl;
  ApplicationState appState = new ApplicationState(
      app.getSubmitTime(), context, app.getUser());
  // fire a store event; the central dispatcher distributes it for handling
  dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}

/**
 * Blocking API
 * Derived classes must implement this method to store the state of an
 * application.
 * The blocking save method, implemented by subclasses
 */
protected abstract void storeApplicationState(String appId,
    ApplicationStateDataPBImpl appStateData) throws Exception;

Saving application state is thus split into a blocking method and a non-blocking method. The non-blocking method is event-driven, and the blocking method is implemented by the concrete subclass. Removing an application has corresponding methods:
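The split between the non-blocking and blocking methods can be sketched as follows. This is a simplified illustration under stated assumptions, not the YARN implementation: a single-thread `ExecutorService` stands in for YARN's event dispatcher, the state is a raw byte array, and all class names (`AsyncStoreSketch`, `InMemoryAsyncStoreSketch`, `AsyncStoreDemo`) are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

abstract class AsyncStoreSketch {
    // Assumption: one worker thread plays the role of the event dispatcher.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    // Non-blocking API: queues the work and returns immediately.
    public void storeApplication(String appId, byte[] data) {
        dispatcher.execute(() -> {
            try {
                storeApplicationState(appId, data);
            } catch (Exception e) {
                System.err.println("Error storing info for app: " + appId + ": " + e);
            }
        });
    }

    // Blocking API: concrete stores perform the actual write here.
    protected abstract void storeApplicationState(String appId, byte[] data) throws Exception;

    // Stops accepting work and waits for queued events to be handled.
    public boolean drain(long timeoutMillis) {
        dispatcher.shutdown();
        try {
            return dispatcher.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Hypothetical concrete store that keeps everything in a map.
class InMemoryAsyncStoreSketch extends AsyncStoreSketch {
    final Map<String, byte[]> saved = new ConcurrentHashMap<>();

    @Override
    protected void storeApplicationState(String appId, byte[] data) {
        saved.put(appId, data);
    }
}

class AsyncStoreDemo {
    // Stores one application through the non-blocking API and waits for completion.
    static InMemoryAsyncStoreSketch run() {
        InMemoryAsyncStoreSketch store = new InMemoryAsyncStoreSketch();
        store.storeApplication("app_1", new byte[] {1, 2, 3});
        store.drain(5000);
        return store;
    }

    public static void main(String[] args) {
        System.out.println("stored apps: " + run().saved.keySet());
    }
}
```

The caller of `storeApplication` never waits on the storage medium; only the worker thread does, which is the point of separating the two methods.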

/**
 * Non-blocking API
 * ResourceManager services call this to remove an application from the state
 * store
 * This does not block the dispatcher threads
 * There is no notification of completion for this operation.
 * Removes the application state from the store, including its list of attempts
 */
public synchronized void removeApplication(RMApp app) {
  ApplicationState appState = new ApplicationState(
      app.getSubmitTime(), app.getApplicationSubmissionContext(),
      app.getUser());
  // collect the attempt states of this application
  for (RMAppAttempt appAttempt : app.getAppAttempts().values()) {
    Credentials credentials = getCredentialsFromAppAttempt(appAttempt);
    ApplicationAttemptState attemptState = new ApplicationAttemptState(
        appAttempt.getAppAttemptId(), appAttempt.getMasterContainer(),
        credentials);
    appState.attempts.put(attemptState.getAttemptId(), attemptState);
  }
  // remove operation
  removeApplication(appState);
}

Removing an application first gathers all of its attempts into the ApplicationState and then removes it as a whole. The removeApplication operation likewise splits into the two branches described above, non-blocking and blocking:

/**
 * Non-blocking API
 */
public synchronized void removeApplication(ApplicationState appState) {
  dispatcher.getEventHandler().handle(new RMStateStoreRemoveAppEvent(appState));
}

/**
 * Blocking API
 * Derived classes must implement this method to remove the state of an
 * application and its attempts
 */
protected abstract void removeApplicationState(ApplicationState appState)
    throws Exception;

Within RMStateStore, what exactly is the following class for?

public static class RMDTSecretManagerState {
  // DTIdentifier -> renewDate
  // map from RM delegation token identifier to renewal time
  Map<RMDelegationTokenIdentifier, Long> delegationTokenState =
      new HashMap<RMDelegationTokenIdentifier, Long>();
  Set<DelegationKey> masterKeyState = new HashSet<DelegationKey>();
  int dtSequenceNumber = 0;
  ....
}

It holds the mapping from each RMDelegationTokenIdentifier to its renewal time, together with the master keys and the token sequence number. This is the state that loadState() later uses to recover the delegation token secret manager after an RM restart. Next, let's look at the three concrete implementations of RMStateStore.


MemoryRMStateStore

This is the in-memory implementation. RMStateStore already abstracts the RM application state into the RMState class, so MemoryRMStateStore naturally holds a variable of that type:

// in-memory implementation of the RM state store
public class MemoryRMStateStore extends RMStateStore {

  RMState state = new RMState();

  @VisibleForTesting
  public RMState getState() {
    return state;
  }
  ...

Initially, state is an empty instance with no content. The class then defines how an application state object is saved:

@Override
public void storeApplicationState(String appId,
    ApplicationStateDataPBImpl appStateData) throws Exception {
  // create a new application state instance
  ApplicationState appState = new ApplicationState(
      appStateData.getSubmitTime(),
      appStateData.getApplicationSubmissionContext(), appStateData.getUser());
  if (state.appState.containsKey(appState.getAppId())) {
    Exception e = new IOException("App: " + appId + " is already stored.");
    LOG.info("Error storing info for app: " + appId, e);
    throw e;
  }
  // add it to the state object
  state.appState.put(appState.getAppId(), appState);
}

The method that saves application attempt state:

@Override
public synchronized void storeApplicationAttemptState(String attemptIdStr,
    ApplicationAttemptStateDataPBImpl attemptStateData) throws Exception {
  ApplicationAttemptId attemptId =
      ConverterUtils.toApplicationAttemptId(attemptIdStr);
  ...
  ApplicationAttemptState attemptState = new ApplicationAttemptState(
      attemptId, attemptStateData.getMasterContainer(), credentials);
  ApplicationState appState = state.getApplicationState().get(
      attemptState.getAttemptId().getApplicationId());
  if (appState == null) {
    throw new YarnRuntimeException("Application doesn't exist");
  }
  if (appState.attempts.containsKey(attemptState.getAttemptId())) {
    Exception e = new IOException("Attempt: " +
        attemptState.getAttemptId() + " is already stored.");
    LOG.info("Error storing info for attempt: " + attemptState.getAttemptId(), e);
    throw e;
  }
  // add it to the attempt map of the application state
  appState.attempts.put(attemptState.getAttemptId(), attemptState);
}

Once the application state has been saved, the next question is how it is loaded back from memory. The loadState() method does this:

// returns a copy of the RMState object maintained in memory
@Override
public synchronized RMState loadState() throws Exception {
  // return a copy of the state to allow for modification of the real state
  // create a new RMState object and copy the in-memory RMState into it
  RMState returnState = new RMState();
  // copy appState
  returnState.appState.putAll(state.appState);
  returnState.rmSecretManagerState.getMasterKeyState()
      .addAll(state.rmSecretManagerState.getMasterKeyState());
  returnState.rmSecretManagerState.getTokenState().putAll(
      state.rmSecretManagerState.getTokenState());
  returnState.rmSecretManagerState.dtSequenceNumber =
      state.rmSecretManagerState.dtSequenceNumber;
  return returnState;
}

In effect this returns a copy of the RMState held by the MemoryRMStateStore object. Note that putAll and addAll copy only the containers: the returned maps and sets are new, but the ApplicationState entries inside them are the same objects as in the original, so this is not a full deep copy.
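What a putAll-based copy does and does not isolate can be checked with a small experiment. The sketch below uses hypothetical stand-in classes (`CopySketch`, `CopySketch.App`, `CopyDemo`) rather than the real RMState, and only mirrors the copying pattern used in loadState().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopySketch {
    // Hypothetical stand-in for ApplicationState; only the attempt list matters here.
    static class App {
        final List<String> attempts = new ArrayList<>();
    }

    final Map<String, App> appState = new HashMap<>();

    // Same pattern as loadState(): a new map that holds the same App objects.
    Map<String, App> loadCopy() {
        Map<String, App> copy = new HashMap<>();
        copy.putAll(appState);
        return copy;
    }
}

class CopyDemo {
    public static void main(String[] args) {
        CopySketch store = new CopySketch();
        store.appState.put("app_1", new CopySketch.App());

        Map<String, CopySketch.App> copy = store.loadCopy();

        // Structural change to the copy: the original map is not affected.
        copy.put("app_2", new CopySketch.App());
        System.out.println("original sees app_2: " + store.appState.containsKey("app_2"));

        // Change to a shared element: visible through the original as well.
        copy.get("app_1").attempts.add("attempt_1");
        System.out.println("original sees attempt_1: "
            + store.appState.get("app_1").attempts.contains("attempt_1"));
    }
}
```

Adding or removing entries in the returned state is therefore safe for the store, while mutating an individual entry is not.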


FileSystemRMStateStore

This is the file-system implementation of the RM state store. Its core job is to persist application state to HDFS.

/**
 * A simple class for storing RM state in any storage that implements a basic
 * FileSystem interface. Does not use directories so that simple key-value
 * stores can be used. The retry policy for the real filesystem client must be
 * configured separately to enable retry of filesystem operations when needed.
 * The file-system implementation of the RM state store
 */
public class FileSystemRMStateStore extends RMStateStore {

  public static final Log LOG = LogFactory.getLog(FileSystemRMStateStore.class);

  private static final String ROOT_DIR_NAME = "FSRMStateRoot";
  private static final String RM_DT_SECRET_MANAGER_ROOT = "RMDTSecretManagerRoot";
  private static final String RM_APP_ROOT = "RMAppRoot";
  private static final String DELEGATION_KEY_PREFIX = "DelegationKey_";
  private static final String DELEGATION_TOKEN_PREFIX = "RMDelegationToken_";
  private static final String DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX =
      "RMDTSequenceNumber_";

  // file system object
  protected FileSystem fs;

  // paths under which the RM state is saved
  private Path rootDirPath;
  private Path rmDTSecretManagerRoot;
  private Path rmAppRoot;
  private Path dtSequenceNumberPath = null;

  @VisibleForTesting
  Path fsWorkingPath;

The class declares several paths, one for each kind of object it stores, plus a single FileSystem object through which all operations go. Below is the core method that saves an application:

@Override
public synchronized void storeApplicationState(String appId,
    ApplicationStateDataPBImpl appStateDataPB) throws Exception {
  Path appDirPath = getAppDir(rmAppRoot, appId);
  fs.mkdirs(appDirPath);
  // path of the node to be written
  Path nodeCreatePath = getNodePath(appDirPath, appId);
  LOG.info("Storing info for app: " + appId + " at: " + nodeCreatePath);
  // state data to be written
  byte[] appStateData = appStateDataPB.getProto().toByteArray();
  try {
    // currently throw all exceptions. May need to respond differently for HA
    // based on whether we have lost the right to write to FS
    // write the state data
    writeFile(nodeCreatePath, appStateData);
  } catch (Exception e) {
    LOG.info("Error storing info for app: " + appId, e);
    throw e;
  }
}

The corresponding method that loads the RM state:

@Override
public synchronized RMState loadState() throws Exception {
  // create a new RM state object
  RMState rmState = new RMState();
  // recover from files
  // recover DelegationTokenSecretManager
  loadRMDTSecretManagerState(rmState);
  // recover RM applications
  loadRMAppState(rmState);
  return rmState;
}

Loading the applications:

private void loadRMAppState(RMState rmState) throws Exception {
  try {
    List<ApplicationAttemptState> attempts =
        new ArrayList<ApplicationAttemptState>();
    for (FileStatus appDir : fs.listStatus(rmAppRoot)) {
      for (FileStatus childNodeStatus : fs.listStatus(appDir.getPath())) {
        assert childNodeStatus.isFile();
        String childNodeName = childNodeStatus.getPath().getName();
        // read the file data
        byte[] childData =
            readFile(childNodeStatus.getPath(), childNodeStatus.getLen());
        // the node holds application state
        if (childNodeName.startsWith(ApplicationId.appIdStrPrefix)) {
          // application
          LOG.info("Loading application from node: " + childNodeName);
          ApplicationId appId = ConverterUtils.toApplicationId(childNodeName);
          ApplicationStateDataPBImpl appStateData =
              new ApplicationStateDataPBImpl(
                  ApplicationStateDataProto.parseFrom(childData));
          ApplicationState appState = new ApplicationState(
              appStateData.getSubmitTime(),
              appStateData.getApplicationSubmissionContext(),
              appStateData.getUser());
          // assert child node name is same as actual applicationId
          assert appId.equals(appState.context.getApplicationId());
          rmState.appState.put(appId, appState);
        } else if (childNodeName
            .startsWith(ApplicationAttemptId.appAttemptIdStrPrefix)) {
          // attempt
          // the node holds application attempt state
          LOG.info("Loading application attempt from node: " + childNodeName);
          ApplicationAttemptId attemptId =
              ConverterUtils.toApplicationAttemptId(childNodeName);
          ApplicationAttemptStateDataPBImpl attemptStateData =
              new ApplicationAttemptStateDataPBImpl(
                  ApplicationAttemptStateDataProto.parseFrom(childData));
          Credentials credentials = null;
          if (attemptStateData.getAppAttemptTokens() != null) {
            credentials = new Credentials();
            DataInputByteBuffer dibb = new DataInputByteBuffer();
            dibb.reset(attemptStateData.getAppAttemptTokens());
            credentials.readTokenStorageStream(dibb);
          }
          ApplicationAttemptState attemptState = new ApplicationAttemptState(
              attemptId, attemptStateData.getMasterContainer(), credentials);
          // assert child node name is same as application attempt id
          assert attemptId.equals(attemptState.getAttemptId());
          attempts.add(attemptState);
        } else {
          LOG.info("Unknown child node with name: " + childNodeName);
        }
      }
    }
    ....
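The store-then-scan pattern used by the file-system store can be tried out locally. The sketch below is an illustration under stated assumptions, not the Hadoop code: it uses `java.nio.file` on the local disk in place of Hadoop's `FileSystem`, raw bytes in place of protobuf records, and hypothetical names (`LocalFsStoreSketch`, `LocalFsStoreDemo`). The two name prefixes are assumed values for the constants referenced in the listing above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class LocalFsStoreSketch {
    static final String APP_PREFIX = "application_";    // assumed application node prefix
    static final String ATTEMPT_PREFIX = "appattempt_"; // assumed attempt node prefix

    private final Path appRoot;

    LocalFsStoreSketch(Path appRoot) throws IOException {
        this.appRoot = Files.createDirectories(appRoot);
    }

    // One directory per application; each node is a file named after its ID.
    void store(String appId, String nodeName, byte[] data) throws IOException {
        Path appDir = Files.createDirectories(appRoot.resolve(appId));
        Files.write(appDir.resolve(nodeName), data);
    }

    // Scans every application directory and classifies each file by name prefix.
    Map<String, String> load() throws IOException {
        Map<String, String> kinds = new TreeMap<>();
        try (DirectoryStream<Path> appDirs = Files.newDirectoryStream(appRoot)) {
            for (Path appDir : appDirs) {
                try (DirectoryStream<Path> nodes = Files.newDirectoryStream(appDir)) {
                    for (Path node : nodes) {
                        String name = node.getFileName().toString();
                        if (name.startsWith(APP_PREFIX)) {
                            kinds.put(name, "application");
                        } else if (name.startsWith(ATTEMPT_PREFIX)) {
                            kinds.put(name, "attempt");
                        } else {
                            kinds.put(name, "unknown");
                        }
                    }
                }
            }
        }
        return kinds;
    }
}

class LocalFsStoreDemo {
    // Writes one application with one attempt into a temp directory and reloads it.
    static Map<String, String> roundTrip() {
        try {
            Path root = Files.createTempDirectory("rm-app-root-sketch");
            LocalFsStoreSketch store = new LocalFsStoreSketch(root);
            store.store("application_1_0001", "application_1_0001", new byte[] {1});
            store.store("application_1_0001", "appattempt_1_0001_000001", new byte[] {2});
            return store.load();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Because the file name alone identifies what a node contains, recovery needs no separate index: listing the directories is enough to rebuild the tree.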

NullRMStateStore

This is the empty implementation: it performs no state saving at all. The class is simple. It overrides the inherited methods but puts no logic in them.

// empty RM state store: none of the save methods does anything
@Unstable
public class NullRMStateStore extends RMStateStore {
  ....

  // loading state is not supported
  @Override
  public RMState loadState() throws Exception {
    throw new UnsupportedOperationException("Cannot load state from null store");
  }

  // saving an application is a no-op as well
  @Override
  protected void storeApplicationState(String appId,
      ApplicationStateDataPBImpl appStateData) throws Exception {
    // Do nothing
  }

  @Override
  protected void storeApplicationAttemptState(String attemptId,
      ApplicationAttemptStateDataPBImpl attemptStateData) throws Exception {
    // Do nothing
  }

  @Override
  protected void removeApplicationState(ApplicationState appState)
      throws Exception {
    // Do nothing
  }
  ...

So how are these classes selected? In the YARN configuration, set the property yarn.resourcemanager.store.class to the class name of the store implementation to use.
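Choosing an implementation by class name generally comes down to reflection. The sketch below shows that idea with `java.util.Properties` standing in for the YARN configuration object. The interface and classes (`StoreSketch`, `MemoryStoreSketch`, `NullStoreSketch`, `StoreFactorySketch`) are hypothetical, and falling back to the null store when the property is unset is an assumption of this sketch, not a statement about YARN's default.

```java
import java.util.Properties;

// Hypothetical minimal store interface for this illustration.
interface StoreSketch {
    String describe();
}

class MemoryStoreSketch implements StoreSketch {
    public String describe() {
        return "memory";
    }
}

class NullStoreSketch implements StoreSketch {
    public String describe() {
        return "null";
    }
}

class StoreFactorySketch {
    static final String STORE_CLASS_KEY = "yarn.resourcemanager.store.class";

    // Instantiates the configured class by name; the fallback is an assumption.
    static StoreSketch create(Properties conf) {
        String className =
            conf.getProperty(STORE_CLASS_KEY, NullStoreSketch.class.getName());
        try {
            return (StoreSketch) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create store " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(STORE_CLASS_KEY, MemoryStoreSketch.class.getName());
        System.out.println("selected store: " + create(conf).describe());
    }
}
```

With this arrangement, switching from one store to another is a configuration change only; the code that calls the store does not need to know which implementation is behind it.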


For the complete code analysis, see https://github.com/linyiqun/hadoop-yarn. Analysis of other parts of YARN will be added there over time.



Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
