1. NodeManager Overview
NodeManager (NM) is the per-node agent in YARN. It manages a single compute node in the Hadoop cluster: it maintains communication with the ResourceManager, supervises the container lifecycle, monitors each container's resource usage (memory, CPU, etc.), tracks node health, and manages logs and the auxiliary services used by different applications.
NodeManager overall architecture:
2. NodeManager Analysis
The analysis below follows the order in which the code executes during NodeManager startup.
2.1 Main Function
- Print the NodeManager startup and shutdown log messages;
- Create a NodeManager object;
- Load the configuration files and initialize the configuration;
- Call the initAndStartNodeManager function.
Main code:
    public static void main(String[] args) {
        Thread.setDefaultUncaughtExceptionHandler(new YarnUncaughtExceptionHandler());
        // Print the log messages for NodeManager startup and shutdown
        StringUtils.startupShutdownMessage(NodeManager.class, args, LOG);
        // Create the NodeManager object
        NodeManager nodeManager = new NodeManager();
        // Load the configuration files and initialize the configuration
        Configuration conf = new YarnConfiguration();
        // Initialize and start the NodeManager
        nodeManager.initAndStartNodeManager(conf, false);
    }
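The first statement of main installs a process-wide handler for exceptions that escape any thread. The following is a minimal sketch of that JDK mechanism only. It does not reproduce YarnUncaughtExceptionHandler; the class name UncaughtHandlerSketch, the thread name, and the recorded message format are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtHandlerSketch {
    // Starts a worker that throws, and returns what the default handler observed.
    static String runFailingWorker() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous =
            Thread.getDefaultUncaughtExceptionHandler();
        // Hypothetical handler: it only records the thread name and message.
        Thread.setDefaultUncaughtExceptionHandler(
            (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-1");
            worker.start();
            worker.join(); // the handler runs on the dying thread before join returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever handler was installed before the sketch ran.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }
}
```

Installing the handler before any service thread exists means a failure in any later thread is reported through one place rather than being lost.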
2.2 initAndStartNodeManager Function
- Call init() to initialize (init internally calls the overridden serviceInit method)
- Start the services (start internally calls the overridden serviceStart method, which starts each service)
    private void initAndStartNodeManager(Configuration conf, boolean hasToReboot) {
        try {
            // Remove the old hook if we are rebooting.
            if (hasToReboot && null != nodeManagerShutdownHook) {
                ShutdownHookManager.get().removeShutdownHook(nodeManagerShutdownHook);
            }
            // Add nodeManagerShutdownHook so that the CompositeService is stopped
            // when the NodeManager shuts down or restarts
            nodeManagerShutdownHook = new CompositeServiceShutdownHook(this);
            ShutdownHookManager.get().addShutdownHook(nodeManagerShutdownHook,
                SHUTDOWN_HOOK_PRIORITY);
            // Call init() to initialize (init calls the overridden serviceInit)
            this.init(conf);
            // Start the services (start calls the overridden serviceStart)
            this.start();
        } catch (Throwable t) {
            LOG.fatal("Error starting NodeManager", t);
            System.exit(-1);
        }
    }
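The remove-then-add handling of the shutdown hook can be sketched with the plain JDK Runtime API. This is an illustration under assumptions, not Hadoop's ShutdownHookManager: the JDK API has no hook priorities, and the class ShutdownHookSketch with its methods installHook and uninstallHook is hypothetical.

```java
class ShutdownHookSketch {
    // Hypothetical field playing the role of nodeManagerShutdownHook.
    private Thread shutdownHook;

    // Registers a fresh hook; on a reboot the old hook is removed first.
    // Returns true if an old hook was actually removed.
    boolean installHook(boolean hasToReboot, Runnable stopAction) {
        boolean removedOld = false;
        if (hasToReboot && shutdownHook != null) {
            removedOld = Runtime.getRuntime().removeShutdownHook(shutdownHook);
        }
        shutdownHook = new Thread(stopAction, "service-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(shutdownHook);
        return removedOld;
    }

    // Removes the current hook, for example when the sketch is torn down.
    boolean uninstallHook() {
        if (shutdownHook == null) {
            return false;
        }
        boolean removed = Runtime.getRuntime().removeShutdownHook(shutdownHook);
        shutdownHook = null;
        return removed;
    }
}
```

Removing the stale hook before registering a new one keeps a restarted service from being stopped twice at JVM exit.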
2.3 init Function
(1) The init method is declared in the Service interface and implemented in the abstract class AbstractService. AbstractService.init calls the protected method serviceInit, which the subclass NodeManager overrides.
The init method as implemented in AbstractService:
    @Override
    public void init(Configuration conf) {
        if (conf == null) {
            throw new ServiceStateException("Cannot initialize service "
                + getName() + ": null configuration");
        }
        if (isInState(STATE.INITED)) {
            return;
        }
        synchronized (stateChangeLock) {
            if (enterState(STATE.INITED) != STATE.INITED) {
                setConfig(conf);
                try {
                    serviceInit(config);
                    if (isInState(STATE.INITED)) {
                        // if the service ended up here during init,
                        // notify the listeners
                        notifyListeners();
                    }
                } catch (Exception e) {
                    noteFailure(e);
                    ServiceOperations.stopQuietly(LOG, this);
                    throw ServiceStateException.convert(e);
                }
            }
        }
    }
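The relationship between init and serviceInit is the template method pattern: the base class owns the state handling and the subclass supplies only the hook. A much-reduced sketch follows, assuming a simplified state model without listeners or transition validation; SketchService, CountingService, and FailingService are hypothetical names, not Hadoop classes.

```java
abstract class SketchService {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    private final Object stateChangeLock = new Object();
    private State state = State.NOTINITED;

    // Fixed skeleton: subclasses cannot override it, they only fill in serviceInit.
    public final void init() {
        synchronized (stateChangeLock) {
            if (state == State.INITED) {
                return; // a repeated init is a no-op
            }
            state = State.INITED;
            try {
                serviceInit();
            } catch (Exception e) {
                state = State.STOPPED; // a failed init leaves the service stopped
                throw new IllegalStateException("init failed", e);
            }
        }
    }

    // Hook that concrete services override.
    protected abstract void serviceInit() throws Exception;

    State getState() {
        synchronized (stateChangeLock) {
            return state;
        }
    }
}

class CountingService extends SketchService {
    int initCalls;

    @Override
    protected void serviceInit() {
        initCalls++;
    }
}

class FailingService extends SketchService {
    @Override
    protected void serviceInit() throws Exception {
        throw new Exception("disk check failed");
    }
}
```

Keeping the state transition in the base class means every service gets the same idempotence and failure handling without repeating it in each subclass.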
(2) The serviceInit method of the NodeManager class adds a number of services. In detail:
- Perform basic configuration, such as reading parameters from the configuration file.
- Create the DeletionService and NodeHealthCheckerService and add them to serviceList, an ArrayList<Service> held by the parent class.
- Call createNodeStatusUpdater() to create the NodeStatusUpdater object, then register it through the register method inherited from the parent class, which adds it to the parent's listeners ArrayList.
- Call createContainerManager(), createWebServer(), and createNodeResourceMonitor() to create the ContainerManagerImpl object, the web server Service object, and the NodeResourceMonitor object, and add each of them to serviceList.
- Register ContainerManagerImpl with the previously created asynchronous event dispatcher, AsyncDispatcher, and then add the AsyncDispatcher to serviceList.
- Add the NodeStatusUpdater object to serviceList. It is added last because the heartbeat must start only after all other services are up.
- Call the parent class's serviceInit(), which initializes every service in serviceList.
Main code:
    @Override
    protected void serviceInit(Configuration conf) throws Exception {
        conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);

        NMContainerTokenSecretManager containerTokenSecretManager =
            new NMContainerTokenSecretManager(conf);
        NMTokenSecretManagerInNM nmTokenSecretManager =
            new NMTokenSecretManagerInNM();
        this.aclsManager = new ApplicationACLsManager(conf);

        // Create the ContainerExecutor. It encapsulates the operations the
        // NodeManager performs on containers, such as launching a container and
        // checking whether the container with a given ID is alive. The
        // implementation class is chosen by yarn.nodemanager.container-executor.class;
        // the default is DefaultContainerExecutor.
        ContainerExecutor exec = ReflectionUtils.newInstance(
            conf.getClass(YarnConfiguration.NM_CONTAINER_EXECUTOR,
                DefaultContainerExecutor.class, ContainerExecutor.class), conf);
        try {
            exec.init();
        } catch (IOException e) {
            throw new YarnRuntimeException("Failed to initialize container executor", e);
        }
        DeletionService del = createDeletionService(exec);
        addService(del);

        // NodeManager-level asynchronous dispatcher
        this.dispatcher = new AsyncDispatcher();

        // This service checks whether the node is healthy. Node health combines
        // NodeHealthScriptRunner.isHealthy and dirsHandler.areDisksHealthy.
        nodeHealthChecker = new NodeHealthCheckerService();
        addService(nodeHealthChecker);
        dirsHandler = nodeHealthChecker.getDiskHandler();

        this.context = createNMContext(containerTokenSecretManager, nmTokenSecretManager);

        // Create the NodeStatusUpdater, which registers with the RM and sends it
        // heartbeats (status updates). It talks to the RM over the ResourceTracker
        // protocol, which runs on YarnRPC. The ResourceTracker interface provides
        // two methods: registration and heartbeat.
        nodeStatusUpdater =
            createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);

        // Monitor node resources. As a service it has four states:
        // NOTINITED, INITED, STARTED, STOPPED.
        NodeResourceMonitor nodeResourceMonitor = createNodeResourceMonitor();
        addService(nodeResourceMonitor);

        // Create the ContainerManagerImpl service, which manages containers and
        // implements the ContainerManager protocol, the protocol applications use
        // to communicate with the NodeManager.
        containerManager = createContainerManager(context, exec, del,
            nodeStatusUpdater, this.aclsManager, dirsHandler);
        addService(containerManager);
        ((NMContext) context).setContainerManager(containerManager);

        // Create the WebServer that serves the NodeManager web UI. The address is
        // set through yarn.nodemanager.webapp.address; the default port is 8042.
        WebServer webServer = createWebServer(context,
            containerManager.getContainersMonitor(), this.aclsManager, dirsHandler);
        addService(webServer);
        ((NMContext) context).setWebServer(webServer);

        dispatcher.register(ContainerManagerEventType.class, containerManager);
        dispatcher.register(NodeManagerEventType.class, this);
        addService(dispatcher);

        // Initialize metrics
        DefaultMetricsSystem.initialize("NodeManager");

        // StatusUpdater should be added last so that it gets started last,
        // so that we make sure everything is up before registering with the RM.
        addService(nodeStatusUpdater);

        super.serviceInit(conf);
        // TODO add local dirs to del
    }
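The dispatcher.register calls bind an event-type enum class to a handler, and events are then delivered asynchronously. The sketch below illustrates that idea under simplifying assumptions: the event is the enum constant itself, there is one handler per enum class, and a single daemon thread drains the queue. SketchDispatcher, SketchContainerEvent, and SketchNodeEvent are hypothetical names and do not reproduce AsyncDispatcher's actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical event types, standing in for the real event-type enums.
enum SketchContainerEvent { START_CONTAINER, STOP_CONTAINER }
enum SketchNodeEvent { SHUTDOWN, RESYNC }

class SketchDispatcher {
    private final Map<Class<?>, Consumer<Enum<?>>> handlers = new ConcurrentHashMap<>();
    private final BlockingQueue<Enum<?>> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::drain, "sketch-dispatcher");

    // Binds one handler to every constant of the given enum class.
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Enum<?>> handler) {
        handlers.put(eventTypeClass, handler);
    }

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    // Callers only enqueue; handling happens on the worker thread.
    void dispatch(Enum<?> event) {
        queue.add(event);
    }

    void stop() {
        worker.interrupt();
    }

    private void drain() {
        try {
            while (true) {
                Enum<?> event = queue.take();
                Consumer<Enum<?>> handler = handlers.get(event.getDeclaringClass());
                if (handler != null) {
                    handler.accept(event);
                }
            }
        } catch (InterruptedException e) {
            // stop() was called; let the worker thread exit
        }
    }

    // Registers two handlers, dispatches two events, and returns what was handled.
    static List<String> demo() {
        List<String> handled = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        SketchDispatcher dispatcher = new SketchDispatcher();
        dispatcher.register(SketchContainerEvent.class, e -> {
            handled.add("container:" + e);
            done.countDown();
        });
        dispatcher.register(SketchNodeEvent.class, e -> {
            handled.add("node:" + e);
            done.countDown();
        });
        dispatcher.start();
        dispatcher.dispatch(SketchContainerEvent.START_CONTAINER);
        dispatcher.dispatch(SketchNodeEvent.SHUTDOWN);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        dispatcher.stop();
        return handled;
    }
}
```

Because a single thread drains a FIFO queue, events are handled in the order they were dispatched, and the thread that raises an event never blocks on its handler.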
2.4 start Function
- Perform the secure login (security authentication).
- Call the parent class's serviceStart(), which starts, in order, all the services added in NodeManager's serviceInit. AsyncDispatcher delivers events, NodeStatusUpdater generates the heartbeats, ContainerManagerImpl provides the functions exposed over Hadoop RPC, and so on.
The start method as implemented in AbstractService:
    @Override
    public void start() {
        if (isInState(STATE.STARTED)) {
            return;
        }
        // enter the started state
        synchronized (stateChangeLock) {
            if (stateModel.enterState(STATE.STARTED) != STATE.STARTED) {
                try {
                    startTime = System.currentTimeMillis();
                    // call the overridden serviceStart
                    serviceStart();
                    if (isInState(STATE.STARTED)) {
                        // if the service started (and isn't now in a later state), notify
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("Service " + getName() + " is started");
                        }
                        notifyListeners();
                    }
                } catch (Exception e) {
                    noteFailure(e);
                    ServiceOperations.stopQuietly(LOG, this);
                    throw ServiceStateException.convert(e);
                }
            }
        }
    }
The serviceStart method as overridden in NodeManager:
    @Override
    protected void serviceStart() throws Exception {
        try {
            doSecureLogin();
        } catch (IOException e) {
            throw new YarnRuntimeException("Failed NodeManager login", e);
        }
        super.serviceStart();
    }
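Since the parent class starts the services in the order they were added, adding the status updater last is what makes the heartbeat start last. A minimal sketch of that ordering follows, assuming services are represented only by their names and that stopping runs in reverse order; CompositeOrderSketch and the service names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CompositeOrderSketch {
    private final List<String> serviceList = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    // Insertion order doubles as start order.
    void addService(String name) {
        serviceList.add(name);
    }

    void start() {
        for (String name : serviceList) {
            log.add("start:" + name);
        }
    }

    // Stop in reverse so that services started later are stopped first.
    void stop() {
        for (int i = serviceList.size() - 1; i >= 0; i--) {
            log.add("stop:" + serviceList.get(i));
        }
    }

    static List<String> demo() {
        CompositeOrderSketch composite = new CompositeOrderSketch();
        composite.addService("deletion");
        composite.addService("dispatcher");
        composite.addService("status-updater"); // added last, so started last
        composite.start();
        composite.stop();
        return composite.log;
    }
}
```

With this ordering the heartbeat begins only once everything it reports on is running, and it is the first thing to go quiet on shutdown.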
3. References:
http://www.technology-mania.com/2014/05/an-insight-into-hadoop-yarn-nodemanager.html
http://www.cnblogs.com/biyeymyhjob/archive/2012/08/18/2645576.html