YARN: how the ApplicationMaster (AM) communicates with the ResourceManager (RM)

Source: Internet
Author: User

The ApplicationMaster requests resources from the RM

The ApplicationMaster (AM) sends a heartbeat request to the ResourceManager (RM). Each heartbeat updates the RM's in-memory resource-request structure and collects already-allocated containers from the RM's in-memory allocation structure. The allocation itself happens asynchronously in the background, driven by NodeManager (NM) heartbeats.

MRAppMaster.serviceInit creates the container allocator and registers it with the dispatcher:

    // service to allocate containers from RM (if non-uber) or to fake it (uber)
    containerAllocator = createContainerAllocator(null, context);
    addIfService(containerAllocator);
    dispatcher.register(ContainerAllocator.EventType.class, containerAllocator);

    protected ContainerAllocator createContainerAllocator(
        final ClientService clientService, final AppContext context) {
      return new ContainerAllocatorRouter(clientService, context);
    }

ContainerAllocatorRouter chooses the concrete allocator when the service starts:

    private final class ContainerAllocatorRouter extends AbstractService
        implements ContainerAllocator, RMHeartbeatHandler {
      private final ClientService clientService;
      private final AppContext context;
      private ContainerAllocator containerAllocator;
      ...
      @Override
      protected void serviceStart() throws Exception {
        if (job.isUber()) {
          this.containerAllocator = new LocalContainerAllocator(
              this.clientService, this.context, nmHost, nmPort, nmHttpPort, containerID);
        } else {
          this.containerAllocator = new RMContainerAllocator(
              this.clientService, this.context);
        }
        ((Service) this.containerAllocator).init(getConfig());
        ((Service) this.containerAllocator).start();
        super.serviceStart();
      }

The RMContainerAllocator class (package org.apache.hadoop.mapreduce.v2.app.rm) has this method:

    protected synchronized void heartbeat() throws Exception {
      scheduleStats.updateAndLogIfChanged("Before Scheduling: ");
      // Send a heartbeat to the RM. The heartbeat may carry no new resource
      // requests at all: it may only tell the RM that the AM is still alive,
      // or only fetch the containers the RM has already allocated.
      List<Container> allocatedContainers = getResources();
      if (allocatedContainers.size() > 0) {
        // Assign the obtained containers to specific tasks.
        scheduledRequests.assign(allocatedContainers);
      }
      ...
    }

A resource request carries a priority, the desired host, the memory size, and so on. With the default three-way replication, one task can produce up to 7 resource requests: 3 node-local, 3 rack-local, and 1 for any node.

The heartbeat thread is started by RMCommunicator, an ancestor class of RMContainerAllocator:

    protected void startAllocatorThread() {
      allocatorThread = new Thread(new Runnable() {
        @Override
        public void run() {
          while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
            try {
              Thread.sleep(rmPollInterval);   // wait one poll interval
              try {
                heartbeat();                  // send the heartbeat
    ...

getResources in RMContainerAllocator performs the remote call:

    private List<Container> getResources() throws Exception {
      int headRoom = getAvailableResources() != null
          ? getAvailableResources().getMemory() : 0; // first time it would be null
      AllocateResponse response;
      /*
       * If contact with RM is lost, the AM will wait MR_AM_TO_RM_WAIT_INTERVAL_MS
       * milliseconds before aborting. During this interval, AM will still try
       * to contact the RM.
       */
      try {
        response = makeRemoteRequest();   // the key call
    ...

makeRemoteRequest is defined in the parent class RMContainerRequestor:

    protected AllocateResponse makeRemoteRequest() throws IOException {
      ResourceBlacklistRequest blacklistRequest =
          ResourceBlacklistRequest.newInstance(
              new ArrayList<String>(blacklistAdditions),
              new ArrayList<String>(blacklistRemovals));
      // Build the resource request.
      AllocateRequest allocateRequest =
          AllocateRequest.newInstance(lastResponseID,
              super.getApplicationProgress(),
              // ask is a collection of ResourceRequest instances;
              // it holds only the new or changed requests
              new ArrayList<ResourceRequest>(ask),
              new ArrayList<ContainerId>(release), blacklistRequest);
      AllocateResponse allocateResponse;
      try {
        // The key call: ask the RM for resources.
        allocateResponse = scheduler.allocate(allocateRequest);
    ...

The scheduler field here is not the RM scheduler itself. It is an ApplicationMasterProtocol proxy, and the call eventually reaches the scheduler on the RM side. The proxy is created in RMCommunicator:

    protected ApplicationMasterProtocol scheduler;
    ...
    protected void serviceStart() throws Exception {
      scheduler = createSchedulerProxy();
      ...
    }

    protected ApplicationMasterProtocol createSchedulerProxy() {
      final Configuration conf = getConfig();
      try {
        // ApplicationMasterProtocol is the critical protocol: calls on this
        // proxy invoke the methods of ApplicationMasterService remotely.
        return ClientRMProxy.createRMProxy(conf, ApplicationMasterProtocol.class);
      } catch (IOException e) {
        throw new YarnRuntimeException(e);
      }
    }

Next, trace where ask gets its contents. Following the callers of the method that fills ask leads to addContainerReq, which is called from RMContainerAllocator:

    private void addResourceRequestToAsk(ResourceRequest remoteRequest) {
      // because objects inside the resource map can be deleted ask can end up
      // containing an object that matches new resource object but with different
      // numContainers. So existing values must be replaced explicitly
      if (ask.contains(remoteRequest)) {
        ask.remove(remoteRequest);
      }
      ask.add(remoteRequest);
    }

    protected void addContainerReq(ContainerRequest req) {
      // Create resource requests
      for (String host : req.hosts) {
        // Data-local
        if (!isNodeBlacklisted(host)) {
          addResourceRequest(req.priority, host, req.capability);
        }
      }
      // Nothing Rack-local for now
      for (String rack : req.racks) {
        addResourceRequest(req.priority, rack, req.capability);
      }
      // Off-switch
      addResourceRequest(req.priority, ResourceRequest.ANY, req.capability);
    }

The addMap method in RMContainerAllocator:

    void addMap(ContainerRequestEvent event) {
      ContainerRequest request = null;
      if (event.getEarlierAttemptFailed()) {
        earlierFailedMaps.add(event.getAttemptID());
        request = new ContainerRequest(event, PRIORITY_FAST_FAIL_MAP);
        LOG.info("Added " + event.getAttemptID() + " to list of failed maps");
      } else {
        for (String host : event.getHosts()) {
          LinkedList<TaskAttemptId> list = mapsHostMapping.get(host);
          if (list == null) {
            list = new LinkedList<TaskAttemptId>();
            mapsHostMapping.put(host, list);
          }
          list.add(event.getAttemptID());
          if (LOG.isDebugEnabled()) {
            LOG.debug("Added attempt req to host " + host);
          }
        }
        for (String rack : event.getRacks()) {
          LinkedList<TaskAttemptId> list = mapsRackMapping.get(rack);
          if (list == null) {
            list = new LinkedList<TaskAttemptId>();
            mapsRackMapping.put(rack, list);
          }
          list.add(event.getAttemptID());
          if (LOG.isDebugEnabled()) {
            LOG.debug("Added attempt req to rack " + rack);
          }
        }
        request = new ContainerRequest(event, PRIORITY_MAP);
      }
      maps.put(event.getAttemptID(), request);
      addContainerReq(request);   // the call traced above
    }

addMap is called from handleEvent:

    protected synchronized void handleEvent(ContainerAllocatorEvent event) {
      recalculateReduceSchedule = true;
      ...
      scheduledRequests.addMap(reqEvent); // maps are immediately scheduled
      ...
    }

handleEvent in turn is called by the event-handling thread started in serviceStart:

    protected void serviceStart() throws Exception {
      this.eventHandlingThread = new Thread() {
        @SuppressWarnings("unchecked")
        @Override
        public void run() {
          ContainerAllocatorEvent event;
          while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
            try {
              event = RMContainerAllocator.this.eventQueue.take(); // retrieve an event
            } catch (InterruptedException e) {
              if (!stopped.get()) {
                LOG.error("Returning, interrupted : " + e);
              }
              return;
            }
            try {
              handleEvent(event);   // process the event
    ...

The events themselves are submitted in MRAppMaster and processed by the method above. They enter the allocator through ContainerAllocatorRouter.handle:

    public void handle(ContainerAllocatorEvent event) {
      this.containerAllocator.handle(event);
    }
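The fan-out performed by addContainerReq can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: it does not use the Hadoop classes, it tracks a single priority and capability, and the class name AskTableSketch, the host and rack names, and the map-based table are all hypothetical. It shows why one task with three replica hosts yields 7 entries, and how a second task that shares a host or rack increases the container count instead of adding a new entry.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AskTableSketch {
    // Stand-in for the "any node" (off-switch) location; an assumption of this sketch.
    static final String ANY = "*";

    // location (host, rack, or ANY) -> number of containers wanted at that location
    private final Map<String, Integer> ask = new LinkedHashMap<>();

    // One task wants one container: register the wish at every acceptable location.
    void addContainerReq(List<String> hosts, List<String> racks) {
        for (String host : hosts) {
            ask.merge(host, 1, Integer::sum);   // data-local
        }
        for (String rack : racks) {
            ask.merge(rack, 1, Integer::sum);   // rack-local
        }
        ask.merge(ANY, 1, Integer::sum);        // off-switch
    }

    Map<String, Integer> ask() {
        return ask;
    }

    public static void main(String[] args) {
        AskTableSketch table = new AskTableSketch();
        // First task: 3 replica hosts on 3 racks.
        table.addContainerReq(Arrays.asList("h1", "h2", "h3"),
                              Arrays.asList("/r1", "/r2", "/r3"));
        System.out.println(table.ask().size());   // 7 = 3 hosts + 3 racks + ANY
        // Second task shares h1 and /r1: existing entries are updated, not duplicated.
        table.addContainerReq(Arrays.asList("h1", "h4", "h5"),
                              Arrays.asList("/r1", "/r4", "/r5"));
        System.out.println(table.ask());
    }
}
```

In this model the count stored under ANY always equals the number of tasks still waiting for a container, since every task registers there exactly once.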

The RM side handles the ApplicationMaster heartbeat request

Summary: the ApplicationMaster ultimately reports its resource requirements to the RM through ApplicationMasterProtocol#allocate. On the RM, ApplicationMasterService serves this call and finally invokes the scheduler's allocate method, which writes the new resource requirements into the in-memory structure and returns the containers that have already been allocated.

    public class ApplicationMasterService extends AbstractService
        implements ApplicationMasterProtocol {

      public AllocateResponse allocate(AllocateRequest request)
          throws YarnException, IOException {
        ...
        // Allow only one thread in AM to do heartbeat at a time.
        synchronized (lastResponse) {
          // Send the status update to the appAttempt.
          this.rmContext.getDispatcher().getEventHandler().handle(
              new RMAppAttemptStatusupdateEvent(appAttemptId, request.getProgress()));

          // ask and release are the request lists packed by the AM
          List<ResourceRequest> ask = request.getAskList();
          List<ContainerId> release = request.getReleaseList();
          ...
          // Send new requests to appAttempt.
          // This calls the scheduler (rScheduler) on the RM side.
          Allocation allocation =
              this.rScheduler.allocate(appAttemptId, ask, release,
                  blacklistAdditions, blacklistRemovals);
          ...
            allocateResponse.setUpdatedNodes(updatedNodeReports);
          }

          // Build the response and return it.
          allocateResponse.setAllocatedContainers(allocation.getContainers());
          allocateResponse.setCompletedContainersStatuses(
              appAttempt.pullJustFinishedContainers());
          allocateResponse.setResponseId(lastResponse.getResponseId() + 1);
          allocateResponse.setAvailableResources(allocation.getResourceLimit());
          allocateResponse.setNumClusterNodes(this.rScheduler.getNumClusterNodes());

          // add preemption to the allocateResponse message (if any)
          allocateResponse.setPreemptionMessage(generatePreemptionMessage(allocation));

          // Adding NMTokens for allocated containers.
          if (!allocation.getContainers().isEmpty()) {
            allocateResponse.setNMTokens(rmContext.getNMTokenSecretManager()
                .createAndGetNMTokens(app.getUser(), appAttemptId,
    ...

The allocate method of the FIFO scheduler:

    ...
    // Update application requests
    // This writes the resource requests into the application's in-memory
    // request structure. After an NM heartbeat triggers an allocation, the
    // result is written into the application's in-memory allocation structure.
    application.updateResourceRequests(ask);
    ...
    return new Allocation(
        // read from the application's in-memory allocation structure
        application.pullNewlyAllocatedContainers(),
        application.getHeadroom());

The request structure that is finally updated is this map:

    final Map<Priority, Map<String, ResourceRequest>> requests =
        new HashMap<Priority, Map<String, ResourceRequest>>();

The application object is a FiCaSchedulerApp:

    synchronized public List<Container> pullNewlyAllocatedContainers() {
      List<Container> returnContainerList =
          new ArrayList<Container>(newlyAllocatedContainers.size());
      // Containers are taken only from newlyAllocatedContainers. That list is
      // filled with the RMContainer created when assignContainer runs after
      // an NM heartbeat.
      for (RMContainer rmContainer : newlyAllocatedContainers) {
        rmContainer.handle(new RMContainerEvent(rmContainer.getContainerId(),
            RMContainerEventType.ACQUIRED));
        returnContainerList.add(rmContainer.getContainer());
      }
      newlyAllocatedContainers.clear();
      return returnContainerList;
    }

    synchronized public RMContainer allocate(NodeType type, FiCaSchedulerNode node,
        Priority priority, ResourceRequest request, Container container) {
      ...
      // Add it to allContainers list.
      newlyAllocatedContainers.add(rmContainer);   // this is where the list is filled
      ...

The FifoScheduler class calls the method above. assignContainer is the last method reached when an NM heartbeat is processed:

    private int assignContainer(FiCaSchedulerNode node, FiCaSchedulerApp application,
        Priority priority, int assignableContainers,
        ResourceRequest request, NodeType type) {
      ...
      // Create the container
      Container container =
          BuilderUtils.newContainer(containerId, nodeId,
              node.getRMNode().getHttpAddress(), capability, priority, containerToken);

      // Allocate!

      // Inform the application
      RMContainer rmContainer =
          application.allocate(type, node, priority, request, container);
      ...

In conclusion: when the AM sends a request to the RM, the response contains only the resources already present in the in-memory allocation structure, so the process is asynchronous. When an NM sends its heartbeat, resources are allocated according to the AM's resource requests and written into the in-memory structure, where the AM picks them up on a later heartbeat. The resource requests the AM sends must therefore be saved first; for the in-memory structure that holds them, see FiCaSchedulerApp.showRequests().
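The asynchronous handoff described above can be modeled in a few lines. The following is a toy sketch, not the Hadoop implementation: the class HeartbeatHandoffSketch, its two methods, and the string-based "containers" are hypothetical, and the matching policy is reduced to first come, first served with no locality or priority handling. It only demonstrates the two-structure pattern: the AM heartbeat stores requests and drains finished allocations, while the NM heartbeat is the only place where allocations are produced.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

public class HeartbeatHandoffSketch {
    // Filled by AM heartbeats: requests that still need a container.
    private final Deque<String> pendingRequests = new ArrayDeque<>();
    // Filled by NM heartbeats: allocations not yet picked up by the AM.
    private final List<String> newlyAllocated = new ArrayList<>();

    // AM heartbeat: save the new asks, then return whatever earlier NM heartbeats produced.
    synchronized List<String> amHeartbeat(List<String> ask) {
        pendingRequests.addAll(ask);
        List<String> result = new ArrayList<>(newlyAllocated);
        newlyAllocated.clear();
        return result;
    }

    // NM heartbeat: the node reports free slots; pending requests are matched in FIFO order.
    synchronized void nmHeartbeat(String node, int freeSlots) {
        while (freeSlots-- > 0 && !pendingRequests.isEmpty()) {
            newlyAllocated.add(pendingRequests.poll() + "@" + node);
        }
    }

    public static void main(String[] args) {
        HeartbeatHandoffSketch rm = new HeartbeatHandoffSketch();
        List<String> none = Collections.emptyList();
        // The first AM heartbeat only registers requests; nothing is allocated yet.
        System.out.println(rm.amHeartbeat(Arrays.asList("map_0", "map_1")));  // []
        rm.nmHeartbeat("node1", 1);             // node1 has room for one container
        System.out.println(rm.amHeartbeat(none));  // [map_0@node1]
        rm.nmHeartbeat("node2", 4);             // node2 serves the remaining request
        System.out.println(rm.amHeartbeat(none));  // [map_1@node2]
    }
}
```

The AM never blocks waiting for a container in this model: a request sent in one heartbeat can be satisfied no earlier than the next AM heartbeat that follows an NM heartbeat.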

