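Tags: yarn application
Preface
Over the past two weeks I mainly studied various modules in HDFS, many of which offer ideas worth borrowing. Now I turn to another area: YARN, also known as MRv2. YARN addresses the single-point JobTracker problem of MapReduce by splitting its responsibilities between the ResourceManager and the NodeManagers. In addition, each application gets its own ApplicationMaster, running in a container on one of the nodes, to manage that application's whole life cycle. YARN contains many good designs; today I focus on the set of services around the ApplicationMaster, which are internal services of the ResourceManager. Once you understand how an AM is started, you will have a much better picture of how YARN launches a job.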
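Classes Involved in ApplicationMaster Management
ApplicationMaster management involves four main classes: ApplicationMasterLauncher, AMLivelinessMonitor, ApplicationMasterService, and the ApplicationMaster class itself. Their roles are described below; in YARN each class has a clearly delimited responsibility.
1. ApplicationMasterLauncher -- call it the AM launch/cleanup event handler. It is both a service and an event handler, and it handles only two kinds of events, launch and cleanup, which correspond to starting and shutting down an application's AM.
2. AMLivelinessMonitor -- as the name suggests, a monitor class that tracks whether each AM is still alive. As in HDFS, liveness is detected through heartbeats; when a monitored entry times out, an expire event is triggered.
3. ApplicationMasterService -- the class that serves AM requests. AMS lives inside the ResourceManager and serves the ApplicationMasters running on the nodes: it accepts their registration requests, updates heartbeat information, and so on.
4. ApplicationMaster -- the application management class. Put simply, an ApplicationMaster manages the whole life cycle of one application.
With the AM management classes briefly described, the following sections walk through several flows at the source-code level.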
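AM Startup
An AM is started when a user submits a new application. ApplicationMasterLauncher then handles a LAUNCH event and contacts the corresponding NodeManager so that it prepares the Container in which the new AM will run. As noted above, this class handles only two kinds of events, LAUNCH and CLEANUP. First, its basic fields: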
    //Event handler for AM launch and cleanup events
    public class ApplicationMasterLauncher extends AbstractService implements
        EventHandler<AMLauncherEvent> {
      private static final Log LOG = LogFactory.getLog(
          ApplicationMasterLauncher.class);
      private final ThreadPoolExecutor launcherPool;
      private LauncherThread launcherHandlingThread;

      //event queue
      private final BlockingQueue<Runnable> masterEvents
        = new LinkedBlockingQueue<Runnable>();

      //ResourceManager context
      protected final RMContext context;

      public ApplicationMasterLauncher(RMContext context) {
        super(ApplicationMasterLauncher.class.getName());
        this.context = context;
        //initialize the thread pool
        this.launcherPool = new ThreadPoolExecutor(10, 10, 1,
          TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
        //create the handling thread
        this.launcherHandlingThread = new LauncherThread();
      }
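This is fairly simple: there is a masterEvents event queue, a handling thread, and the thread pool in which the work is executed. Nearly all RM-related services extend the AbstractService abstract class. The two kinds of events that ApplicationMasterLauncher handles are shown below: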
    @Override
    public synchronized void handle(AMLauncherEvent appEvent) {
      AMLauncherEventType event = appEvent.getType();
      RMAppAttempt application = appEvent.getAppAttempt();
      //handle requests concerning the ApplicationMaster: either a launch event or a cleanup event
      switch (event) {
      case LAUNCH:
        launch(application);
        break;
      case CLEANUP:
        cleanup(application);
        break;
      default:
        break;
      }
    }

It then calls the concrete method. Taking the launch event as an example:
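    //add an application launch event
    private void launch(RMAppAttempt application) {
      Runnable launcher = createRunnableLauncher(application,
          AMLauncherEventType.LAUNCH);
      //put the launch event on the event queue
      masterEvents.add(launcher);
    }

How are the events processed once they are on the queue? In message-queue fashion: a separate thread takes them off one at a time and hands each to the thread pool: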
    //the handling thread
    private class LauncherThread extends Thread {

      public LauncherThread() {
        super("ApplicationMaster Launcher");
      }

      @Override
      public void run() {
        while (!this.isInterrupted()) {
          Runnable toLaunch;
          try {
            //take events off the queue one at a time
            toLaunch = masterEvents.take();
            //hand each one to the thread pool for execution
            launcherPool.execute(toLaunch);
          } catch (InterruptedException e) {
            LOG.warn(this.getClass().getName() + " interrupted. Returning.");
            return;
          }
        }
      }
    }

How an event is actually executed depends on AMLauncher, which is itself a Runnable:
    /**
     * The launch of the AM itself.
     * Executes application (AM) events.
     */
    public class AMLauncher implements Runnable {

      private static final Log LOG = LogFactory.getLog(AMLauncher.class);

      private ContainerManagementProtocol containerMgrProxy;

      private final RMAppAttempt application;
      private final Configuration conf;
      private final AMLauncherEventType eventType;
      private final RMContext rmContext;
      private final Container masterContainer;

Its main run method is as follows; it branches on the event type:
    @SuppressWarnings("unchecked")
    public void run() {
      //AMLauncher handles the two event types separately
      switch (eventType) {
      case LAUNCH:
        try {
          LOG.info("Launching master" + application.getAppAttemptId());
          //call the launch method
          launch();
          handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
              RMAppAttemptEventType.LAUNCHED));
        ...
        break;
      case CLEANUP:
        try {
          LOG.info("Cleaning master " + application.getAppAttemptId());
          //call the cleanup method
          cleanup();
        ...
        break;
      default:
        LOG.warn("Received unknown event-type " + eventType + ". Ignoring.");
        break;
      }
    }

The launch() call then uses RPC to contact the remote NodeManager and start the Container. Next comes the run() method of the ApplicationMaster itself, which, among other things, registers the application:
    @SuppressWarnings({ "unchecked" })
    public boolean run() throws YarnException, IOException {
      LOG.info("Starting ApplicationMaster");

      Credentials credentials =
          UserGroupInformation.getCurrentUser().getCredentials();
      DataOutputBuffer dob = new DataOutputBuffer();
      credentials.writeTokenStorageToStream(dob);
      // Now remove the AM->RM token so that containers cannot access it.
      Iterator<Token<?>> iter = credentials.getAllTokens().iterator();
      while (iter.hasNext()) {
        Token<?> token = iter.next();
        if (token.getKind().equals(AMRMTokenIdentifier.KIND_NAME)) {
          iter.remove();
        }
      }
      allTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

      //talk to the ResourceManager, sending periodic heartbeats that carry the application's latest information
      AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
      amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
      amRMClient.init(conf);
      amRMClient.start();
      .....
      // Register self with ResourceManager
      // This will start heartbeating to the RM
      //once started, register the AM
      appMasterHostname = NetUtils.getHostname();
      RegisterApplicationMasterResponse response = amRMClient
          .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
              appMasterTrackingUrl);
      // Dump out information about cluster capability as seen by the
      // resource manager
      int maxMem = response.getMaximumResourceCapability().getMemory();
      LOG.info("Max mem capabililty of resources in this cluster " + maxMem);

      // A resource ask cannot exceed the max.
      if (containerMemory > maxMem) {
        LOG.info("Container memory specified above max threshold of cluster."
            + " Using max value." + ", specified=" + containerMemory + ", max="
            + maxMem);
        containerMemory = maxMem;
      }

Through this call the AM makes itself known to the RM side, where AMLivelinessMonitor tracks it; heartbeat monitoring is in effect from this point on.
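The launch path in this section rests on one pattern: handle() only enqueues a Runnable, a single dispatcher thread takes items off the queue, and a thread pool does the actual work. The following is a minimal sketch of that pattern, not the Hadoop implementation. The class name LauncherSketch, the results queue, awaitResult() and the pool size of 2 are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LauncherSketch {
    /** Hypothetical event types, mirroring LAUNCH and CLEANUP in the article. */
    public enum EventType { LAUNCH, CLEANUP }

    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    // Assumption for the demo: finished tasks report here so callers can observe them.
    private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private Thread dispatcher;

    /** Starts the single dispatcher thread that drains the event queue. */
    public void start() {
        dispatcher = new Thread(this::dispatchLoop, "launcher-sketch");
        dispatcher.start();
    }

    /** Only enqueues the work, so the caller returns immediately. */
    public void handle(EventType type, String attemptId) {
        events.add(() -> results.add(type + " " + attemptId));
    }

    private void dispatchLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable next = events.take(); // blocks until an event arrives
                pool.execute(next);            // the pool performs the work
            }
        } catch (InterruptedException e) {
            // stop() interrupted the wait: leave the loop
        }
    }

    /** Waits up to five seconds for one finished task; returns null on timeout. */
    public String awaitResult() {
        try {
            return results.poll(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Stops the dispatcher and lets the pool finish what it already accepted. */
    public void stop() {
        dispatcher.interrupt();
        pool.shutdown();
    }

    public static void main(String[] args) {
        LauncherSketch launcher = new LauncherSketch();
        launcher.start();
        launcher.handle(EventType.LAUNCH, "attempt_1");
        launcher.handle(EventType.CLEANUP, "attempt_1");
        // With two pool threads the completion order is not guaranteed.
        System.out.println(launcher.awaitResult());
        System.out.println(launcher.awaitResult());
        launcher.stop();
    }
}
```

Keeping handle() down to a queue insert means the thread that delivers events is never blocked by a slow remote call; the cost is that completion order across pool threads is not guaranteed.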
AMLivelinessMonitor Monitoring
Now we shift focus from ApplicationMaster to AMLivelinessMonitor. It is a liveness monitoring thread, and all monitors of this kind share a common parent class:
    //AM liveness monitor
    public class AMLivelinessMonitor extends AbstractLivelinessMonitor<ApplicationAttemptId> {

AbstractLivelinessMonitor defines the fields and methods common to this family of monitor threads:
    //liveness monitor base class
    public abstract class AbstractLivelinessMonitor<O> extends AbstractService {

      private static final Log LOG = LogFactory.getLog(AbstractLivelinessMonitor.class);

      //thread which runs periodically to see the last time since a heartbeat is
      //received.
      //checker thread
      private Thread checkerThread;
      private volatile boolean stopped;
      //default timeout: 5 minutes
      public static final int DEFAULT_EXPIRE = 5*60*1000;//5 mins
      //timeout
      private int expireInterval = DEFAULT_EXPIRE;
      //check interval: 1/3 of the timeout
      private int monitorInterval = expireInterval/3;

      private final Clock clock;
      //last heartbeat time recorded for each monitored object
      private Map<O, Long> running = new HashMap<O, Long>();

The heartbeat check itself is very simple: look at the recorded time of the last contact, and update that record whenever a heartbeat arrives. Entries join and leave monitoring through these methods:
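    //register a new entry for heartbeat monitoring
    public synchronized void register(O ob) {
      running.put(ob, clock.getTime());
    }

    //remove an entry from heartbeat monitoring
    public synchronized void unregister(O ob) {
      running.remove(ob);
    }

Each periodic heartbeat calls the following method: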
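    //update the time of the latest heartbeat
    public synchronized void receivedPing(O ob) {
      //only put for the registered objects
      if (running.containsKey(ob)) {
        running.put(ob, clock.getTime());
      }
    }

This is a very simple update method. What the O ob object is depends on the context; for AM monitoring it is the ApplicationAttemptId, as the AMS and AM interaction below shows. Once a new application attempt has been added to AMLivelinessMonitor, what remains is mostly the interaction between AMS and the AM.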
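To see the bookkeeping in action, here is a small sketch of the same register / receivedPing / unregister logic driven by a hand-advanced clock. It is an illustration only: HeartbeatSketch, ManualClock and lastSeen() are hypothetical names, and plain String keys stand in for ApplicationAttemptId.

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatSketch {
    /** Hypothetical manual clock so the example does not depend on wall time. */
    public static class ManualClock {
        private long now;
        public long getTime() { return now; }
        public void advance(long millis) { now += millis; }
    }

    public static final int DEFAULT_EXPIRE = 5 * 60 * 1000;  // 300000 ms
    private final int expireInterval = DEFAULT_EXPIRE;
    private final int monitorInterval = expireInterval / 3;  // 100000 ms

    private final ManualClock clock;
    // Last heartbeat time per monitored key.
    private final Map<String, Long> running = new HashMap<>();

    public HeartbeatSketch(ManualClock clock) {
        this.clock = clock;
    }

    public synchronized void register(String id) {
        running.put(id, clock.getTime());
    }

    public synchronized void unregister(String id) {
        running.remove(id);
    }

    /** Refreshes the timestamp, but only for keys that were registered. */
    public synchronized void receivedPing(String id) {
        if (running.containsKey(id)) {
            running.put(id, clock.getTime());
        }
    }

    /** Helper for the demo: last recorded time, or null if not monitored. */
    public synchronized Long lastSeen(String id) {
        return running.get(id);
    }

    public int getMonitorInterval() {
        return monitorInterval;
    }

    public static void main(String[] args) {
        ManualClock clock = new ManualClock();
        HeartbeatSketch monitor = new HeartbeatSketch(clock);

        monitor.register("attempt_1");
        clock.advance(1000);
        monitor.receivedPing("attempt_1"); // refreshed to t=1000
        monitor.receivedPing("attempt_2"); // never registered: ignored

        System.out.println(monitor.lastSeen("attempt_1")); // 1000
        System.out.println(monitor.lastSeen("attempt_2")); // null
        System.out.println(monitor.getMonitorInterval());  // 100000
    }
}
```

The containsKey guard matters: a late heartbeat from an entry that has already been removed cannot silently put it back under monitoring.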
AM and AMS
Once running, the ApplicationMaster periodically sends heartbeats to ApplicationMasterService; each heartbeat carries a good deal of resource description information.
    //ApplicationMaster heartbeat update
    @Override
    public AllocateResponse allocate(AllocateRequest request)
        throws YarnException, IOException {

      ApplicationAttemptId appAttemptId = authorizeRequest();
      //update the heartbeat time
      this.amLivelinessMonitor.receivedPing(appAttemptId);
      ....

Every incoming heartbeat updates the last-seen time. AMS also has a corresponding method for registering the application:
    //the ApplicationMaster registers the application with ApplicationMasterService
    @Override
    public RegisterApplicationMasterResponse registerApplicationMaster(
        RegisterApplicationMasterRequest request) throws YarnException,
        IOException {

      ApplicationAttemptId applicationAttemptId = authorizeRequest();
      ApplicationId appID = applicationAttemptId.getApplicationId();
      .....
      //record a heartbeat with the liveness monitor, updating the check time; the key is the application attempt ID
      this.amLivelinessMonitor.receivedPing(applicationAttemptId);
      RMApp app = this.rmContext.getRMApps().get(appID);

      // Setting the response id to 0 to identify if the
      // application master is register for the respective attemptid
      lastResponse.setResponseId(0);
      responseMap.put(applicationAttemptId, lastResponse);
      LOG.info("AM registration " + applicationAttemptId);
      this.rmContext ...

If an entry times out during heartbeat monitoring, an expire event is triggered. In AMLivelinessMonitor this work is delegated to the checker thread, which runs a PingChecker:
    //liveness monitor base class
    public abstract class AbstractLivelinessMonitor<O> extends AbstractService {
      ...
      //thread which runs periodically to see the last time since a heartbeat is
      //received.
      //checker thread
      private Thread checkerThread;
      ....
      //default timeout: 5 minutes
      public static final int DEFAULT_EXPIRE = 5*60*1000;//5 mins
      //timeout
      private int expireInterval = DEFAULT_EXPIRE;
      //check interval: 1/3 of the timeout
      private int monitorInterval = expireInterval/3;
      ....
      //last heartbeat time recorded for each monitored object
      private Map<O, Long> running = new HashMap<O, Long>();
      ...
      private class PingChecker implements Runnable {

        @Override
        public void run() {
          while (!stopped && !Thread.currentThread().isInterrupted()) {
            synchronized (AbstractLivelinessMonitor.this) {
              Iterator<Map.Entry<O, Long>> iterator =
                running.entrySet().iterator();

              //avoid calculating current time everytime in loop
              long currentTime = clock.getTime();

              while (iterator.hasNext()) {
                Map.Entry<O, Long> entry = iterator.next();
                //timeout check
                if (currentTime > entry.getValue() + expireInterval) {
                  iterator.remove();
                  //call the expiry hook, which hands the event to the dispatcher
                  expire(entry.getKey());
                  LOG.info("Expired:" + entry.getKey().toString() +
                          " Timed out after " + expireInterval/1000 + " secs");
                }
              }
            }
            ...

The checker thread walks the last heartbeat time of every monitored entry, compares it with the current time to decide whether the entry has expired, and calls expire() for each one that has. That method is implemented by the subclass:
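    //AM liveness monitor
    public class AMLivelinessMonitor extends AbstractLivelinessMonitor<ApplicationAttemptId> {
      //central dispatcher
      private EventHandler dispatcher;
      ...
      @Override
      protected void expire(ApplicationAttemptId id) {
        //once an attempt expires, hand an EXPIRE event to the dispatcher
        dispatcher.handle(
            new RMAppAttemptEvent(id, RMAppAttemptEventType.EXPIRE));
      }
    }

This produces an expiry event for the application attempt and sends it to the central dispatcher. The reason for this approach is that every module in the RM is designed to be event-driven, which keeps the modules as decoupled as possible. Each module moves between states in response to events, which can be thought of as state machine transitions. Finally, a simple diagram from the book listed in the references illustrates the call flow among the AM-related modules.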
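The hand-off from monitor to dispatcher to state change can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the RM code: ExpirySketch, Monitor, AttemptStates, and the two states RUNNING and EXPIRED are hypothetical, a single handler stands in for the central dispatcher, and the current time is passed in explicitly so that one pass of the checker can be run deterministically.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ExpirySketch {
    /** Hypothetical attempt states; the real RM state machine has many more. */
    public enum State { RUNNING, EXPIRED }

    public enum EventType { EXPIRE }

    /** Hypothetical event: which attempt, and what happened to it. */
    public static final class AttemptEvent {
        final String attemptId;
        final EventType type;

        AttemptEvent(String attemptId, EventType type) {
            this.attemptId = attemptId;
            this.type = type;
        }
    }

    public interface EventHandler {
        void handle(AttemptEvent event);
    }

    /** Monitor side: knows about timestamps, nothing about attempt states. */
    public static class Monitor {
        private final Map<String, Long> lastPing = new HashMap<>();
        private final long expireInterval;
        private final EventHandler dispatcher;

        public Monitor(long expireInterval, EventHandler dispatcher) {
            this.expireInterval = expireInterval;
            this.dispatcher = dispatcher;
        }

        public synchronized void ping(String id, long now) {
            lastPing.put(id, now);
        }

        /** One pass of the checker loop, with the current time passed in. */
        public synchronized void checkOnce(long now) {
            Iterator<Map.Entry<String, Long>> it = lastPing.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> entry = it.next();
                if (now > entry.getValue() + expireInterval) {
                    String id = entry.getKey(); // read the key before removing
                    it.remove();
                    dispatcher.handle(new AttemptEvent(id, EventType.EXPIRE));
                }
            }
        }
    }

    /** State side: reacts to events, knows nothing about heartbeat timing. */
    public static class AttemptStates implements EventHandler {
        private final Map<String, State> states = new HashMap<>();

        public void start(String id) {
            states.put(id, State.RUNNING);
        }

        public State stateOf(String id) {
            return states.get(id);
        }

        @Override
        public void handle(AttemptEvent event) {
            // Transition: RUNNING + EXPIRE -> EXPIRED; anything else is ignored.
            if (event.type == EventType.EXPIRE
                    && states.get(event.attemptId) == State.RUNNING) {
                states.put(event.attemptId, State.EXPIRED);
            }
        }
    }

    public static void main(String[] args) {
        AttemptStates states = new AttemptStates();
        Monitor monitor = new Monitor(1000, states);

        states.start("attempt_1");
        states.start("attempt_2");
        monitor.ping("attempt_1", 0);
        monitor.ping("attempt_2", 0);
        monitor.ping("attempt_2", 900);  // attempt_2 keeps heartbeating

        monitor.checkOnce(1500);         // attempt_1 has been silent for 1500 ms
        System.out.println(states.stateOf("attempt_1")); // EXPIRED
        System.out.println(states.stateOf("attempt_2")); // RUNNING
    }
}
```

The two halves share only the event type, which is the decoupling the paragraph above describes: the monitor can be tested with fake times, and the state handling can be tested by feeding it events directly.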
For the full code analysis, see https://github.com/linyiqun/hadoop-yarn. Analysis of other parts of the YARN code will follow.
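References
《Hadoop技術內部–HDFS結構設計與實現原理》, Cai Bin et al.
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.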
YARN Source Code Analysis (1): ApplicationMaster