Tags: Driver startup in Cluster mode, thorough source-level analysis of the two different resource scheduling approaches, summary of resource scheduling internals
Contents:
1. Allocating the Driver (Cluster mode);
2. Allocating resources to an Application;
3. The two different resource allocation approaches, fully decoded;
4. Thoughts on Spark resource allocation;
This is one of the most important topics in Spark. Every IMF member must master it, because all of the performance tuning that follows builds on it.
========== Task Scheduling vs. Resource Scheduling ==========
1. Task scheduling is the scheduling of jobs carried out through DAGScheduler, TaskScheduler, SchedulerBackend and related components;
2. Resource scheduling is about how an application obtains resources in the first place;
3. Task scheduling happens on top of resource scheduling; without resource scheduling, task scheduling has nothing to stand on, like water without a source or a tree without roots;
4. The entry point of Spark's resource scheduling algorithm is the schedule() method.
========== Resource Scheduling Internals Decoded ==========
1. Because the Master is responsible for resource management and scheduling, the resource-scheduling method schedule() lives in Master.scala. It is invoked whenever an application registers or the available resources change, for example when an application registers:
case RegisterApplication(description, driver) => {
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
}
2. When schedule() is called: every time a new application arrives or the cluster's resource state changes (including Executors being added or removed, Workers being added or removed, and so on);
3. The current Master must be in the ALIVE state to schedule resources; if it is not ALIVE, schedule() returns immediately. In other words, a Standby Master never allocates resources to Applications;
4. Random.shuffle is used to randomly reorder the information about all of the cluster's Workers held by the Master; internally the algorithm loops over the Workers and randomly swaps their positions in the Master's cached data structure (see the sketch below);
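As an aside, here is a minimal, self-contained sketch of what Random.shuffle does to the Worker ordering; the worker IDs are hypothetical stand-ins for the Master's cached WorkerInfo entries, not Spark code:
import scala.util.Random

object ShuffleWorkersDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical worker identifiers standing in for the Master's cached WorkerInfo list.
    val workers = Seq("worker-1", "worker-2", "worker-3", "worker-4", "worker-5")

    // Random.shuffle returns a new collection with the elements in random order,
    // so successive scheduling rounds start from different Workers and drivers
    // do not all pile onto the first Worker in the list.
    val shuffled = Random.shuffle(workers)
    println(shuffled.mkString(", "))
  }
}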
5. Next, it determines which of those Workers are ALIVE; only ALIVE Workers can take part in resource allocation;
6. When spark-submit specifies that the Driver runs in Cluster mode, the driver is added to the waitingDrivers queue. Each Driver's DriverInfo carries a DriverDescription that records, among other things, the memory and cores the Driver requires from a Worker (and if the Driver was submitted with supervise, it is automatically restarted after it crashes);
private[deploy] case class DriverDescription(
    jarUrl: String,
    mem: Int,
    cores: Int,
    supervise: Boolean,
    command: Command) {

  override def toString: String = s"DriverDescription (${command.mainClass})"
}
7. Then, among the randomly shuffled Workers, one that satisfies the resource requirements is used to launch the Driver: the Master sends a command to that remote Worker telling it to start the driver, and the driver's state then becomes RUNNING;
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // The Master sends a command to the Worker to start the corresponding driver
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
8. The Driver must be launched first; only then does the rest of the resource scheduling take place;
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) { return }
  // Drivers take strict precedence over executors
  val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    for (driver <- waitingDrivers) {
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
      }
    }
  }
  startExecutorsOnWorkers()
}
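To make this driver-placement pass concrete, here is a simplified, self-contained simulation under assumed data (FakeWorker and FakeDriver are toy stand-ins, not Spark's WorkerInfo/DriverInfo): it walks the shuffled ALIVE workers and launches each waiting driver on the first worker with enough free memory and cores.
import scala.collection.mutable
import scala.util.Random

object DriverPlacementDemo {
  // Toy stand-ins for WorkerInfo and DriverDescription/DriverInfo.
  case class FakeWorker(id: String, var memoryFree: Int, var coresFree: Int, alive: Boolean)
  case class FakeDriver(id: String, mem: Int, cores: Int)

  def main(args: Array[String]): Unit = {
    val workers = Seq(
      FakeWorker("worker-1", memoryFree = 2048, coresFree = 2, alive = true),
      FakeWorker("worker-2", memoryFree = 8192, coresFree = 8, alive = true),
      FakeWorker("worker-3", memoryFree = 4096, coresFree = 4, alive = false))

    val waitingDrivers = mutable.ListBuffer(
      FakeDriver("driver-A", mem = 1024, cores = 1),
      FakeDriver("driver-B", mem = 4096, cores = 4))

    // Same shape as schedule(): shuffle, keep only ALIVE workers,
    // launch a waiting driver on the first worker that satisfies its demands.
    for (worker <- Random.shuffle(workers) if worker.alive) {
      for (driver <- waitingDrivers.toList) { // iterate over a copy so we can remove safely
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          println(s"Launching ${driver.id} on ${worker.id}")
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          waitingDrivers -= driver
        }
      }
    }
    println("Still waiting: " + waitingDrivers.map(_.id).mkString(", "))
  }
}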
9. By default Spark launches Executors for applications in FIFO order (first in, first out): all submitted applications sit in a scheduling wait queue, and only once the resource demands of the applications ahead in the queue have been met can resources be allocated to the next one;
10. Before actually allocating Executors to an application, the Master checks whether the application still needs cores; if it does not, no Executors are allocated for it;
11. Before Executors are actually allocated, a Worker must be ALIVE and must satisfy the Application's memory and core requirements for each Executor; the qualifying Workers are then sorted so that those with the most free cores come first (see the sketch below).
Under FIFO scheduling, spreadOutApps defaults to true so that an application runs on as many nodes as possible.
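A minimal sketch (made-up numbers; FakeWorker is a toy stand-in for WorkerInfo, and the requirement values are assumptions) of the filter-and-sort step just described:
object SortUsableWorkersDemo {
  // Simplified stand-in for WorkerInfo: only the fields the filter and sort use.
  case class FakeWorker(id: String, memoryFree: Int, coresFree: Int, alive: Boolean)

  def main(args: Array[String]): Unit = {
    val workers = Seq(
      FakeWorker("worker-1", memoryFree = 4096, coresFree = 2,  alive = true),
      FakeWorker("worker-2", memoryFree = 8192, coresFree = 8,  alive = true),
      FakeWorker("worker-3", memoryFree = 1024, coresFree = 16, alive = true),  // too little memory
      FakeWorker("worker-4", memoryFree = 8192, coresFree = 4,  alive = false)) // not ALIVE

    // Assumed application requirements, for illustration only.
    val memoryPerExecutorMB = 2048
    val coresPerExecutor = 1

    val usable = workers
      .filter(_.alive)
      .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor)
      .sortBy(_.coresFree).reverse // Workers with more free cores come first.

    usable.foreach(w => println(s"${w.id}: ${w.coresFree} cores free"))
    // Expected order: worker-2 (8 cores), then worker-1 (2 cores).
  }
}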
12. There are two ways to allocate Executors to an application. The first is to spread the Executors across as many of the cluster's Workers as possible, which tends to give potentially better data locality:
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
13. When cores are actually allocated on the cluster, the scheduler satisfies our requests as far as the available resources allow;
14. If each Worker can host only one Executor for the current application, then each iteration assigns just one core to that Executor!
var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

// If we are launching one executor per worker, then every iteration assigns 1 core
// to the executor. Otherwise, every iteration assigns cores to a new executor.
if (oneExecutorPerWorker) {
  assignedExecutors(pos) = 1
} else {
  assignedExecutors(pos) += 1
}
Suppose there are 4 Workers: with spreadOut, cores are handed out to executors round after round, one core at a time, cycling across the Workers until the resources are used up (see the sketch below).
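The following is a simplified, self-contained sketch of that round-robin idea, not Spark's actual scheduleExecutorsOnWorkers (it ignores memory, coresPerExecutor and executor limits); all numbers are made up:
object SpreadOutDemo {
  /**
   * Simplified core assignment: usableWorkerCores(i) is the free cores on worker i
   * (already sorted by free cores, descending) and coresNeeded plays the role of
   * app.coresLeft. Returns how many cores each worker gets.
   */
  def assignCores(usableWorkerCores: Array[Int], coresNeeded: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(usableWorkerCores.length)(0)
    var coresToAssign = math.min(coresNeeded, usableWorkerCores.sum)
    var pos = 0
    while (coresToAssign > 0) {
      if (usableWorkerCores(pos) - assigned(pos) > 0) {
        assigned(pos) += 1
        coresToAssign -= 1
      }
      // spreadOut: move to the next worker after every single core;
      // otherwise keep filling the current worker until it has no free cores left.
      if (spreadOut || usableWorkerCores(pos) - assigned(pos) == 0) {
        pos = (pos + 1) % usableWorkerCores.length
      }
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    val freeCores = Array(8, 6, 4, 4) // free cores on 4 hypothetical Workers
    println(assignCores(freeCores, 12, spreadOut = true).mkString(", "))  // 3, 3, 3, 3
    println(assignCores(freeCores, 12, spreadOut = false).mkString(", ")) // 8, 4, 0, 0
  }
}
With spreadOut the 12 requested cores end up spread evenly across all 4 Workers, while without it they are packed onto as few Workers as possible.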
15. Then comes the actual allocation. Once the Executor assignments for the current application have been worked out, the Master sends commands over remote communication to the Workers to actually launch the ExecutorBackend processes;
// Now that we've decided how many cores to allocate on each worker, let's allocate them
for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
  allocateWorkerResourceToExecutors(
    app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
}
/**
 * Allocate a worker's resources to one or more executors.
 * @param app the info of the application which the executors belong to
 * @param assignedCores number of cores on this worker for this application
 * @param coresPerExecutor number of cores per executor
 * @param worker the worker info
 */
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
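As a quick worked example of the arithmetic above, under hypothetical numbers (the split helper below exists only for this illustration and is not part of Spark):
object ExecutorSplitDemo {
  // Mirrors only the two lines of arithmetic in allocateWorkerResourceToExecutors.
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresToAssign)
  }

  def main(args: Array[String]): Unit = {
    // coresPerExecutor = 4 and 12 cores granted on this worker -> 3 executors of 4 cores each.
    println(split(assignedCores = 12, coresPerExecutor = Some(4))) // (3,4)
    // coresPerExecutor not set -> one executor that grabs all 12 cores on this worker.
    println(split(assignedCores = 12, coresPerExecutor = None))    // (1,12)
  }
}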
16. Right after that, an ExecutorAdded message is sent to our application's Driver:
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}