Tags: Driver startup in Cluster mode, thorough source-level analysis of the two different resource scheduling approaches, summary of resource scheduling internals
Contents:
1. Allocating the Driver (Cluster mode);
2. Allocating resources to an Application;
3. The two different resource allocation approaches, fully decoded;
4. Thoughts on Spark resource allocation;
This is one of the most important topics in Spark. Every IMF member must master it, because all of the performance tuning that follows builds on it.
========== Task Scheduling vs. Resource Scheduling ==========
1. Task scheduling is the scheduling of jobs carried out through DAGScheduler, TaskScheduler, SchedulerBackend and related components;
2. Resource scheduling is about how an application obtains resources in the first place;
3. Task scheduling happens on top of resource scheduling; without resource scheduling, task scheduling has nothing to stand on, like water without a source or a tree without roots;
4. The entry point of Spark's resource scheduling algorithm is the schedule() method.
========== Resource Scheduling Internals Decoded ==========
1. Because the Master is responsible for resource management and scheduling, the resource-scheduling method schedule() lives in Master.scala. It is invoked whenever an application registers or the available resources change, for example when an application registers:
case RegisterApplication(description, driver) => {
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
}
2. When schedule() is called: every time a new application arrives or the cluster's resource state changes (including Executors being added or removed, Workers being added or removed, and so on);
3. The current Master must be in the ALIVE state to schedule resources; if it is not ALIVE, schedule() returns immediately. In other words, a Standby Master never allocates resources to Applications;
4. Random.shuffle is used to randomly reorder the information about all of the cluster's Workers held by the Master; internally the algorithm loops over the Workers and randomly swaps their positions in the Master's cached data structure (see the sketch below);
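As an aside, here is a minimal, self-contained sketch of what Random.shuffle does to the Worker ordering; the worker IDs are hypothetical stand-ins for the Master's cached WorkerInfo entries, not Spark code:
import scala.util.Random

object ShuffleWorkersDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical worker identifiers standing in for the Master's cached WorkerInfo list.
    val workers = Seq("worker-1", "worker-2", "worker-3", "worker-4", "worker-5")

    // Random.shuffle returns a new collection with the elements in random order,
    // so successive scheduling rounds start from different Workers and drivers
    // do not all pile onto the first Worker in the list.
    val shuffled = Random.shuffle(workers)
    println(shuffled.mkString(", "))
  }
}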
5. Next, it determines which of those Workers are ALIVE; only ALIVE Workers can take part in resource allocation;
6. When spark-submit specifies that the Driver runs in Cluster mode, the driver is added to the waitingDrivers queue. Each Driver's DriverInfo carries a DriverDescription that records, among other things, the memory and cores the Driver requires from a Worker (and if the Driver was submitted with supervise, it is automatically restarted after it crashes);
private[deploy] case class DriverDescription(
    jarUrl: String,
    mem: Int,
    cores: Int,
    supervise: Boolean,
    command: Command) {

  override def toString: String = s"DriverDescription (${command.mainClass})"
}
7. Then, among the randomly shuffled Workers, one that satisfies the resource requirements is used to launch the Driver: the Master sends a command to that remote Worker telling it to start the driver, and the driver's state then becomes RUNNING;
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  // The Master sends a command to the Worker to start the corresponding driver
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
  driver.state = DriverState.RUNNING
}
8. The Driver must be launched first; only then does the rest of the resource scheduling take place;
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) { return }
  // Drivers take strict precedence over executors
  val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    for (driver <- waitingDrivers) {
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
      }
    }
  }
  startExecutorsOnWorkers()
}
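To make this driver-placement pass concrete, here is a simplified, self-contained simulation under assumed data (FakeWorker and FakeDriver are toy stand-ins, not Spark's WorkerInfo/DriverInfo): it walks the shuffled ALIVE workers and launches each waiting driver on the first worker with enough free memory and cores.
import scala.collection.mutable
import scala.util.Random

object DriverPlacementDemo {
  // Toy stand-ins for WorkerInfo and DriverDescription/DriverInfo.
  case class FakeWorker(id: String, var memoryFree: Int, var coresFree: Int, alive: Boolean)
  case class FakeDriver(id: String, mem: Int, cores: Int)

  def main(args: Array[String]): Unit = {
    val workers = Seq(
      FakeWorker("worker-1", memoryFree = 2048, coresFree = 2, alive = true),
      FakeWorker("worker-2", memoryFree = 8192, coresFree = 8, alive = true),
      FakeWorker("worker-3", memoryFree = 4096, coresFree = 4, alive = false))

    val waitingDrivers = mutable.ListBuffer(
      FakeDriver("driver-A", mem = 1024, cores = 1),
      FakeDriver("driver-B", mem = 4096, cores = 4))

    // Same shape as schedule(): shuffle, keep only ALIVE workers,
    // launch a waiting driver on the first worker that satisfies its demands.
    for (worker <- Random.shuffle(workers) if worker.alive) {
      for (driver <- waitingDrivers.toList) { // iterate over a copy so we can remove safely
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          println(s"Launching ${driver.id} on ${worker.id}")
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          waitingDrivers -= driver
        }
      }
    }
    println("Still waiting: " + waitingDrivers.map(_.id).mkString(", "))
  }
}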
9. By default Spark launches Executors for applications in FIFO order (first in, first out): all submitted applications sit in a scheduling wait queue, and only once the resource demands of the applications ahead in the queue have been met can resources be allocated to the next one;
10. Before actually allocating Executors to an application, the Master checks whether the application still needs cores; if it does not, no Executors are allocated for it;
11. Before Executors are actually allocated, a Worker must be ALIVE and must satisfy the Application's memory and core requirements for each Executor; the qualifying Workers are then sorted so that those with the most free cores come first (see the sketch below).
Under FIFO scheduling, spreadOutApps defaults to true so that an application runs on as many nodes as possible.
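A minimal sketch (made-up numbers; FakeWorker is a toy stand-in for WorkerInfo, and the requirement values are assumptions) of the filter-and-sort step just described:
object SortUsableWorkersDemo {
  // Simplified stand-in for WorkerInfo: only the fields the filter and sort use.
  case class FakeWorker(id: String, memoryFree: Int, coresFree: Int, alive: Boolean)

  def main(args: Array[String]): Unit = {
    val workers = Seq(
      FakeWorker("worker-1", memoryFree = 4096, coresFree = 2,  alive = true),
      FakeWorker("worker-2", memoryFree = 8192, coresFree = 8,  alive = true),
      FakeWorker("worker-3", memoryFree = 1024, coresFree = 16, alive = true),  // too little memory
      FakeWorker("worker-4", memoryFree = 8192, coresFree = 4,  alive = false)) // not ALIVE

    // Assumed application requirements, for illustration only.
    val memoryPerExecutorMB = 2048
    val coresPerExecutor = 1

    val usable = workers
      .filter(_.alive)
      .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor)
      .sortBy(_.coresFree).reverse // Workers with more free cores come first.

    usable.foreach(w => println(s"${w.id}: ${w.coresFree} cores free"))
    // Expected order: worker-2 (8 cores), then worker-1 (2 cores).
  }
}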
12. There are two ways to allocate Executors to an application. The first is to spread the Executors across as many of the cluster's Workers as possible, which tends to give potentially better data locality:
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
13. When cores are actually allocated on the cluster, the scheduler satisfies our requests as far as the available resources allow;
14. If each Worker can host only one Executor for the current application, then each iteration assigns just one core to that Executor!
var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

// If we are launching one executor per worker, then every iteration assigns 1 core
// to the executor. Otherwise, every iteration assigns cores to a new executor.
if (oneExecutorPerWorker) {
  assignedExecutors(pos) = 1
} else {
  assignedExecutors(pos) += 1
}
Suppose there are 4 Workers: with spreadOut, cores are handed out to executors round after round, one core at a time, cycling across the Workers until the resources are used up (see the sketch below).
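The following is a simplified, self-contained sketch of that round-robin idea, not Spark's actual scheduleExecutorsOnWorkers (it ignores memory, coresPerExecutor and executor limits); all numbers are made up:
object SpreadOutDemo {
  /**
   * Simplified core assignment: usableWorkerCores(i) is the free cores on worker i
   * (already sorted by free cores, descending) and coresNeeded plays the role of
   * app.coresLeft. Returns how many cores each worker gets.
   */
  def assignCores(usableWorkerCores: Array[Int], coresNeeded: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(usableWorkerCores.length)(0)
    var coresToAssign = math.min(coresNeeded, usableWorkerCores.sum)
    var pos = 0
    while (coresToAssign > 0) {
      if (usableWorkerCores(pos) - assigned(pos) > 0) {
        assigned(pos) += 1
        coresToAssign -= 1
      }
      // spreadOut: move to the next worker after every single core;
      // otherwise keep filling the current worker until it has no free cores left.
      if (spreadOut || usableWorkerCores(pos) - assigned(pos) == 0) {
        pos = (pos + 1) % usableWorkerCores.length
      }
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    val freeCores = Array(8, 6, 4, 4) // free cores on 4 hypothetical Workers
    println(assignCores(freeCores, 12, spreadOut = true).mkString(", "))  // 3, 3, 3, 3
    println(assignCores(freeCores, 12, spreadOut = false).mkString(", ")) // 8, 4, 0, 0
  }
}
With spreadOut the 12 requested cores end up spread evenly across all 4 Workers, while without it they are packed onto as few Workers as possible.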
15. Then comes the actual allocation. Once the Executor assignments for the current application have been worked out, the Master sends commands over remote communication to the Workers to actually launch the ExecutorBackend processes;
// Now that we've decided how many cores to allocate on each worker, let's allocate them
for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
  allocateWorkerResourceToExecutors(
    app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
}
/**
 * Allocate a worker's resources to one or more executors.
 * @param app the info of the application which the executors belong to
 * @param assignedCores number of cores on this worker for this application
 * @param coresPerExecutor number of cores per executor
 * @param worker the worker info
 */
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
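As a quick worked example of the arithmetic above, under hypothetical numbers (the split helper below exists only for this illustration and is not part of Spark):
object ExecutorSplitDemo {
  // Mirrors only the two lines of arithmetic in allocateWorkerResourceToExecutors.
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresToAssign)
  }

  def main(args: Array[String]): Unit = {
    // coresPerExecutor = 4 and 12 cores granted on this worker -> 3 executors of 4 cores each.
    println(split(assignedCores = 12, coresPerExecutor = Some(4))) // (3,4)
    // coresPerExecutor not set -> one executor that grabs all 12 cores on this worker.
    println(split(assignedCores = 12, coresPerExecutor = None))    // (1,12)
  }
}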
16. Right after that, an ExecutorAdded message is sent to our application's Driver:
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}