Contents:
1. Driver allocation on the cluster;
2. Allocating resources for an application;
3. The two different resource-allocation strategies, fully demystified;
4. Thoughts on Spark resource scheduling;
This is material every IMF member must master, and the performance optimizations behind Spark all tie back to it.
========== The difference between task scheduling and resource scheduling ============
1. Task scheduling is job-level scheduling carried out by DAGScheduler, TaskScheduler, SchedulerBackend, and related components;
2. Resource scheduling refers to how an application obtains resources in the first place;
3. Task scheduling is built on top of resource scheduling; without resource scheduling, task scheduling is out of the question, like water without a source or a tree without roots;
4. The entry point of Spark's resource scheduling algorithm is the method schedule().
========== The inside story of resource scheduling, demystified ============
1. Because the Master is responsible for resource management and scheduling, the resource-scheduling method schedule() lives in Master.scala. It is invoked whenever a program registers or the cluster's resources change, for example when an application registers:
case RegisterApplication(description, driver) => {
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
}
2. When schedule() is called: every time a new application registers or the cluster's resource state changes (executors added or removed, workers added or removed, etc.);
3. Only a Master in the ALIVE state performs resource scheduling; if it is not ALIVE, schedule() returns immediately, i.e. a STANDBY Master does not schedule resources for applications;
4. Random.shuffle is used to randomly reorder the workers that the Master keeps in its cache; internally the algorithm loops over the cached worker positions and randomly swaps them (a tiny illustration follows the DriverDescription snippet below);
5. Next, it determines which of all the workers are in the ALIVE state; only ALIVE workers take part in resource allocation;
6. When spark-submit specifies cluster mode for the driver, the driver is first added to the waitingDrivers list. The DriverInfo of each waiting driver carries a DriverDescription, which records the memory and cores a worker must provide to launch this driver (and if supervise is set, the driver is restarted automatically after it dies);
private[deploy] case class DriverDescription(
    jarUrl: String,
    mem: Int,
    cores: Int,
    supervise: Boolean,
    command: Command) {

  override def toString: String = s"DriverDescription (${command.mainClass})"
}
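The randomization mentioned in point 4 is just scala.util.Random.shuffle; a tiny illustration with made-up worker IDs (not Master code, e.g. pasted into the Scala REPL):

  import scala.util.Random

  // Hypothetical worker IDs, purely for illustration: each call to schedule() sees the
  // workers in a different random order, so drivers submitted over time tend to land on
  // different workers instead of always hitting the first registered worker.
  val workers = Seq("worker-1", "worker-2", "worker-3", "worker-4")
  println(Random.shuffle(workers))  // e.g. List(worker-3, worker-1, worker-4, worker-2)
  println(Random.shuffle(workers))  // e.g. List(worker-2, worker-4, worker-1, worker-3)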
7. Then, walking the randomly shuffled workers, the Master picks one that meets the driver's resource requirements and launches the driver there: the Master sends a command to the remote worker telling it to start the driver, after which the driver's state becomes RUNNING;
private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
  logInfo("Launching driver " + driver.id + " on worker " + worker.id)
  worker.addDriver(driver)
  driver.worker = Some(worker)
  worker.endpoint.send(LaunchDriver(driver.id, driver.desc))  // the Master sends an instruction to the Worker to start the corresponding driver
  driver.state = DriverState.RUNNING
}
8. Drivers are launched first; only after that does resource scheduling for applications take place;
/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) { return }
  // Drivers take strict precedence over executors
  val shuffledWorkers = Random.shuffle(workers) // Randomization helps balance drivers
  for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) {
    for (driver <- waitingDrivers) {
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
      }
    }
  }
  startExecutorsOnWorkers()
}
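A toy trace of the waitingDrivers loop above, using made-up stand-in classes and numbers (not the Spark WorkerInfo/DriverInfo types), just to show the first-fit rule:

  // Stand-ins, purely illustrative; memory is in MB.
  case class W(id: String, memoryFree: Int, coresFree: Int)
  case class D(id: String, mem: Int, cores: Int)

  val shuffledWorkers = Seq(W("worker-2", 16384, 8), W("worker-1", 4096, 2), W("worker-3", 8192, 4))
  var waitingDrivers  = Seq(D("driver-0", 8192, 4))

  for (worker <- shuffledWorkers; driver <- waitingDrivers
       if worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
    println(s"launching ${driver.id} on ${worker.id}")  // the first worker that fits wins: worker-2
    waitingDrivers = waitingDrivers.filterNot(_ == driver)
  }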
9. By default Spark launches executors for applications in FIFO fashion (first in, first out, queued): all submitted applications sit in the scheduler's waiting queue, and the resources for the next application are allocated only after the resource requirements of the applications ahead of it have been satisfied;
10. Before allocating executors for a specific application, the Master checks whether the application still needs cores; if it does not, no executors are assigned to it;
11. Before executors are actually allocated, the candidate workers must be in the ALIVE state and must satisfy the application's per-executor memory and cores requirements; on that basis they are sorted so that the workers with the most free cores come first.
In the FIFO case the default is spreadOutApps, which lets the application run on as many nodes as possible.
12. There are two ways to assign executors to an application. The first is to spread the executors over as many workers in the cluster as possible, which often leads to potentially better data locality;
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // Filter out workers that don't have enough resources to launch an executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
13. When cores are actually allocated across the cluster, the scheduler satisfies our requirements as far as the available resources allow;
14. If each worker launches only one executor for the current application (i.e. coresPerExecutor is not specified), then each scheduling iteration adds just one core to that single executor!
var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

// If we are launching one executor per worker, then every iteration assigns 1 core
// to the executor. Otherwise, every iteration assigns cores to a new executor.
if (oneExecutorPerWorker) {
  assignedExecutors(pos) = 1
} else {
  assignedExecutors(pos) += 1
}
Assuming 4 workers and spreadOut mode, cores are handed out to the executors round by round, one core per worker per round, looping until either the application's demand or the available resources are exhausted.
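The following is a simplified sketch of that behavior (NOT the real scheduleExecutorsOnWorkers, and it assumes coresPerExecutor is 1), just to contrast the spreadOut and non-spreadOut strategies:

  // Simplified sketch: freeCores(i) is worker i's free cores, coresNeeded is what the app still wants.
  def assignCores(freeCores: Array[Int], coresNeeded: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(freeCores.length)(0)
    var toAssign = math.min(coresNeeded, freeCores.sum)
    var pos = 0
    while (toAssign > 0) {
      if (freeCores(pos) - assigned(pos) > 0) {  // this worker can still spare a core
        assigned(pos) += 1
        toAssign -= 1
      }
      // spreadOut: move to the next worker after every core (round-robin);
      // otherwise keep filling the current worker until it has nothing left.
      if (spreadOut || freeCores(pos) - assigned(pos) == 0) {
        pos = (pos + 1) % freeCores.length
      }
    }
    assigned
  }

  assignCores(Array(4, 4, 4, 4), 8, spreadOut = true)   // Array(2, 2, 2, 2): spread over all workers
  assignCores(Array(4, 4, 4, 4), 8, spreadOut = false)  // Array(4, 4, 0, 0): fill workers one by one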
15. With the cores decided, the Master prepares the concrete executor assignments for the current application and then sends instructions over remote communication to the workers, which actually start the ExecutorBackend processes;
// Now that we've decided how many cores to allocate on each worker, let's allocate them
for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
  allocateWorkerResourceToExecutors(
    app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
}
/**
 * Allocate a worker's resources to one or more executors.
 * @param app the info of the application which the executors belong to
 * @param assignedCores number of cores on this worker for this application
 * @param coresPerExecutor number of cores per executor
 * @param worker the worker info
 */
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
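A worked example of the numExecutors / coresToAssign arithmetic above, with made-up numbers:

  // Case 1: the app asked for 2 cores per executor and this worker was assigned 6 cores:
  //   3 executors are launched on this worker, 2 cores each.
  // Case 2: coresPerExecutor was not specified (None) and this worker was assigned 6 cores:
  //   a single executor is launched and it grabs all 6 cores.
  val assignedCores = 6
  val twoPerExec: Option[Int]  = Some(2)
  val unspecified: Option[Int] = None
  println(twoPerExec.map(assignedCores / _).getOrElse(1))   // 3 executors
  println(twoPerExec.getOrElse(assignedCores))              // 2 cores each
  println(unspecified.map(assignedCores / _).getOrElse(1))  // 1 executor ...
  println(unspecified.getOrElse(assignedCores))             // ... with all 6 cores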
16. launchExecutor tells the worker to start the executor and immediately sends an ExecutorAdded message to the driver of our application:
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
Teacher Liaoliang's card:
The leading figure of Spark in China
Sina Weibo: http://weibo.com/ilovepains
WeChat public account: DT_Spark
Blog: http://blog.sina.com.cn/ilovepains
Mobile: 18610086859
QQ: 1740415547
Email: [Email protected]