The two most important classes in the Scheduler module are DAGScheduler and TaskScheduler. The previous article covered DAGScheduler; this article looks at TaskScheduler.
TaskScheduler
As mentioned earlier, during SparkContext initialization different implementations of TaskScheduler are created depending on the type of master. For local, standalone (spark://) and Mesos masters a TaskSchedulerImpl is created; for a YARN master other implementations are created, which the reader can study on their own.
master match {
  case "local" =>
    val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
    val backend = new LocalBackend(sc.getConf, scheduler, 1)
    scheduler.initialize(backend)
    (backend, scheduler)

  case SPARK_REGEX(sparkUrl) =>
    val scheduler = new TaskSchedulerImpl(sc)
    val masterUrls = sparkUrl.split(",").map("spark://" + _)
    val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
    scheduler.initialize(backend)
    (backend, scheduler)

  case mesosUrl @ MESOS_REGEX(_) =>
    MesosNativeLibrary.load()
    val scheduler = new TaskSchedulerImpl(sc)
    val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", false)
    val url = mesosUrl.stripPrefix("mesos://")  // strip scheme from raw Mesos URLs
    val backend = if (coarseGrained) {
      new CoarseMesosSchedulerBackend(scheduler, sc, url, sc.env.securityManager)
    } else {
      new MesosSchedulerBackend(scheduler, sc, url)
    }
    scheduler.initialize(backend)
    (backend, scheduler)
  ...
}
At this point the attentive reader may wonder: TaskScheduler has to dispatch tasks on different resource management platforms (local, standalone, Mesos), so how can the same TaskSchedulerImpl serve all of them? Note the very important member backend: each master type corresponds to a different backend, and it is the backend that is responsible for communicating with the resource management platform.
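To make this division of labor concrete, here is a minimal sketch of the pattern, with illustrative names only (MyTaskScheduler and MySchedulerBackend are not Spark's real trait definitions, which carry many more methods): the scheduler keeps the platform-independent logic and delegates all communication with the cluster to whichever backend was injected for the chosen master.

// Minimal sketch of the scheduler/backend split; names are illustrative, not Spark's.
trait MySchedulerBackend {
  def start(): Unit          // connect to the resource manager and request executors
  def reviveOffers(): Unit   // ask for resource offers so pending tasks can be launched
}

class MyTaskScheduler {
  private var backend: MySchedulerBackend = _

  def initialize(b: MySchedulerBackend): Unit = { backend = b }  // a different backend per master type
  def start(): Unit = backend.start()
  def submitTasks(): Unit = backend.reviveOffers()               // the scheduling logic itself stays generic
}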
Because scheduling at this level needs to communicate with the resource manager, it also touches on the deploy module and the executor module. Local mode is too simple (it just runs tasks in multiple local threads), and YARN and Mesos require background knowledge of their programming interfaces, so here we pick SparkDeploySchedulerBackend for detailed analysis. It targets the standalone resource manager implemented by Spark itself, which some readers may already have set up and used.
TaskSchedulerImpl Startup
In SparkContext (the code below), the TaskSchedulerImpl and SparkDeploySchedulerBackend are created first, and the backend is passed into the TaskSchedulerImpl. The TaskSchedulerImpl is then started.
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
TaskSchedulerImpl.start mainly calls backend.start(). SparkDeploySchedulerBackend.start first calls the start method of its parent class, CoarseGrainedSchedulerBackend, which creates a DriverEndpoint. This is the driver's local RPC endpoint, through which it communicates with the executors.
// CoarseGrainedSchedulerBackend.scala
override def start() {
  ...
  driverEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME, new DriverEndpoint(rpcEnv, properties))
}
// SparkDeploySchedulerBackend.scala
override def start() {
  super.start()

  // The endpoint for executors to talk to us
  val driverUrl = rpcEnv.uriOf(SparkEnv.driverActorSystemName,
    RpcAddress(sc.conf.get("spark.driver.host"), sc.conf.get("spark.driver.port").toInt),
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME)
  val args = Seq(
    "--driver-url", driverUrl,
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}",
    "--app-id", "{{APP_ID}}",
    "--worker-url", "{{WORKER_URL}}")
  ...
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()
  waitForRegistration()
}
This creates a client, an AppClient, which connects to the masters (e.g. spark://master:7077) and carries the command used to launch executors, including the --driver-url parameter. When an executor is started with this command, it automatically connects back to the driver.
At this point, the startup of TaskSchedulerImpl and SparkDeploySchedulerBackend is complete. It mainly does two things: start the local driver endpoint, and ask the masters to start executors. The driver and the executors communicate via RPC.
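For reference, this whole startup path is triggered simply by pointing the application at a standalone master when the SparkContext is created; a minimal example (master URL and app name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// For a spark:// master, SparkContext.createTaskScheduler picks TaskSchedulerImpl
// plus SparkDeploySchedulerBackend, and the startup sequence above runs.
val conf = new SparkConf()
  .setMaster("spark://master:7077")  // placeholder standalone master URL
  .setAppName("scheduler-demo")
val sc = new SparkContext(conf)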
Note: how the executors are actually started will be analyzed in detail with the deploy and executor modules.
TaskSchedulerImpl Submitting Tasks
In the previous article we saw that DAGScheduler finally calls TaskScheduler.submitTasks to submit tasks. This article continues the analysis from there:
override def submitTasks(taskSet: TaskSet) {
  val tasks = taskSet.tasks
  ...
  this.synchronized {
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    val stage = taskSet.stageId
    val stageTaskSets =
      taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
    stageTaskSets(taskSet.stageAttemptId) = manager
    ...
    // add the TaskSetManager to the rootPool
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
    ...
  }
  backend.reviveOffers()
}
The TaskSet is packaged into a TaskSetManager and added through the SchedulableBuilder. As an aside, the SchedulableBuilder is Spark's scheduling-policy implementation; there are two of them, FIFO and FAIR, with FIFO being the default. Both eventually put the TaskSetManager into the rootPool.
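The scheduling policy can be switched from the default FIFO to FAIR through the spark.scheduler.mode configuration; a minimal example (app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

// Use the FAIR scheduling policy instead of the default FIFO when building the rootPool.
val conf = new SparkConf()
  .setAppName("fair-scheduling-demo")    // placeholder app name
  .set("spark.scheduler.mode", "FAIR")   // default is "FIFO"
val sc = new SparkContext(conf)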
Then backend.reviveOffers is called, which is a slightly surprising call chain: SparkDeploySchedulerBackend has no reviveOffers method of its own, so the method inherited from its parent class CoarseGrainedSchedulerBackend is used. CoarseGrainedSchedulerBackend.reviveOffers has only one line:
override def reviveOffers() {
  driverEndpoint.send(ReviveOffers)
}
When the DriverEndpoint receives ReviveOffers, the makeOffers method is called:
private def makeOffers() {
  // Filter out executors under killing
  val activeExecutors = executorDataMap.filterKeys(!executorsPendingToRemove.contains(_))
  val workOffers = activeExecutors.map { case (id, executorData) =>
    new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
  }.toSeq
  launchTasks(scheduler.resourceOffers(workOffers))
}
The entire call chain is: TaskScheduler.submitTasks() -> CoarseGrainedSchedulerBackend.reviveOffers() -> RpcEndpointRef.send(ReviveOffers) -> DriverEndpoint.receive(ReviveOffers) -> DriverEndpoint.makeOffers().
makeOffers calls TaskSchedulerImpl's resourceOffers method, which allocates compute resources to the tasks, and then calls CoarseGrainedSchedulerBackend.launchTasks, which actually sends the compute tasks to the executors.
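The real resourceOffers also takes data locality, delay scheduling and the FIFO/FAIR policy into account. Purely as an illustration of the offer model, before looking at launchTasks below, a naive allocation that hands pending tasks to whatever offers still have free cores might look like this (a simplified sketch with made-up names, not Spark's actual code):

// Toy illustration of offer-based scheduling, not Spark's real resourceOffers:
// assign pending task ids to offers while they still have free cores.
case class Offer(executorId: String, var freeCores: Int)
case class Assignment(taskId: Long, executorId: String)

def naiveResourceOffers(pending: Seq[Long], offers: Seq[Offer], cpusPerTask: Int = 1): Seq[Assignment] = {
  val assigned = scala.collection.mutable.ArrayBuffer[Assignment]()
  var remaining = pending
  for (offer <- offers) {
    while (remaining.nonEmpty && offer.freeCores >= cpusPerTask) {
      assigned += Assignment(remaining.head, offer.executorId)  // place the next task on this executor
      offer.freeCores -= cpusPerTask                            // account for the cores it will use
      remaining = remaining.tail
    }
  }
  assigned.toSeq
}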
// Launch tasks returned by a set of resource offers
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  for (task <- tasks.flatten) {
    val serializedTask = ser.serialize(task)  // serialize the task
    if (serializedTask.limit >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {
      // the serialized task exceeds the frame-size limit: warn and abort the task set
      ...
    } else {
      val executorData = executorDataMap(task.executorId)  // look up the executor assigned to this task
      executorData.freeCores -= scheduler.CPUS_PER_TASK    // mark part of the executor's CPU resources as occupied
      // send the task to the executor's RPC endpoint
      executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
    }
  }
}
As mentioned above, while TaskSchedulerImpl was starting, the masters also started the executors, whose main class is org.apache.spark.executor.CoarseGrainedExecutorBackend. So the method with which the executor receives the message is also in this class:
// org.apache.spark.executor.CoarseGrainedExecutorBackend
override def receive: PartialFunction[Any, Unit] = {
  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val taskDesc = ser.deserialize[TaskDescription](data.value)  // deserialize the task description
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
        taskDesc.name, taskDesc.serializedTask)
    }
  ...
}
The executor then executes the task:
// org.apache.spark.executor.Executor
def launchTask(
    context: ExecutorBackend,
    taskId: Long,
    attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer): Unit = {
  // create a TaskRunner (a Runnable) to execute the task on the thread pool
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
    serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
TaskRunner uses a ClassLoader to load the Task from its bytes, executes it, serializes the result, and returns the result to the driver via RPC.
// org.apache.spark.executor.Executor.TaskRunner
override def run(): Unit = {
  // notify the driver that the task is now running
  execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
  ...
  try {
    // deserialize the dependent files, jars and the task itself
    val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
    updateDependencies(taskFiles, taskJars)
    task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)
    task.setTaskMemoryManager(taskMemoryManager)

    val (value, accumUpdates) = try {
      // execute the task
      val res = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber,
        metricsSystem = env.metricsSystem)
      res
    } finally {
      ...
    }

    val resultSer = env.serializer.newInstance()
    val valueBytes = resultSer.serialize(value)  // serialize the result
    ...
    // return the result to the driver via RPC
    execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
  } catch {
    ...
  }
}
This is a general description of the path a task takes through TaskSchedulerImpl. There are quite a few branches along the way, but they do not affect the reader's understanding of the overall flow.
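For orientation, everything described above is driven by an ordinary action in user code. Assuming an existing SparkContext sc, a trivial job such as the following goes through exactly this path: DAGScheduler submits a TaskSet, TaskSchedulerImpl offers it resources, and the backend ships the tasks to the executors.

// Any action will do; count() triggers DAGScheduler -> TaskSchedulerImpl -> backend -> executors.
val result = sc.parallelize(1 to 1000, numSlices = 4)
  .map(_ * 2)
  .count()
println("counted " + result + " elements")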
Summary
The two parts of this Spark Scheduler module series have given a general description of Spark's scheduling logic and the order in which it is executed.
The code architecture of the Scheduler module fully embodies the design philosophy of layering and isolation. DAGScheduler is logic unique to Spark, while TaskScheduler differs depending on the resource scheduler, so the scheduling code is split into these two parts: the former needs only one implementation, while the latter can be implemented per platform. Even within TaskScheduler there is much in common across platforms, so TaskSchedulerImpl is itself a fairly general implementation; only the part that communicates with the resource scheduler uses a different backend.