When a job is submitted, Spark's work takes place mainly on two kinds of nodes: the driver and the executors.
(1) Driver: it converts all the operations on the RDDs to be processed into a DAG, splits the job into multiple stages based on that RDD DAG, and generates the corresponding tasks, which are then distributed to the executors for execution.
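To make this concrete, here is a minimal example job (the application name and input path are made up for illustration): the reduceByKey below introduces a shuffle, so the driver splits this single job into a shuffle map stage and a final result stage.

import org.apache.spark.{SparkConf, SparkContext}

object TwoStageExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("two-stage-example").setMaster("local[2]"))
    val counts = sc.textFile("input.txt")   // narrow dependencies: these stay in the first stage
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                   // wide dependency: the stage boundary
    counts.collect()                        // the action submits the job to the DAGScheduler
    sc.stop()
  }
}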
Process:
sc.runJob -> DAGScheduler.runJob -> submitJob -> DAGEventProcessActor -> DAGScheduler.handleJobSubmitted -> submitStage -> submitMissingTasks -> TaskScheduler.submitTasks -> SchedulerBackend.reviveOffers -> ReviveOffers (message) -> DriverActor.makeOffers -> resourceOffers -> launchTasks -> CoarseGrainedExecutorBackend (Executor)
Among them, handleJobSubmitted and submitStage are responsible for dependency analysis: they generate the finalStage and create the job from that finalStage.
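The essence of that dependency analysis can be sketched as follows. This is a condensed illustration, not Spark's actual getParentStages code: starting from the final RDD, narrow dependencies are folded into the current stage, while every ShuffleDependency encountered marks the boundary of a parent stage.

import scala.collection.mutable
import org.apache.spark.{NarrowDependency, ShuffleDependency}
import org.apache.spark.rdd.RDD

object DependencyWalkSketch {
  // Collect the shuffle dependencies reachable from finalRdd through narrow dependencies only;
  // each one corresponds to a parent stage of the stage that ends at finalRdd.
  def parentShuffleDeps(finalRdd: RDD[_]): Seq[ShuffleDependency[_, _, _]] = {
    val parents = mutable.ArrayBuffer[ShuffleDependency[_, _, _]]()
    val visited = mutable.HashSet[RDD[_]]()
    val waiting = mutable.Stack[RDD[_]](finalRdd)
    while (waiting.nonEmpty) {
      val rdd = waiting.pop()
      if (!visited(rdd)) {
        visited += rdd
        rdd.dependencies.foreach {
          case shuffleDep: ShuffleDependency[_, _, _] => parents += shuffleDep      // stage boundary
          case narrowDep: NarrowDependency[_]         => waiting.push(narrowDep.rdd) // same stage
        }
      }
    }
    parents
  }
}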
Source: newStage is used to create a new Stage.
private def newStage(
    rdd: RDD[_],
    numTasks: Int,
    shuffleDep: Option[ShuffleDependency[_, _, _]],
    jobId: Int,
    callSite: CallSite): Stage = {
  val id = nextStageId.getAndIncrement()
  val stage = new Stage(id, rdd, numTasks, shuffleDep, getParentStages(rdd, jobId), jobId, callSite)
  stageIdToStage(id) = stage
  updateJobIdStageIdMaps(jobId, stage)
  stageToInfos(stage) = StageInfo.fromStage(stage)
  stage
}
Before creating a stage, Spark must know how many partitions the stage reads its data from, so that the number of tasks can be set accordingly. Source: Stage.
private[spark] class Stage(
    val id: Int,                  // stage id; stages created later get larger values
    val rdd: RDD[_],              // the last RDD belonging to this stage
    val numTasks: Int,            // number of tasks to create, equal to the number of output partitions of the parent RDD
    val shuffleDep: Option[ShuffleDependency[_, _, _]],  // whether the stage ends with a shuffle
    val parents: List[Stage],     // list of parent stages
    val jobId: Int,               // job id
    val callSite: CallSite)
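To see concretely where these partition counts come from, the snippet below (spark-shell style, assuming a SparkContext named sc; the numbers are arbitrary) shows that the map side has as many partitions as the parent RDD, while the reduce side can be given its own partition count, and each of those counts becomes a task count.

val pairs = sc.parallelize(1 to 100, numSlices = 4).map(i => (i % 10, i))
println(pairs.partitions.length)        // 4 map-side partitions -> 4 ShuffleMapTasks
val reduced = pairs.reduceByKey(_ + _, numPartitions = 8)
println(reduced.partitions.length)      // 8 reduce-side partitions -> 8 ResultTasks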
The key criterion for dividing stages is whether there is a shuffle operation, i.e. a wide dependency (for wide vs. narrow dependencies of an RDD, see the previous article, or look it up on Baidu); if there is one, a new stage is created. Once the stage division is complete, the following is known:
(1) How many partitions the resulting stage needs to read data from
(2) How many partitions the resulting stage will generate
(3) Whether the resulting stage involves a shuffle
Once the number of partitions is confirmed, the number of tasks is effectively confirmed as well.
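A convenient way to see these stage boundaries without reading scheduler code is RDD.toDebugString, which prints the lineage; each new indentation level corresponds to a shuffle, i.e. to a new stage (the input path below is hypothetical, and sc is assumed to be a SparkContext as above).

val words = sc.textFile("input.txt")
  .flatMap(_.split("\\s+"))
  .map((_, 1))
  .reduceByKey(_ + _)
println(words.toDebugString)   // the indented block above reduceByKey is the parent (shuffle map) stage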
During job submission and execution there is a large amount of message traffic inside the Spark cluster, so Akka is used to receive, process and send these messages.
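The pattern is ordinary Akka actor messaging. The sketch below is not Spark's actual DriverActor, only an illustration of the style: scheduling components exchange small case-class messages and react to them in receive.

import akka.actor.{Actor, ActorSystem, Props}

case object ReviveOffers
case class StatusUpdate(taskId: Long, state: String)

class DriverLikeActor extends Actor {
  def receive = {
    case ReviveOffers =>
      // in Spark this is where makeOffers() would run, matching pending tasks to executors
      println("making resource offers")
    case StatusUpdate(taskId, state) =>
      println(s"task $taskId is now $state")
  }
}

object AkkaMessagingSketch extends App {
  val system = ActorSystem("scheduler-sketch")
  val driver = system.actorOf(Props[DriverLikeActor], "driver")
  driver ! ReviveOffers
  driver ! StatusUpdate(1L, "FINISHED")
  Thread.sleep(500)     // give the actor a moment to process the example messages
  system.terminate()
}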
Next, the tasks are executed on the executors. Tasks come in two kinds, ShuffleMapTask and ResultTask, roughly equivalent to Hadoop's map and reduce. Each stage decides which task type to create according to its isShuffleMap flag, thereby distinguishing ShuffleMapTask from ResultTask. Once the task type and number are determined, the tasks are distributed to the executors, which start threads to execute them (moving from plan to execution).
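As a rough illustration of that choice (a simplified sketch, not Spark's actual task classes), one task is created per partition and the stage's isShuffleMap flag picks the task type:

object TaskTypeSketch {
  sealed trait SketchTask
  // writes shuffle output for the next stage, analogous to a Hadoop map task
  case class SketchShuffleMapTask(stageId: Int, partition: Int) extends SketchTask
  // computes part of the final result, analogous to a Hadoop reduce task
  case class SketchResultTask(stageId: Int, partition: Int) extends SketchTask

  def tasksFor(stageId: Int, numTasks: Int, isShuffleMap: Boolean): Seq[SketchTask] =
    (0 until numTasks).map { p =>
      if (isShuffleMap) SketchShuffleMapTask(stageId, p) else SketchResultTask(stageId, p)
    }
}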
TaskSchedulerImpl sends a ReviveOffers message to DriverActor; after DriverActor receives the ReviveOffers message, it calls the makeOffers function to handle it. The source code is as follows:
def makeOffers() {
  launchTasks(scheduler.resourceOffers(
    executorHost.toArray.map { case (id, host) => new WorkerOffer(id, host, freeCores(id)) }))
}
The makeOffers function is mainly responsible for finding idle executors and distributing tasks to them as evenly as possible. Once idle executors are found, launchTasks sends the tasks from the task list to the chosen executors, where they are executed.
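The distribution idea can be sketched roughly as below. This is not TaskSchedulerImpl's actual resourceOffers implementation, only an illustration: walk the executors' free cores and hand out pending tasks round-robin so the load spreads as evenly as possible.

object OfferSketch {
  case class Offer(executorId: String, host: String, freeCores: Int)   // plays the role of Spark's WorkerOffer
  case class Assignment(taskId: Long, executorId: String)

  def assignRoundRobin(offers: Seq[Offer], pendingTasks: Seq[Long], cpusPerTask: Int = 1): Seq[Assignment] = {
    val freeCores = scala.collection.mutable.Map(offers.map(o => o.executorId -> o.freeCores): _*)
    val assignments = scala.collection.mutable.ArrayBuffer[Assignment]()
    var tasks = pendingTasks
    var progressed = true
    // keep sweeping over the executors until either no tasks or no free cores remain
    while (tasks.nonEmpty && progressed) {
      progressed = false
      for (offer <- offers if tasks.nonEmpty && freeCores(offer.executorId) >= cpusPerTask) {
        assignments += Assignment(tasks.head, offer.executorId)   // launch one task on this executor
        freeCores(offer.executorId) -= cpusPerTask
        tasks = tasks.tail
        progressed = true
      }
    }
    assignments
  }
}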
Spark Job scheduling