Spark Technology Insider: Stage Division and Source Code Analysis


After an action is triggered on an RDD (take count as an example), the call chain is as follows:

  1. org.apache.spark.rdd.RDD#count
  2. org.apache.spark.SparkContext#runJob
  3. org.apache.spark.scheduler.DAGScheduler#runJob
  4. org.apache.spark.scheduler.DAGScheduler#submitJob
  5. org.apache.spark.scheduler.DAGSchedulerEventProcessActor#receive(JobSubmitted)
  6. org.apache.spark.scheduler.DAGScheduler#handleJobSubmitted
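
For concreteness, here is a minimal driver program that triggers this chain. It is only a sketch: the application name, master URL, and data are illustrative.

import org.apache.spark.{SparkConf, SparkContext}

object CountExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CountExample").setMaster("local[2]"))
    // count() is an action: it calls SparkContext#runJob, which forwards the job to
    // DAGScheduler#runJob / #submitJob and fires the JobSubmitted event of step 5.
    val n = sc.parallelize(1 to 100, numSlices = 4).map(_ * 2).count()
    println(s"count = $n")
    sc.stop()
  }
}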

In step 5, DAGSchedulerEventProcessActor is the proxy through which DAGScheduler interacts with the outside world: when the DAGScheduler is created, an actor named eventProcessActor is created along with it. The role of this actor can be read off its implementation:

/** The main event loop of the DAG scheduler. */
def receive = {
  case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
    // A job is submitted via RDD -> SparkContext -> DAGScheduler. The message is
    // routed through this actor to keep the module's responsibilities consistent.
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
      listener, properties)

  case StageCancelled(stageId) =>
    // Message source: org.apache.spark.ui.jobs.JobProgressTab. The execution status of
    // a SparkContext's jobs is displayed on the GUI, where a stage can be cancelled;
    // the request reaches here through SparkContext -> DAGScheduler.
    dagScheduler.handleStageCancellation(stageId)

  case JobCancelled(jobId) =>
    // Message from org.apache.spark.scheduler.JobWaiter: cancel a job.
    dagScheduler.handleJobCancellation(jobId)

  case JobGroupCancelled(groupId) =>
    // Cancel an entire job group.
    dagScheduler.handleJobGroupCancelled(groupId)

  case AllJobsCancelled =>
    // Cancel all jobs.
    dagScheduler.doCancelAllJobs()

  case ExecutorAdded(execId, host) =>
    // The TaskScheduler reports that an executor was added; specifically from
    // org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers.
    dagScheduler.handleExecutorAdded(execId, host)

  case ExecutorLost(execId) =>
    // From the TaskScheduler.
    dagScheduler.handleExecutorLost(execId)

  case BeginEvent(task, taskInfo) =>
    // From the TaskScheduler.
    dagScheduler.handleBeginEvent(task, taskInfo)

  case GettingResultEvent(taskInfo) =>
    // Handle the message that a TaskResult is being fetched.
    dagScheduler.handleGetTaskResult(taskInfo)

  case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
    // From the TaskScheduler: a task has completed or failed.
    dagScheduler.handleTaskCompletion(completion)

  case TaskSetFailed(taskSet, reason) =>
    // From the TaskScheduler: either the TaskSet has failed more times than the
    // threshold allows, or the job was cancelled.
    dagScheduler.handleTaskSetFailed(taskSet, reason)

  case ResubmitFailedStages =>
    // Retry a failed stage; sent from org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion.
    dagScheduler.resubmitFailedStages()
}

To sum up, org.apache.spark.scheduler.DAGSchedulerEventProcessActor can be understood as the external functional interface of DAGScheduler. It hides the internal implementation details, makes the scheduling logic easier to follow, and reduces maintenance cost as DAGScheduler's functionality grows more complex.


handleJobSubmitted

org.apache.spark.scheduler.DAGScheduler#handleJobSubmitted first creates finalStage based on the final RDD. finalStage, as its name implies, is the last stage. It then creates a job and submits it. If the submitted job meets all of the following conditions, it runs in local mode:

1) spark.localExecution.enabled is set to true, and
2) the user program explicitly allows the job to run locally, and
3) finalStage has no parent stages, and
4) finalStage has only one partition.

Conditions 3) and 4) exist so that only quick tasks are executed locally. If there were multiple stages or multiple partitions, local execution could slow the computation down because of the limited computing resources of the local machine.
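
As an illustration, consider the classic candidate: first() (or take(1)) on a single-partition RDD with no shuffle ancestry. This is a hedged sketch; the allowLocal flag belongs to the Spark 1.x runJob API and was removed in later versions.

// A one-partition RDD with no shuffle parents: first() internally calls take(1),
// which submits a job over a single partition and, in Spark 1.x, passed
// allowLocal = true, so it could qualify for local execution when
// spark.localExecution.enabled was set.
val smallRdd = sc.parallelize(Seq(1, 2, 3), numSlices = 1)
val firstElement = smallRdd.first() // may run in the driver, bypassing the cluster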

To understand what a stage is, first understand what a task is. A task is the basic unit of execution on the cluster. One task is responsible for processing one partition of an RDD, so the partitions of an RDD are processed by different tasks, and the processing logic of these tasks is exactly the same. Such a group of tasks forms a stage. There are two types of task:

  1. org.apache.spark.scheduler.ShuffleMapTask
  2. org.apache.spark.scheduler.ResultTask

A ShuffleMapTask writes its computation results into different buckets according to the task's partitioner. A ResultTask sends its computation result back to the driver application. A job consists of multiple stages, each composed of a group of identical tasks; the final stage consists of a group of ResultTasks.
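
Both task types show up in an ordinary word count. In the sketch below (the input file name is hypothetical), everything before the reduceByKey shuffle runs as ShuffleMapTasks and the final stage runs as ResultTasks:

// Stage 1 (ShuffleMapTasks): flatMap and map are pipelined within one stage, and
// each task writes its output into buckets chosen by reduceByKey's partitioner.
// Stage 2 (ResultTasks): reads the shuffled buckets, aggregates, and collect()
// sends the result back to the driver.
val counts = sc.textFile("input.txt") // hypothetical input file
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .collect()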

After a user triggers an action such as count or collect, SparkContext submits the job through its runJob function. The DAG scheduler's event processor finally passes it to DAGScheduler's own handleJobSubmitted, which first divides the stages, then submits the stages, and then submits the tasks. At that point, the tasks are running on the cluster.

A stage begins by reading data either from external storage or from the shuffle output of a previous stage; a stage ends either because a shuffle occurs or because the final result is generated.
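
These boundaries can be observed through the public API: RDD.toDebugString prints the lineage, and the indentation changes exactly at shuffle dependencies (the exact output shape varies across Spark versions):

val words = sc.parallelize(Seq("spark", "stage", "spark"), 2)
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
// The printed lineage shows a ShuffledRDD above the MapPartitionsRDDs; the point
// where the indentation changes is exactly where one stage ends and the next begins.
println(counts.toDebugString)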


Creating finalStage

handleJobSubmitted creates finalStage by calling newStage:

finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite)

A result stage, that is, the finalStage, is created by calling org.apache.spark.scheduler.DAGScheduler#newStage; a shuffle stage is created by calling org.apache.spark.scheduler.DAGScheduler#newOrUsedStage.

private def newStage(
    rdd: RDD[_],
    numTasks: Int,
    shuffleDep: Option[ShuffleDependency[_, _, _]],
    jobId: Int,
    callSite: CallSite)
  : Stage =
{
  val id = nextStageId.getAndIncrement()
  val stage =
    new Stage(id, rdd, numTasks, shuffleDep, getParentStages(rdd, jobId), jobId, callSite)
  stageIdToStage(id) = stage
  updateJobIdStageIdMaps(jobId, stage)
  stage
}

For the final result stage, the shuffleDep passed in is None.

We know that an RDD can obtain its parent RDDs through org.apache.spark.rdd.RDD#getDependencies; in the same way, a stage may have parent stages. Let's look at the stage-division figure from the RDD paper:


[Figure: stage division, from the RDD paper. A stage's input is either external storage or the shuffle output of another stage; a stage's output is either the job result (for the stage made of result tasks) or the input to another shuffle.]

The input of Stage 3 is the shuffle output of RDD A and RDD F. Because going from A to B and from F to G requires a shuffle, A and F must be divided into stages different from B and G.
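
The graph can be reproduced with the public API. This is only a hedged sketch: the variable names mirror the figure, while the data and operations are illustrative.

// A -> B requires a shuffle (groupByKey), so A ends one stage and B begins another.
val a = sc.parallelize(1 to 100).map(i => (i % 10, i))
val b = a.groupByKey()
// C -> F uses only narrow dependencies, so C and F stay within a single stage.
val c = sc.parallelize(1 to 100).map(i => (i % 10, i.toString))
val f = c.filter(_._2.nonEmpty)
// The join of B and F forms the final stage (Stage 3); its input is the shuffle
// output of the earlier stages.
val g = b.join(f)
g.collect() // the action that triggers stage division and job submission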

From the perspective of the source code, the final stage (Stage 3) is created when the action is triggered. The fifth parameter of new Stage(...) is the list of this stage's parent stages, obtained from the RDD and the job ID:

// Generate the parent stages of an RDD. A new stage is created wherever a
// ShuffleDependency is encountered.
private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
  val parents = new HashSet[Stage] // Stores the parent stages.
  val visited = new HashSet[RDD[_]] // Stores the RDDs that have already been visited.
  // We are manually maintaining a stack here to prevent StackOverflowError
  // caused by recursively visiting
  val waitingForVisit = new Stack[RDD[_]] // Stores the RDDs waiting to be processed.
  def visit(r: RDD[_]) {
    if (!visited(r)) {
      visited += r
      // Kind of ugly: need to register RDDs with the cache here since
      // we can't do it in its constructor because # of partitions is unknown
      for (dep <- r.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            // A ShuffleDependency means a new stage must be generated.
            parents += getShuffleMapStage(shufDep, jobId)
          case _ =>
            // Not a ShuffleDependency, so the parent RDD belongs to the same stage.
            waitingForVisit.push(dep.rdd)
        }
      }
    }
  }
  // The input RDD is processed first; then its parent RDDs are visited.
  waitingForVisit.push(rdd)
  while (!waitingForVisit.isEmpty) { // Keep going as long as the stack is not empty.
    // Every time visit encounters a ShuffleDependency, a new stage is formed;
    // otherwise the RDDs belong to the same stage.
    visit(waitingForVisit.pop())
  }
  parents.toList
}

After finalStage is generated, the stage needs to be submitted:

// Submit the stage. If any parent stage has not been submitted, submit it recursively first.
private def submitStage(stage: Stage) {
  val jobId = activeJobForStage(stage)
  if (jobId.isDefined) {
    logDebug("submitStage(" + stage + ")")
    // Proceed only if the current stage is not waiting for a parent stage, is not
    // already running, and has not failed (failed stages have their own retry
    // mechanism and are not resubmitted here).
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      val missing = getMissingParentStages(stage).sortBy(_.id)
      logDebug("missing: " + missing)
      if (missing == Nil) {
        // All parent stages have completed, so submit the tasks of this stage.
        logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
        submitMissingTasks(stage, jobId.get)
      } else {
        for (parent <- missing) {
          // Some parent stages have not completed; submit them recursively.
          submitStage(parent)
        }
        waitingStages += stage
      }
    }
  } else {
    abortStage(stage, "No active job for stage " + stage.id)
  }
}


After the DAGScheduler has divided the stages, a stage is actually submitted by converting it into a TaskSet and handing it to the TaskScheduler, which then submits the computing tasks to the cluster.
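
The result of this handoff can be observed through the public listener API: each submitted stage produces one SparkListenerStageSubmitted event, and the stage's task count equals the number of partitions to compute. The listener below is a hedged illustration, not part of the scheduler itself:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted}

// Prints one line per submitted stage; numTasks is the size of the TaskSet that
// the DAGScheduler hands over to the TaskScheduler for this stage.
sc.addSparkListener(new SparkListener {
  override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit = {
    val info = stageSubmitted.stageInfo
    println(s"Stage ${info.stageId} submitted with ${info.numTasks} tasks: ${info.name}")
  }
})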


Next, we will analyze how a stage is converted into a TaskSet and finally submitted to the executors to run.


BTW: Work has been too busy recently; by the time I get home and wash up it is usually past 10 o'clock, and there is no energy left for parsing source code. Fortunately, I don't have to work overtime on weekends, so blog updates will be concentrated on the weekends. Come on!

