Result processing when task execution succeeds
In the previous section we walked through the code that runs a task on the executor, and we saw that the final part of the run goes through the TaskRunner class:
class TaskRunner(
    execBackend: ExecutorBackend,
    val taskId: Long,
    val attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer)
  extends Runnable {

  override def run(): Unit = {
    // other extraneous code omitted
    // notify the driver that the task is now running
    execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
    try {
      // other non-critical code omitted
      // on completion, notify the driver of the status update with the serialized result
      execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
    } catch {
      // notify the driver of the status update; the code for the error case is omitted
      case t: Throwable => // ...
    }
  }
}
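The execBackend used above is an ExecutorBackend, the small callback interface an executor uses to report task state back to the cluster scheduler; CoarseGrainedExecutorBackend is one concrete implementation. A minimal sketch of that trait, following the shape of the Spark source:

import java.nio.ByteBuffer
import org.apache.spark.TaskState.TaskState

// Simplified sketch: the callback interface TaskRunner uses to report task state.
private[spark] trait ExecutorBackend {
  def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer): Unit
}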
For the status update, the first call is the statusUpdate method in CoarseGrainedExecutorBackend:
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
  val msg = StatusUpdate(executorId, taskId, state, data)
  driver match {
    // send the StatusUpdate message to the driver
    case Some(driverRef) => driverRef.send(msg)
    case None => logWarning(s"Drop $msg because has not yet connected to driver")
  }
}
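The msg built here is a StatusUpdate message defined among the coarse-grained cluster messages. As a rough sketch of its shape (the real definition wraps the raw ByteBuffer in a SerializableBuffer so it can travel over the wire):

import java.nio.ByteBuffer
import org.apache.spark.TaskState.TaskState
import org.apache.spark.util.SerializableBuffer

// Rough sketch of the message carried from the executor to the driver.
case class StatusUpdate(
    executorId: String,
    taskId: Long,
    state: TaskState,
    data: SerializableBuffer)

object StatusUpdate {
  // Convenience factory that wraps a raw ByteBuffer.
  def apply(executorId: String, taskId: Long, state: TaskState, data: ByteBuffer): StatusUpdate =
    StatusUpdate(executorId, taskId, state, new SerializableBuffer(data))
}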
The receive method in DriverEndpoint receives the StatusUpdate message sent over from the executor; the source code is as follows:
override def receive: PartialFunction[Any, Unit] = {
  // receive the StatusUpdate message sent by the executor
  case StatusUpdate(executorId, taskId, state, data) =>
    // invoke the statusUpdate method in TaskSchedulerImpl
    scheduler.statusUpdate(taskId, state, data.value)
    if (TaskState.isFinished(state)) {
      executorDataMap.get(executorId) match {
        case Some(executorInfo) =>
          executorInfo.freeCores += scheduler.CPUS_PER_TASK
          makeOffers(executorId)
        case None =>
          // Ignoring the update since we don't know about the executor.
          logWarning(s"Ignored task status update ($taskId state $state) " +
            s"from unknown executor with ID $executorId")
      }
    }

  case ReviveOffers =>
    makeOffers()

  case KillTask(taskId, executorId, interruptThread) =>
    executorDataMap.get(executorId) match {
      case Some(executorInfo) =>
        executorInfo.executorEndpoint.send(KillTask(taskId, executorId, interruptThread))
      case None =>
        // Ignoring the task kill since the executor is not registered.
        logWarning(s"Attempted to kill task $taskId for unknown executor $executorId.")
    }
}
The source code for the statusUpdate method in TaskSchedulerImpl is as follows:
def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
  var failedExecutor: Option[String] = None
  synchronized {
    try {
      if (state == TaskState.LOST && taskIdToExecutorId.contains(tid)) {
        // We lost this entire executor, so remember that it's gone
        val execId = taskIdToExecutorId(tid)
        if (activeExecutorIds.contains(execId)) {
          removeExecutor(execId)
          failedExecutor = Some(execId)
        }
      }
      taskIdToTaskSetManager.get(tid) match {
        case Some(taskSet) =>
          if (TaskState.isFinished(state)) {
            taskIdToTaskSetManager.remove(tid)
            taskIdToExecutorId.remove(tid)
          }
          // the task executed successfully
          if (state == TaskState.FINISHED) {
            taskSet.removeRunningTask(tid)
            // taskResultGetter is the thread pool that handles successfully executed tasks
            taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData)
          // the task did not execute successfully: it failed, was killed, or was lost
          } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
            taskSet.removeRunningTask(tid)
            // handle the failed task
            taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
          }
        case None =>
          logError(
            ("Ignoring update with state %s for TID %s because its task set is gone (this is " +
              "likely the result of receiving duplicate task finished status updates)")
              .format(state, tid))
      }
    } catch {
      case e: Exception => logError("Exception in statusUpdate", e)
    }
  }
  // Update the DAGScheduler without holding a lock on this, since that can deadlock
  if (failedExecutor.isDefined) {
    dagScheduler.executorLost(failedExecutor.get)
    backend.reviveOffers()
  }
}
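The branches above key off TaskState, a simple enumeration in which isFinished covers every terminal state. A sketch of it, close to the definition in the Spark source:

// Sketch of org.apache.spark.TaskState: terminal states are FINISHED, FAILED, KILLED and LOST.
private[spark] object TaskState extends Enumeration {
  val LAUNCHING, RUNNING, FINISHED, FAILED, KILLED, LOST = Value

  type TaskState = Value

  val FINISHED_STATES = Set(FINISHED, FAILED, KILLED, LOST)

  def isFailed(state: TaskState): Boolean = (state == LOST) || (state == FAILED)

  def isFinished(state: TaskState): Boolean = FINISHED_STATES.contains(state)
}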
For a successfully executed task, it calls the enqueueSuccessfulTask method of TaskResultGetter for processing:
def enqueueSuccessfulTask(
    taskSetManager: TaskSetManager, tid: Long, serializedData: ByteBuffer) {
  getTaskResultExecutor.execute(new Runnable {
    override def run(): Unit = Utils.logUncaughtExceptions {
      try {
        val (result, size) = serializer.get().deserialize[TaskResult[_]](serializedData) match {
          // the result is the final computed value, returned directly
          case directResult: DirectTaskResult[_] =>
            if (!taskSetManager.canFetchMoreResults(serializedData.limit())) {
              return
            }
            // deserialize "value" without holding any lock so it won't block other threads.
            // We should call it here, so that when it's called again in
            // "TaskSetManager.handleSuccessfulTask", it does not need to deserialize the value.
            directResult.value()
            (directResult, serializedData.limit())

          // the result is saved in the BlockManager of the remote worker node
          case IndirectTaskResult(blockId, size) =>
            if (!taskSetManager.canFetchMoreResults(size)) {
              // dropped by executor if size is larger than maxResultSize
              sparkEnv.blockManager.master.removeBlock(blockId)
              return
            }
            logDebug("Fetching indirect task result for TID %s".format(tid))
            scheduler.handleTaskGettingResult(taskSetManager, tid)
            // fetch the result from the remote worker
            val serializedTaskResult = sparkEnv.blockManager.getRemoteBytes(blockId)
            if (!serializedTaskResult.isDefined) {
              /* We won't be able to get the task result if the machine that ran the task failed
               * between when the task ended and when we tried to fetch the result, or if the
               * block manager had to flush the result. */
              // if the machine hosting the remote executor failed, or another error occurred,
              // fetching the result fails
              scheduler.handleFailedTask(
                taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
              return
            }
            // deserialize the result of the remote fetch
            val deserializedResult = serializer.get().deserialize[DirectTaskResult[_]](
              serializedTaskResult.get)
            // remove the remote block holding the result
            sparkEnv.blockManager.master.removeBlock(blockId)
            (deserializedResult, size)
        }
        result.metrics.setResultSize(size)
        // TaskSchedulerImpl processes the fetched result
        scheduler.handleSuccessfulTask(taskSetManager, tid, result)
      } catch {
        case cnf: ClassNotFoundException =>
          val loader = Thread.currentThread.getContextClassLoader
          taskSetManager.abort("ClassNotFound with classloader: " + loader)
        // Matching NonFatal so we don't catch the ControlThrowable from the "return" above.
        case NonFatal(ex) =>
          logError("Exception while getting task result", ex)
          taskSetManager.abort("Exception while getting task result: %s".format(ex))
      }
    }
  })
}
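The match above distinguishes the two concrete forms of TaskResult: a DirectTaskResult carries the serialized value inline with the status update, while an IndirectTaskResult only carries the BlockId under which the executor stored an oversized result in its BlockManager. Their shape is roughly the following (a simplified sketch of the Spark definitions):

import java.nio.ByteBuffer
import org.apache.spark.executor.TaskMetrics
import org.apache.spark.storage.BlockId

// Sketch of the result types handled by TaskResultGetter.
private[spark] sealed trait TaskResult[T]

// The value was too large to send directly; fetch it from the remote BlockManager by blockId.
private[spark] case class IndirectTaskResult[T](blockId: BlockId, size: Int)
  extends TaskResult[T] with Serializable

// The serialized value, accumulator updates and metrics sent back inline.
private[spark] class DirectTaskResult[T](
    var valueBytes: ByteBuffer,
    var accumUpdates: Map[Long, Any],
    var metrics: TaskMetrics)
  extends TaskResult[T]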
The handleSuccessfulTask method in TaskSchedulerImpl then takes over processing of the computed result; its source is as follows:
def handleSuccessfulTask(
    taskSetManager: TaskSetManager,
    tid: Long,
    taskResult: DirectTaskResult[_]): Unit = synchronized {
  // call the TaskSetManager.handleSuccessfulTask method for processing
  taskSetManager.handleSuccessfulTask(tid, taskResult)
}
The source code of the TaskSetManager.handleSuccessfulTask method is as follows:
/**
 * Marks the task as successful and notifies the DAGScheduler that a task has ended.
 */
def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
  val info = taskInfos(tid)
  val index = info.index
  info.markSuccessful()
  removeRunningTask(tid)
  // This method is called by "TaskSchedulerImpl.handleSuccessfulTask" which holds the
  // "TaskSchedulerImpl" lock until exiting. To avoid the SPARK-7655 issue, we should not
  // "deserialize" the value when holding a lock to avoid blocking other threads. So we call
  // "result.value()" in "TaskResultGetter.enqueueSuccessfulTask" before reaching here.
  // Note: "result.value()" only deserializes the value when it is called the first time, so
  // here "result.value()" just returns the value and won't block other threads.
  // call the DAGScheduler's taskEnded method
  sched.dagScheduler.taskEnded(
    tasks(index), Success, result.value(), result.accumUpdates, info, result.metrics)
  if (!successful(index)) {
    tasksSuccessful += 1
    logInfo("Finished task %s in stage %s (TID %d) in %d ms on %s (%d/%d)".format(
      info.id, taskSet.id, info.taskId, info.duration, info.host, tasksSuccessful, numTasks))
    // Mark successful and stop if all the tasks have succeeded.
    successful(index) = true
    if (tasksSuccessful == numTasks) {
      isZombie = true
    }
  } else {
    logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id +
      " because task " + index + " has already completed successfully")
  }
  failedExecutors.remove(index)
  maybeFinishTaskSet()
}
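The comment about SPARK-7655 relies on the fact that DirectTaskResult.value() only pays the deserialization cost once. Its behavior is roughly along these lines (a sketch; field and method names follow the Spark source but treat the details as illustrative):

// Sketch: value() deserializes valueBytes on the first call and caches the result, so the
// second call made from handleSuccessfulTask returns immediately without blocking.
def value(): T = {
  if (valueObjectDeserialized) {
    valueObject
  } else {
    // Deserialization can be expensive for large results, so it must not run under a lock.
    val resultSer = SparkEnv.get.serializer.newInstance()
    valueObject = resultSer.deserialize(valueBytes)
    valueObjectDeserialized = true
    valueObject
  }
}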
Processing then enters the taskEnded method of DAGScheduler, which is as follows:
/**
 * Called by the TaskSetManager to report task completions or failures.
 */
def taskEnded(
    task: Task[_],
    reason: TaskEndReason,
    result: Any,
    accumUpdates: Map[Long, Any],
    taskInfo: TaskInfo,
    taskMetrics: TaskMetrics): Unit = {
  // post a CompletionEvent to the event queue via DAGSchedulerEventProcessLoop.post;
  // the event is picked up by the eventThread and handled by the onReceive method
  eventProcessLoop.post(
    CompletionEvent(task, reason, result, accumUpdates, taskInfo, taskMetrics))
}
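DAGSchedulerEventProcessLoop extends Spark's generic EventLoop utility, so post simply enqueues the event and a dedicated eventThread drains the queue and dispatches each event to onReceive. A condensed sketch of that mechanism, following the shape of org.apache.spark.util.EventLoop:

import java.util.concurrent.{BlockingQueue, LinkedBlockingDeque}
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Condensed sketch of the event loop behind DAGSchedulerEventProcessLoop.
private[spark] abstract class EventLoop[E](name: String) {

  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  // Single daemon thread that takes events off the queue and hands them to onReceive.
  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      while (!stopped.get) {
        val event = eventQueue.take()
        try {
          onReceive(event)
        } catch {
          case NonFatal(e) => onError(e)
        }
      }
    }
  }

  // post is what taskEnded ends up calling: it only enqueues, it never blocks on handling.
  def post(event: E): Unit = {
    eventQueue.put(event)
  }

  def start(): Unit = eventThread.start()

  protected def onReceive(event: E): Unit

  protected def onError(e: Throwable): Unit
}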
Jumping to the onReceive method in DAGSchedulerEventProcessLoop, we can see that it simply delegates to doOnReceive:
/**
 * The main event loop of the DAG scheduler.
 */
override def onReceive(event: DAGSchedulerEvent): Unit = {
  val timerContext = timer.time()
  try {
    doOnReceive(event)
  } finally {
    timerContext.stop()
  }
}
Jumping to the doOnReceive method in DAGSchedulerEventProcessLoop:

private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

  case StageCancelled(stageId) =>
    dagScheduler.handleStageCancellation(stageId)

  case JobCancelled(jobId) =>
    dagScheduler.handleJobCancellation(jobId)

  case JobGroupCancelled(groupId) =>
    dagScheduler.handleJobGroupCancelled(groupId)

  case AllJobsCancelled =>
    dagScheduler.doCancelAllJobs()

  case ExecutorAdded(execId, host) =>
    dagScheduler.handleExecutorAdded(execId, host)

  case ExecutorLost(execId) =>
    dagScheduler.handleExecutorLost(execId, fetchFailed = false)

  case BeginEvent(task, taskInfo) =>
    dagScheduler.handleBeginEvent(task, taskInfo)

  case GettingResultEvent(taskInfo) =>
    dagScheduler.handleGetTaskResult(taskInfo)

  // handle the CompletionEvent
  case completion @ CompletionEvent(task, reason, _, _, taskInfo, taskMetrics) =>
    // hand it to the DAGScheduler.handleTaskCompletion method for processing
    dagScheduler.handleTaskCompletion(completion)

  case TaskSetFailed(taskSet, reason, exception) =>
    dagScheduler.handleTaskSetFailed(taskSet, reason, exception)

  case ResubmitFailedStages =>
    dagScheduler.resubmitFailedStages()
}
The DAGScheduler.handleTaskCompletion method completes the processing of the computed result:
/**
 * Responds to a task finishing. This is called inside the event loop so it assumes that it can
 * modify the scheduler's internal state. Use taskEnded() to post a task end event from outside.
 */
private[scheduler] def handleTaskCompletion(event: CompletionEvent) {
  val task = event.task
  val stageId = task.stageId
  val taskType = Utils.getFormattedClassName(task)

  outputCommitCoordinator.taskCompleted(stageId, task.partitionId,
    event.taskInfo.attempt, event.reason)

  // The success case is dealt with separately below, since we need to compute accumulator
  // updates before posting.
  if (event.reason != Success) {
    val attemptId = task.stageAttemptId
    listenerBus.post(SparkListenerTaskEnd(stageId, attemptId, taskType, event.reason,
      event.taskInfo, event.taskMetrics))
  }

  if (!stageIdToStage.contains(task.stageId)) {
    // Skip all the actions if the stage has been cancelled.
    return
  }

  val stage = stageIdToStage(task.stageId)
  event.reason match {
    case Success =>
      listenerBus.post(SparkListenerTaskEnd(stageId, stage.latestInfo.attemptId, taskType,
        event.reason, event.taskInfo, event.taskMetrics))
      stage.pendingTasks -= task
      task match {
        // handle a ResultTask
        case rt: ResultTask[_, _] =>
          // Cast to ResultStage here because it's part of the ResultTask
          // TODO Refactor this out to a function that accepts a ResultStage
          val resultStage = stage.asInstanceOf[ResultStage]
          resultStage.resultOfJob match {
            case Some(job) =>
              if (!job.finished(rt.outputId)) {
                updateAccumulators(event)
                job.finished(rt.outputId) = true
                job.numFinished += 1
                // If the whole job has finished, remove it.
                // Check whether all tasks of the job have been processed.
                if (job.numFinished == job.numPartitions) {
                  markStageAsFinished(resultStage)
                  cleanupStateForJobAndIndependentStages(job)
                  listenerBus.post(
                    SparkListenerJobEnd(job.jobId, clock.getTimeMillis(), JobSucceeded))
                }
                // taskSucceeded runs some user code that might throw an exception. Make sure
                // we are resilient against that.
                // Notify the JobWaiter that the job has finished.
                try {
                  job.listener.taskSucceeded(rt.outputId, event.result)
                } catch {
                  case e: Exception =>
                    // TODO: Perhaps we want to mark the resultStage as failed?
                    job.listener.jobFailed(new SparkDriverExecutionException(e))
                }
              }
            case None =>
              logInfo("Ignoring result from " + rt + " because its job has finished")
          }

        // handle a ShuffleMapTask
        case smt: ShuffleMapTask =>
          val shuffleStage = stage.asInstanceOf[ShuffleMapStage]
          updateAccumulators(event)
          val status = event.result.asInstanceOf[MapStatus]
          val execId = status.location.executorId
          logDebug("ShuffleMapTask finished on " + execId)
          if (failedEpoch.contains(execId) && smt.epoch <= failedEpoch(execId)) {
            logInfo(s"Ignoring possibly bogus $smt completion from executor $execId")
          } else {
            // save the result to the ShuffleMapStage
            shuffleStage.addOutputLoc(smt.partitionId, status)
          }
          if (runningStages.contains(shuffleStage) && shuffleStage.pendingTasks.isEmpty) {
            markStageAsFinished(shuffleStage)
            logInfo("looking for newly runnable stages")
            logInfo("running: " + runningStages)
            logInfo("waiting: " + waitingStages)
            logInfo("failed: " + failedStages)

            // We supply true to increment the epoch number here in case this is a
            // recomputation of the map outputs. In that case, some nodes may have cached
            // locations with holes (from when we detected the error) and will need the
            // epoch incremented to refetch them.
            // TODO: Only increment the epoch number if this is not the first time
            //       we registered these map outputs.
            mapOutputTracker.registerMapOutputs(
              shuffleStage.shuffleDep.shuffleId,
              shuffleStage.outputLocs.map(list => if (list.isEmpty) null else list.head),
              changeEpoch = true)
            clearCacheLocs()

            // handle the case where some of the stage's tasks failed
            if (shuffleStage.outputLocs.contains(Nil)) {
              // Some tasks had failed; let's resubmit this shuffleStage
              // TODO: Lower-level scheduler should also deal with this
              logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
                ") because some of its tasks had failed: " +
                shuffleStage.outputLocs.zipWithIndex.filter(_._1.isEmpty).map(_._2).mkString(", "))
              // resubmit the stage
              submitStage(shuffleStage)
            } else {
              // process the other waiting stages that are now runnable
              val newlyRunnable = new ArrayBuffer[Stage]
              for (shuffleStage <- waitingStages) {
                logInfo("Missing parents for " + shuffleStage + ": " +
                  getMissingParentStages(shuffleStage))
              }
              for (shuffleStage <- waitingStages if getMissingParentStages(shuffleStage).isEmpty) {
                newlyRunnable += shuffleStage
              }
              waitingStages --= newlyRunnable
              runningStages ++= newlyRunnable
              for {
                shuffleStage <- newlyRunnable.sortBy(_.id)
                jobId <- activeJobForStage(shuffleStage)
              } {
                logInfo("Submitting " + shuffleStage + " (" + shuffleStage.rdd +
                  "), which is now runnable")
                submitMissingTasks(shuffleStage, jobId)
              }
            }
          }
      }
    // other code omitted
  }
}
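The job.listener used above is the JobWaiter that was created when the job was submitted; its taskSucceeded callback is what finally hands each partition's result to the driver-side result handler and wakes up the thread blocked in runJob. It looks roughly like this (a sketch simplified from the Spark source):

// Sketch of JobWaiter.taskSucceeded: deliver one partition's result and finish the job
// once every partition has reported in.
override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
  resultHandler(index, result.asInstanceOf[T])
  finishedTasks += 1
  if (finishedTasks == totalTasks) {
    _jobFinished = true
    jobResult = JobSucceeded
    this.notifyAll()
  }
}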
Execution Process:
1. The statusUpdate call made in org.apache.spark.executor.TaskRunner.run
2. The org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate method
3. The receive method of org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend#DriverEndpoint (DriverEndpoint is an inner class)
4. The statusUpdate method in org.apache.spark.scheduler.TaskSchedulerImpl
5. The org.apache.spark.scheduler.TaskResultGetter.enqueueSuccessfulTask method
6. The org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion method
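To see the whole chain fire end to end, any action-driven job will do. The tiny driver program below (illustrative only, using a local master and made-up names) runs a two-partition job; each task's result travels back through exactly the six steps listed above before collect() returns on the driver:

import org.apache.spark.{SparkConf, SparkContext}

object ResultPathDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("result-path-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // collect() submits a ResultStage; each finished task reports TaskState.FINISHED via
    // statusUpdate, the driver deserializes the DirectTaskResult (or fetches an
    // IndirectTaskResult from the remote BlockManager), and JobWaiter assembles the array.
    val squares = sc.parallelize(1 to 10, numSlices = 2).map(x => x * x).collect()
    println(squares.mkString(", "))

    sc.stop()
  }
}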