[Spark Core] Task Execution Mechanism and Task Source Code Analysis, Part 2


The previous section, Task Execution Mechanism and Task Source Code Analysis, Part 1, covered the executor registration process.
In this section, I start on the executor side, at the point where it receives the LaunchTask message, and follow how the executor runs the task.

1. The launchTask function of Executor

DriverActor submits a task by sending a LaunchTask message to CoarseGrainedExecutorBackend. On receiving the message, the backend has its internal Executor start the task; that is, the launchTask function of an idle executor is called.
Here is part of receiveWithLogging in CoarseGrainedExecutorBackend:

    case LaunchTask(data) =>
      if (executor == null) {
        logError("Received LaunchTask command but executor was null")
        System.exit(1)
      } else {
        val ser = env.closureSerializer.newInstance()
        val taskDesc = ser.deserialize[TaskDescription](data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
          taskDesc.name, taskDesc.serializedTask)
      }

The Executor then runs the task:

    def launchTask(
        context: ExecutorBackend,
        taskId: Long,
        attemptNumber: Int,
        taskName: String,
        serializedTask: ByteBuffer) {
      val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
        serializedTask)
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }

Executor maintains an internal thread pool, so it can run several tasks at the same time. Each submitted task is wrapped in a TaskRunner and handed to the thread pool for execution.
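
To make the pattern concrete, here is a minimal sketch of "wrap each task in a Runnable and hand it to a thread pool". It is not Spark's code: SimpleExecutor, SimpleTaskRunner and runDemo are hypothetical names, and the fixed pool of 4 threads is my own assumption for the example.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, simplified stand-ins for Executor and TaskRunner.
object ThreadPoolSketch {
  // Wraps one task as a Runnable, in the spirit of TaskRunner.
  final class SimpleTaskRunner(
      val taskId: Long,
      body: () => Unit,
      running: ConcurrentHashMap[Long, SimpleTaskRunner]) extends Runnable {
    override def run(): Unit = {
      try body()
      finally running.remove(taskId) // always leave the running list
    }
  }

  final class SimpleExecutor(threads: Int) {
    val runningTasks = new ConcurrentHashMap[Long, SimpleTaskRunner]()
    // Assumption for this sketch: a fixed-size pool.
    private val threadPool = Executors.newFixedThreadPool(threads)

    // Register the runner first, then submit it to the pool.
    def launchTask(taskId: Long)(body: => Unit): Unit = {
      val tr = new SimpleTaskRunner(taskId, () => body, runningTasks)
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }

    // Stops accepting tasks and waits for the submitted ones to finish.
    def shutdownAndWait(): Boolean = {
      threadPool.shutdown()
      threadPool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }

  // Launches n tasks and returns how many of them actually ran.
  def runDemo(n: Int): Int = {
    val counter = new AtomicInteger(0)
    val executor = new SimpleExecutor(threads = 4)
    (1 to n).foreach { id =>
      executor.launchTask(id.toLong) { counter.incrementAndGet(); () }
    }
    executor.shutdownAndWait()
    counter.get()
  }
}
```

Calling ThreadPoolSketch.runDemo(8) submits eight tasks; up to four run at once, and each removes itself from runningTasks when it is done.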

2. TaskRunner's run method

Inside the run method, the line val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber) is where the task's computation is actually executed.

Here is part of the run method in TaskRunner:

    try {
      val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
      updateDependencies(taskFiles, taskJars)
      // Deserialize the task
      task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)

      // If this task has been killed before we deserialized it, let's quit now. Otherwise,
      // continue executing the task.
      if (killed) {
        // Throw an exception rather than returning, because returning within a try{} block
        // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
        // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
        // for the task.
        throw new TaskKilledException
      }

      attemptedTask = Some(task)
      logDebug("Task " + taskId + "'s epoch is " + task.epoch)
      env.mapOutputTracker.updateEpoch(task.epoch)

      // Run the actual task and measure its runtime.
      // For what running a task involves, see ResultTask and ShuffleMapTask
      taskStart = System.currentTimeMillis()
      val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber)
      val taskFinish = System.currentTimeMillis()

      // If the task has been killed, let's fail it.
      if (task.killed) {
        throw new TaskKilledException
      }

      // Serialize the result
      val resultSer = env.serializer.newInstance()
      val beforeSerialization = System.currentTimeMillis()
      val valueBytes = resultSer.serialize(value)
      val afterSerialization = System.currentTimeMillis()

      // Update the task's metrics, which are shown on the monitoring page
      for (m <- task.metrics) {
        m.setExecutorDeserializeTime(taskStart - deserializeStartTime)
        m.setExecutorRunTime(taskFinish - taskStart)
        m.setJvmGCTime(gcTime - startGCTime)
        m.setResultSerializationTime(afterSerialization - beforeSerialization)
      }

      val accumUpdates = Accumulators.values

      // Wrap the result in a DirectTaskResult, then serialize the wrapper
      val directResult = new DirectTaskResult(valueBytes, accumUpdates, task.metrics.orNull)
      val serializedDirectResult = ser.serialize(directResult)
      val resultSize = serializedDirectResult.limit

      // directSend = sending directly back to the driver
      val serializedResult = {
        if (maxResultSize > 0 && resultSize > maxResultSize) {
          logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
            s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
            s"dropping it.")
          ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
        } else if (resultSize >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {
          // If the result is larger than spark.akka.frameSize (10 MB by default), it is stored
          // in the BlockManager with a serialized storage level, and whatever does not fit in
          // memory is saved to disk
          val blockId = TaskResultBlockId(taskId)
          env.blockManager.putBytes(
            blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
          logInfo(
            s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
          ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
        } else {
          logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
          serializedDirectResult
        }
      }

      // The task is done: report the task result to the driver through statusUpdate
      execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
    } catch {
      // Exception handling code omitted ...
    } finally {
      // Clean up the shuffle memory registered for the task, and finally remove the task
      // from the running list
      // Release memory used by this thread for shuffles
      env.shuffleMemoryManager.releaseMemoryForThisThread()
      // Release memory used by this thread for unrolling blocks
      env.blockManager.memoryStore.releaseUnrollMemoryForThisThread()
      // Release memory used by this thread for accumulators
      Accumulators.clear()
      runningTasks.remove(taskId)
    }
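
The three-way decision about how a result travels back to the driver is easier to see in isolation. The sketch below reproduces only the size comparison from the code above; ResultRoutingSketch, Route and route are hypothetical names of mine, and the sizes in the usage note are illustrative values, not values read from a real configuration.

```scala
// Hypothetical helper that isolates the size check made in TaskRunner.run.
object ResultRoutingSketch {
  sealed trait Route
  case object Dropped extends Route         // too large: only a marker with the size is sent
  case object ViaBlockManager extends Route // stored as a block, a reference is sent
  case object Direct extends Route          // the serialized result itself is sent

  // All sizes are in bytes. A maxResultSize of 0 or less disables the upper limit.
  def route(resultSize: Long, maxResultSize: Long, frameSize: Long, reserved: Long): Route =
    if (maxResultSize > 0 && resultSize > maxResultSize) Dropped
    else if (resultSize >= frameSize - reserved) ViaBlockManager
    else Direct
}
```

For example, with a 1 GB result limit, a 10 MB frame size and 200 KB reserved, a 1 KB result is routed Direct, a 10 MB result goes ViaBlockManager, and a 2 GB result is Dropped.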

3. Task execution process

Each TaskRunner runs in its own thread from the pool. Let's look at the call chain that starts in its run method:
TaskRunner.run -> Task.run -> Task.runTask -> RDD.iterator -> RDD.computeOrReadCheckpoint -> RDD.compute
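
The last three links of this chain can be sketched in miniature. This is a toy model under my own assumptions, not Spark's RDD: MiniRDD, RangeRDD and the checkpointed field are hypothetical, a partition is just a Seq, and the cache lookup that the real iterator performs is left out.

```scala
// Toy model of iterator -> computeOrReadCheckpoint -> compute.
object CallChainSketch {
  abstract class MiniRDD[T] {
    // Some(data) once a checkpoint exists; None means the data must be computed.
    var checkpointed: Option[Seq[T]] = None

    // Subclasses define how one partition is computed.
    def compute(split: Int): Iterator[T]

    // Read the checkpoint if there is one, otherwise compute the partition.
    final def computeOrReadCheckpoint(split: Int): Iterator[T] =
      checkpointed match {
        case Some(data) => data.iterator
        case None => compute(split)
      }

    // Entry point that a task calls to read a partition.
    final def iterator(split: Int): Iterator[T] = computeOrReadCheckpoint(split)
  }

  // Partition i holds the integers [i * perSplit, (i + 1) * perSplit).
  class RangeRDD(perSplit: Int) extends MiniRDD[Int] {
    var computeCalls = 0
    override def compute(split: Int): Iterator[Int] = {
      computeCalls += 1
      (split * perSplit until (split + 1) * perSplit).iterator
    }
  }
}
```

Reading a partition of a RangeRDD calls compute once; after checkpointed is set, iterator returns the saved data and compute is no longer called.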

The code of Task's run function:

    /**
     * Called by Executor to run this task.
     *
     * @param taskAttemptId an identifier for this task attempt that is unique within a SparkContext.
     * @param attemptNumber how many times this task has been attempted (0 for the first attempt)
     * @return the result of the task
     */
    final def run(taskAttemptId: Long, attemptNumber: Int): T = {
      context = new TaskContextImpl(stageId = stageId, partitionId = partitionId,
        taskAttemptId = taskAttemptId, attemptNumber = attemptNumber, runningLocally = false)
      TaskContextHelper.setTaskContext(context)
      context.taskMetrics.setHostname(Utils.localHostName())
      taskThread = Thread.currentThread()
      if (_killed) {
        kill(interruptThread = false)
      }
      try {
        runTask(context)
      } finally {
        context.markTaskCompleted()
        TaskContextHelper.unset()
      }
    }

ShuffleMapTask and ResultTask each provide their own implementation of the runTask function.

ShuffleMapTask's runTask function:

    override def runTask(context: TaskContext): MapStatus = {
      // Deserialize the RDD using the broadcast variable.
      val ser = SparkEnv.get.closureSerializer.newInstance()
      val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
        ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
      // taskBinary here is the broadcast variable holding the task serialized in
      // org.apache.spark.scheduler.DAGScheduler#submitMissingTasks

      metrics = Some(context.taskMetrics)
      var writer: ShuffleWriter[Any, Any] = null
      try {
        val manager = SparkEnv.get.shuffleManager
        writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
        // Write the results of the RDD computation to memory or disk
        writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
        return writer.stop(success = true).get
      } catch {
        case e: Exception =>
          try {
            if (writer != null) {
              writer.stop(success = false)
            }
          } catch {
            case e: Exception =>
              log.debug("Could not stop writer", e)
          }
          throw e
      }
    }

ResultTask's runTask function:

    override def runTask(context: TaskContext): U = {
      // Deserialize the RDD and the func using the broadcast variables.
      val ser = SparkEnv.get.closureSerializer.newInstance()
      val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
        ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
      metrics = Some(context.taskMetrics)
      func(context, rdd.iterator(partition, context))
    }
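
The relationship between run and runTask is a template method: the base class owns setup and cleanup, and each task type only fills in runTask. Here is a minimal sketch of that shape. MiniTask, MiniResultTask and MiniShuffleMapTask are hypothetical names; the map task returns a plain Map of bucket sizes as a stand-in for MapStatus, and nothing is written to a shuffle.

```scala
// Hypothetical miniatures of Task, ResultTask and ShuffleMapTask.
object TaskTemplateSketch {
  abstract class MiniTask[T](val partitionId: Int) {
    @volatile var completed = false

    // Template method: shared cleanup around the subclass-specific work.
    final def run(): T = {
      try {
        runTask()
      } finally {
        completed = true
      }
    }

    def runTask(): T
  }

  // Like ResultTask: applies a user function to the partition's records.
  class MiniResultTask[A, U](partitionId: Int, data: Seq[A], func: Iterator[A] => U)
      extends MiniTask[U](partitionId) {
    override def runTask(): U = func(data.iterator)
  }

  // Like ShuffleMapTask: buckets key-value records by reducer and returns
  // only the number of records per bucket.
  class MiniShuffleMapTask[K, V](partitionId: Int, data: Seq[(K, V)], numReducers: Int)
      extends MiniTask[Map[Int, Int]](partitionId) {
    override def runTask(): Map[Int, Int] =
      data
        .groupBy { case (k, _) => math.abs(k.hashCode % numReducers) }
        .map { case (bucket, records) => bucket -> records.size }
  }
}
```

A MiniResultTask built with `_.sum` over Seq(1, 2, 3) returns 6 from run(), while a MiniShuffleMapTask returns only the per-bucket record counts; in both cases completed is set afterwards by the shared run method.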

4. Task status updates

Tasks run inside TaskRunner, which communicates with the driver through ExecutorBackend. The message it sends is StatusUpdate:

  1. Before the task runs, it tells the driver that the task's state is TaskState.RUNNING.
  2. After the task finishes, it tells the driver that the task's state is TaskState.FINISHED and returns the computed result.
  3. If an error occurs while the task is running, it tells the driver that the task's state is TaskState.FAILED and returns the cause of the error.
  4. If the task is killed partway through, it tells the driver that the task's state is TaskState.KILLED.
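
The reporting sequence can be sketched as a small wrapper around the task body. Everything here is a simplified assumption: RecordingBackend stands in for ExecutorBackend and only records what it is told, KilledException stands in for TaskKilledException, and payloads are plain strings instead of serialized results. The kill path is modelled with its own Killed state, since Spark's TaskState enumeration has a KILLED value.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical miniature of the RUNNING / FINISHED / FAILED / KILLED reporting.
object StatusUpdateSketch {
  sealed trait State
  case object Running extends State
  case object Finished extends State
  case object Failed extends State
  case object Killed extends State

  // Stand-in for TaskKilledException.
  final class KilledException extends RuntimeException("task killed")

  // Stand-in for ExecutorBackend: it only records the updates it receives.
  final class RecordingBackend {
    val updates = ArrayBuffer.empty[(Long, State, String)]
    def statusUpdate(taskId: Long, state: State, payload: String): Unit = synchronized {
      updates += ((taskId, state, payload))
      ()
    }
  }

  // Reports RUNNING first, then exactly one terminal state.
  def runTask(taskId: Long, backend: RecordingBackend)(body: => String): Unit = {
    backend.statusUpdate(taskId, Running, "")
    try {
      val result = body
      backend.statusUpdate(taskId, Finished, result)
    } catch {
      case _: KilledException =>
        backend.statusUpdate(taskId, Killed, "killed")
      case e: Exception =>
        backend.statusUpdate(taskId, Failed, Option(e.getMessage).getOrElse(""))
    }
  }
}
```

Every task produces two updates: Running, followed by Finished, Failed or Killed depending on how the body ends.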

5. Task completion

When the task finishes, TaskRunner's run function notifies ExecutorBackend through statusUpdate, with the result wrapped in a DirectTaskResult.
After SchedulerBackend receives the StatusUpdate, it does the following: if the task completed successfully, it is removed from the watch list; if all the tasks of the whole job are complete, the resources the job occupied are released.
TaskSchedulerImpl puts the completed task into the completion queue and takes out the next waiting task.

The following is the code in CoarseGrainedSchedulerBackend that handles the StatusUpdate message:

    case StatusUpdate(executorId, taskId, state, data) =>
      // scheduler.statusUpdate takes care of removing the completed task from the TaskSet
      scheduler.statusUpdate(taskId, state, data.value)
      if (TaskState.isFinished(state)) {
        executorDataMap.get(executorId) match {
          case Some(executorInfo) =>
            executorInfo.freeCores += scheduler.CPUS_PER_TASK
            makeOffers(executorId)
          case None =>
            // Ignoring the update since we don't know about the executor.
            logWarning(s"Ignored task status update ($taskId state $state) " +
              s"from unknown executor $sender with ID $executorId")
        }
      }
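
The bookkeeping in this handler is "give the cores back, then offer them again". A minimal sketch of that loop follows. MiniSchedulerBackend, the pending queue and the launched buffer are hypothetical, and tasks are just strings; the real makeOffers goes through the task scheduler's resource offers rather than a local queue.

```scala
import scala.collection.mutable

// Hypothetical miniature of the free-core accounting done on StatusUpdate.
object FreeCoresSketch {
  final class ExecutorData(var freeCores: Int)

  final class MiniSchedulerBackend(cpusPerTask: Int) {
    val executorDataMap = mutable.HashMap.empty[String, ExecutorData]
    val pending = mutable.Queue.empty[String]                  // tasks waiting for cores
    val launched = mutable.ArrayBuffer.empty[(String, String)] // (executorId, task)

    // Hand the executor's free cores to waiting tasks, one task at a time.
    def makeOffers(executorId: String): Unit =
      executorDataMap.get(executorId).foreach { info =>
        while (info.freeCores >= cpusPerTask && pending.nonEmpty) {
          info.freeCores -= cpusPerTask
          launched += ((executorId, pending.dequeue()))
        }
      }

    // Returns false when the update comes from an unknown executor and is ignored.
    def statusUpdate(executorId: String, finished: Boolean): Boolean =
      executorDataMap.get(executorId) match {
        case Some(info) =>
          if (finished) {
            info.freeCores += cpusPerTask // the finished task releases its cores
            makeOffers(executorId)        // and they are offered again right away
          }
          true
        case None => false
      }
  }
}
```

With one busy executor and one waiting task, a finished-task update frees a core and the waiting task is launched on that executor immediately.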

The scheduler.statusUpdate function performs the following steps:

  1. TaskScheduler uses the task ID to find the TaskSetManager that manages the task (the class that manages a batch of tasks), removes the task from the TaskSetManager, and inserts it into the success queue of TaskResultGetter (the class responsible for fetching task results);
  2. After TaskResultGetter obtains the result, it calls TaskScheduler's handleSuccessfulTask method to return the result;
  3. TaskScheduler calls TaskSetManager's handleSuccessfulTask method to handle the successful task;
  4. TaskSetManager calls DAGScheduler's taskEnded method to tell DAGScheduler that the task has finished; if all of its tasks have succeeded by then, the TaskSetManager is finished as well.

DAGScheduler posts a CompletionEvent in its taskEnded method, and when that event is handled, DAGScheduler's handleTaskCompletion function is called. The result is treated differently for ResultTask and ShuffleMapTask:
1) ResultTask:
The job's numFinished is incremented by 1. If numFinished equals the number of partitions, the stage is complete and is marked as finished. Finally, the taskSucceeded method of JobListener (implemented by JobWaiter) is called, which hands the result to resultHandler (the anonymous function passed in when the job was submitted) for processing. If the number of finished tasks equals the total number of tasks, the job exits.
2) ShuffleMapTask:

  1. Call the stage's addOutputLoc method to add the result to the stage's outputLocs list
  2. If the stage has no more tasks to wait for, mark the stage as finished
  3. Register the stage's outputLocs with MapOutputTracker, for the next stage to use
  4. If any entry in the stage's outputLocs is empty, meaning that its computation failed, resubmit the stage
  5. Find the waiting stages that no longer have missing parent stages and submit them
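
Both completion paths can be sketched in a few lines. This is a rough model under my own assumptions: MiniJobWaiter and MiniStage are hypothetical reductions of JobWaiter and Stage, output locations are plain host names, and a lost map output is represented by passing None.

```scala
import scala.collection.mutable

// Hypothetical miniatures of the two completion paths.
object TaskCompletionSketch {
  // ResultTask side: pass each result to the handler and count finished tasks.
  final class MiniJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
    private var finishedTasks = 0
    def jobFinished: Boolean = synchronized { finishedTasks == totalTasks }
    def taskSucceeded(index: Int, result: T): Unit = synchronized {
      resultHandler(index, result)
      finishedTasks += 1
    }
  }

  // ShuffleMapTask side: track output locations per partition.
  sealed trait Outcome
  case object StillRunning extends Outcome // other tasks of the stage are pending
  case object Resubmit extends Outcome     // some partition has no output
  case object StageDone extends Outcome    // all outputs are available

  final class MiniStage(val numPartitions: Int) {
    val outputLocs: Array[List[String]] = Array.fill[List[String]](numPartitions)(Nil)
    val pendingTasks = mutable.Set.empty[Int]
    def addOutputLoc(partition: Int, location: String): Unit =
      outputLocs(partition) = location :: outputLocs(partition)
  }

  // Called when the map task for one partition ends.
  def onShuffleMapTaskEnd(stage: MiniStage, partition: Int, location: Option[String]): Outcome = {
    stage.pendingTasks -= partition
    location.foreach(loc => stage.addOutputLoc(partition, loc))
    if (stage.pendingTasks.nonEmpty) StillRunning
    else if (stage.outputLocs.exists(_.isEmpty)) Resubmit
    else StageDone
  }
}
```

The waiter reports the job as finished only once every partition has delivered a result, and a stage whose last task ends with an empty output list is resubmitted instead of being handed to the next stage.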

When reposting, please credit the author, Jason Ding, and cite the source.
Gitcafe Blog Home page (http://jasonding1354.gitcafe.io/)
GitHub Blog Home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu home page (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Search Google for jasonding1354 to reach my blog home page

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.