Introduction
The previous section, Task Execution Mechanism and Task Source Analysis 1, covered how the Executor registers itself.
In this section I'll pick up at the point where the Executor receives the LaunchTask message and follow how it actually runs the task.
1. The launchTask Function of Executor
DriverActor submits the task by sending a LaunchTask message to CoarseGrainedExecutorBackend. When the backend receives the message, it asks its internal Executor to start the task, i.e. it calls launchTask on the idle Executor.
Here is the relevant code from receiveWithLogging in CoarseGrainedExecutorBackend:
case LaunchTask(data) =>
  if (executor == null) {
    logError("Received LaunchTask command but executor was null")
    System.exit(1)
  } else {
    val ser = env.closureSerializer.newInstance()
    val taskDesc = ser.deserialize[TaskDescription](data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
      taskDesc.name, taskDesc.serializedTask)
  }
The Executor then executes the task:
def launchTask(
    context: ExecutorBackend,
    taskId: Long,
    attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer) {
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
    serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
The Executor maintains an internal thread pool and can run multiple tasks concurrently; each submitted task is wrapped in a TaskRunner and handed to the thread pool for execution.
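The pattern is the standard one from java.util.concurrent. The sketch below is a minimal, hypothetical illustration of it (ThreadPoolSketch, MyTaskRunner and this launchTask signature are my own names, not Spark's): wrap a unit of work in a Runnable, track it in a concurrent map, and submit it to a cached thread pool, which is essentially what Executor does.

import java.util.concurrent.{ConcurrentHashMap, Executors}

// Hypothetical, simplified illustration of the Executor's pattern:
// wrap each task in a Runnable, remember it, and submit it to a thread pool.
object ThreadPoolSketch {
  // Cached pool: grows on demand and reuses idle threads (Spark additionally names its threads)
  private val threadPool = Executors.newCachedThreadPool()
  private val runningTasks = new ConcurrentHashMap[Long, MyTaskRunner]()

  class MyTaskRunner(val taskId: Long, body: () => Unit) extends Runnable {
    override def run(): Unit = {
      try body()                          // the actual task work
      finally runningTasks.remove(taskId) // always drop it from the running list
    }
  }

  def launchTask(taskId: Long)(body: => Unit): Unit = {
    val tr = new MyTaskRunner(taskId, () => body)
    runningTasks.put(taskId, tr)
    threadPool.execute(tr)                // run asynchronously on a pool thread
  }

  def main(args: Array[String]): Unit = {
    (1L to 3L).foreach(id => launchTask(id) { println(s"task $id done") })
    threadPool.shutdown()
  }
}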
2. TaskRunner's run Method
Inside the run method, the line
val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber)
is where the task actually gets executed.
Here is the relevant code from TaskRunner's run method:
try {
  val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
  updateDependencies(taskFiles, taskJars)
  // Deserialize the task
  task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)

  // If this task has been killed before we deserialized it, let's quit now. Otherwise,
  // continue executing the task.
  if (killed) {
    // Throw an exception rather than returning, because returning within a try{} block
    // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
    // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
    // for the task.
    throw new TaskKilledException
  }

  attemptedTask = Some(task)
  logDebug("Task " + taskId + "'s epoch is " + task.epoch)
  env.mapOutputTracker.updateEpoch(task.epoch)

  // Run the actual task and measure its runtime.
  // The concrete work is done in ResultTask or ShuffleMapTask.
  taskStart = System.currentTimeMillis()
  val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber)
  val taskFinish = System.currentTimeMillis()

  // If the task has been killed, let's fail it.
  if (task.killed) {
    throw new TaskKilledException
  }

  // Serialize the result
  val resultSer = env.serializer.newInstance()
  val beforeSerialization = System.currentTimeMillis()
  val valueBytes = resultSer.serialize(value)
  val afterSerialization = System.currentTimeMillis()

  // Update the task's metrics; this information is shown on the monitoring page
  for (m <- task.metrics) {
    m.setExecutorDeserializeTime(taskStart - deserializeStartTime)
    m.setExecutorRunTime(taskFinish - taskStart)
    m.setJvmGCTime(gcTime - startGCTime)
    m.setResultSerializationTime(afterSerialization - beforeSerialization)
  }

  val accumUpdates = Accumulators.values

  // Re-package the result, then serialize the wrapper
  val directResult = new DirectTaskResult(valueBytes, accumUpdates, task.metrics.orNull)
  val serializedDirectResult = ser.serialize(directResult)
  val resultSize = serializedDirectResult.limit

  // directSend = sending the result directly back to the driver
  val serializedResult = {
    if (maxResultSize > 0 && resultSize > maxResultSize) {
      logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
        s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
        s"dropping it.")
      ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
    } else if (resultSize >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {
      // If the result exceeds spark.akka.frameSize (10M by default), store it through the
      // BlockManager (memory and disk) and send only the block id back to the driver
      val blockId = TaskResultBlockId(taskId)
      env.blockManager.putBytes(blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
      logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
      ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
    } else {
      logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
      serializedDirectResult
    }
  }

  // The task is done; report the TaskResult to the driver via statusUpdate
  execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
} catch {
  // exception handling code, omitted ...
} finally {
  // Clean up the shuffle memory registered for this task and remove it from the running list
  // Release memory used by this thread for shuffles
  env.shuffleMemoryManager.releaseMemoryForThisThread()
  // Release memory used by this thread for unrolling blocks
  env.blockManager.memoryStore.releaseUnrollMemoryForThisThread()
  // Release memory used by this thread for accumulators
  Accumulators.clear()
  runningTasks.remove(taskId)
}
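The only real branching in this long method is how the serialized result travels back to the driver. Here is a minimal, self-contained sketch of that decision; ResultRouteSketch and its names are mine, not Spark API, and the 10 MB figure is the spark.akka.frameSize default mentioned in the comment above.

// Hypothetical helper summarizing the three result-routing branches above (not Spark API).
object ResultRouteSketch {
  sealed trait ResultRoute
  case object DropTooLarge     extends ResultRoute // > maxResultSize: only the size is reported back
  case object ViaBlockManager  extends ResultRoute // >= frame size: stored in the BlockManager, IndirectTaskResult sent
  case object DirectlyToDriver extends ResultRoute // small enough: DirectTaskResult sent in the status update itself

  def chooseRoute(resultSize: Long, maxResultSize: Long,
                  akkaFrameSize: Long, reservedBytes: Long): ResultRoute =
    if (maxResultSize > 0 && resultSize > maxResultSize) DropTooLarge
    else if (resultSize >= akkaFrameSize - reservedBytes) ViaBlockManager
    else DirectlyToDriver

  def main(args: Array[String]): Unit = {
    val tenMB = 10L * 1024 * 1024
    // A 4 KB result is well under the frame size, so it goes straight back to the driver
    println(chooseRoute(resultSize = 4 * 1024, maxResultSize = 0,
      akkaFrameSize = tenMB, reservedBytes = 200 * 1024)) // -> DirectlyToDriver
  }
}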
3. Task Execution Process
TaskRunner runs in a thread from the Executor's pool. Let's look at the call chain inside its run method (a tiny model of the last three calls follows the list):
TaskRunner.run
–> Task.run
–> Task.runTask
–> RDD.iterator
–> RDD.computeOrReadCheckpoint
–> RDD.compute
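To make the last three hops concrete: for each partition, iterator first serves a cached copy if the RDD has a storage level (computing and caching it on a miss via the CacheManager), otherwise reads checkpointed data if the RDD was checkpointed, and only then calls compute. The snippet below is a tiny, self-contained model of that decision, with my own names rather than Spark's API.

// Hypothetical model of the decision RDD.iterator makes for one partition (not Spark source).
object IteratorDecisionSketch {
  sealed trait Source
  case object FromCache      extends Source // storage level set: CacheManager.getOrCompute (computes and caches on a miss)
  case object FromCheckpoint extends Source // RDD is checkpointed: read the checkpointed parent's data
  case object FromCompute    extends Source // otherwise: RDD.compute runs the actual computation

  def partitionSource(hasStorageLevel: Boolean, isCheckpointed: Boolean): Source =
    if (hasStorageLevel) FromCache
    else if (isCheckpointed) FromCheckpoint
    else FromCompute

  def main(args: Array[String]): Unit =
    println(partitionSource(hasStorageLevel = false, isCheckpointed = false)) // FromCompute
}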
The run function of Task:
/**
 * Called by Executor to run this task.
 *
 * @param taskAttemptId an identifier for this task attempt that is unique within a SparkContext.
 * @param attemptNumber how many times this task has been attempted (0 for the first attempt)
 * @return the result of the task
 */
final def run(taskAttemptId: Long, attemptNumber: Int): T = {
  context = new TaskContextImpl(stageId = stageId, partitionId = partitionId,
    taskAttemptId = taskAttemptId, attemptNumber = attemptNumber, runningLocally = false)
  TaskContextHelper.setTaskContext(context)
  context.taskMetrics.setHostname(Utils.localHostName())
  taskThread = Thread.currentThread()
  if (_killed) {
    kill(interruptThread = false)
  }
  try {
    runTask(context)
  } finally {
    context.markTaskCompleted()
    TaskContextHelper.unset()
  }
}
ShuffleMapTask and ResultTask each implement their own runTask function.
ShuffleMapTask's runTask function:
override def runTask(context: TaskContext): MapStatus = {
  // Deserialize the RDD using the broadcast variable.
  val ser = SparkEnv.get.closureSerializer.newInstance()
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  // taskBinary is the broadcast variable holding the serialized task, created in
  // org.apache.spark.scheduler.DAGScheduler#submitMissingTasks

  metrics = Some(context.taskMetrics)
  var writer: ShuffleWriter[Any, Any] = null
  try {
    val manager = SparkEnv.get.shuffleManager
    writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
    // Write the result of the RDD computation to memory or disk
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
    return writer.stop(success = true).get
  } catch {
    case e: Exception =>
      try {
        if (writer != null) {
          writer.stop(success = false)
        }
      } catch {
        case e: Exception =>
          log.debug("Could not stop writer", e)
      }
      throw e
  }
}
ResultTask's runTask function:
override def runTask(context: TaskContext): U = {
  // Deserialize the RDD and the func using the broadcast variables.
  val ser = SparkEnv.get.closureSerializer.newInstance()
  val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  metrics = Some(context.taskMetrics)
  func(context, rdd.iterator(partition, context))
}
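To tie the two task types back to user code, consider a small, hypothetical job: every stage before a shuffle boundary runs as ShuffleMapTasks that write shuffle output, and the final stage feeding the action runs as ResultTasks whose return values are sent back to the driver.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// Hypothetical two-stage job: which task type each stage produces.
object TwoStageJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("two-stage-job").setMaster("local[2]"))

    val counts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(w => (w, 1))          // stage 0: runs as ShuffleMapTasks (writes shuffle output)
      .reduceByKey(_ + _)        // shuffle boundary between the two stages
      .collect()                 // stage 1: runs as ResultTasks (results returned to the driver)

    counts.foreach(println)
    sc.stop()
  }
}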
4. Task Status Updates
The task is executed by TaskRunner, which needs to communicate with the driver through the ExecutorBackend; the message used is StatusUpdate (a condensed sketch of the four cases follows the list):
- Before the task runs, tell the driver that the current task's state is TaskState.RUNNING.
- After the task finishes, tell the driver that the current task's state is TaskState.FINISHED and return the result of the computation.
- If an error occurs while the task is running, tell the driver that the current task's state is TaskState.FAILED and return the cause of the error.
- If the task is killed partway through, tell the driver that the current task's state is TaskState.KILLED.
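Condensed from the run method shown above (plus the exception handling that was omitted there), the reporting all goes through ExecutorBackend.statusUpdate. The sketch below is a self-contained paraphrase, not verbatim Spark source: ExecutorBackendSketch stands in for the real ExecutorBackend and the string payloads are only indicative of the serialized data.

// Paraphrase of the state reporting in TaskRunner.run, with stand-in types.
object StatusUpdateSketch {
  object TaskState extends Enumeration { val RUNNING, FINISHED, FAILED, KILLED = Value }

  trait ExecutorBackendSketch {
    def statusUpdate(taskId: Long, state: TaskState.Value, data: String): Unit
  }
  class TaskKilledException extends RuntimeException

  def runAndReport(taskId: Long, backend: ExecutorBackendSketch)(body: => String): Unit = {
    backend.statusUpdate(taskId, TaskState.RUNNING, "")                      // before the task runs
    try {
      val serializedResult = body                                            // the task itself
      backend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)     // success + result
    } catch {
      case _: TaskKilledException =>
        backend.statusUpdate(taskId, TaskState.KILLED, "TaskKilled")         // killed partway through
      case t: Throwable =>
        backend.statusUpdate(taskId, TaskState.FAILED, t.getMessage)         // cause of the error
    }
  }

  def main(args: Array[String]): Unit = {
    val printer = new ExecutorBackendSketch {
      def statusUpdate(taskId: Long, state: TaskState.Value, data: String): Unit =
        println(s"statusUpdate(tid=$taskId, state=$state, data=$data)")
    }
    runAndReport(1L, printer) { "42" }   // prints RUNNING, then FINISHED with the result
  }
}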
5. Task Execution Completion
When the task finishes, the run function of TaskRunner wraps the result in a DirectTaskResult and notifies the ExecutorBackend via statusUpdate.
When the SchedulerBackend receives the StatusUpdate, it does the following: if the task has completed successfully, it is removed from the monitoring list; if all tasks of the job have completed, the resources occupied by the job are released.
TaskSchedulerImpl puts the completed task into the completion queue and takes out the next waiting task.
The following is the code in CoarseGrainedSchedulerBackend that handles the StatusUpdate message:
case StatusUpdate(executorId, taskId, state, data) =>
  // statusUpdate removes completed tasks from the TaskSet
  scheduler.statusUpdate(taskId, state, data.value)
  if (TaskState.isFinished(state)) {
    executorDataMap.get(executorId) match {
      case Some(executorInfo) =>
        executorInfo.freeCores += scheduler.CPUS_PER_TASK
        makeOffers(executorId)
      case None =>
        // Ignoring the update since we don't know about the executor.
        logWarning(s"Ignored task status update ($taskId state $state) " +
          s"from unknown executor $sender with ID $executorId")
    }
  }
The scheduler.statusUpdate function performs the following steps (a sketch of TaskResultGetter's hand-off pattern follows the list):
- TaskScheduler uses the taskId to find the TaskSetManager that manages the task (the class that manages a batch of tasks), removes the task from that TaskSetManager, and inserts it into the success queue of TaskResultGetter (the class responsible for fetching task results);
- After TaskResultGetter obtains the result, it calls TaskScheduler's handleSuccessfulTask method to pass the result back;
- TaskScheduler calls TaskSetManager's handleSuccessfulTask method to handle the successful task;
- TaskSetManager calls DAGScheduler's taskEnded method to tell DAGScheduler that the task has finished; if this success completes the task set, the TaskSetManager itself is finished as well.
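One detail worth noting about the first two steps: TaskResultGetter does its deserialization on its own small thread pool, so the scheduler is not blocked by large results. Below is a minimal, hypothetical sketch of that hand-off pattern; ResultGetterSketch and its parameters are illustrative names, not Spark's actual signatures.

import java.nio.ByteBuffer
import java.util.concurrent.Executors

// Hypothetical sketch of the TaskResultGetter pattern: decode task results off the
// scheduler thread, then call back into the scheduler with the decoded value.
class ResultGetterSketch(deserialize: ByteBuffer => Any,
                         handleSuccessfulTask: (Long, Any) => Unit) {
  private val pool = Executors.newFixedThreadPool(4) // a small, fixed pool of getter threads

  def enqueueSuccessfulTask(taskId: Long, serializedData: ByteBuffer): Unit = {
    pool.execute(new Runnable {
      override def run(): Unit = {
        val result = deserialize(serializedData)   // potentially expensive for big results
        handleSuccessfulTask(taskId, result)       // hand the decoded value back to the scheduler
      }
    })
  }
}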
DAGScheduler raises a CompletionEvent in its taskEnded method, and when that CompletionEvent is processed, DAGScheduler's handleTaskCompletion function is called. The result is handled differently for ResultTask and ShuffleMapTask:
1) ResultTask:
The job's numFinished is incremented by 1; if numFinished equals the number of partitions, the stage is finished and is marked as ended. Finally, the taskSucceeded method of JobListener (concretely implemented by JobWaiter) is called, and the result is handed to the resultHandler (an anonymous function supplied by the caller) for processing; once the number of completed tasks equals the total number of tasks, the job exits.
2) ShuffleMapTask (a simplified sketch of this bookkeeping follows the list):
- Call the stage's addOutputLoc method to add the result to the stage's outputLocs list;
- If the stage has no more tasks to wait for, mark the stage as finished;
- Register the stage's outputLocs with the MapOutputTracker for the next stage to use;
- If some of the stage's outputLocs are empty, the corresponding output is missing (the computation failed), so resubmit the stage;
- Find the waiting stages that have no pending parent stages and submit them.
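Here is a simplified, self-contained model of that bookkeeping; the types and helper are my own, while the real logic lives in DAGScheduler.handleTaskCompletion and the Stage class.

// Hypothetical model of the ShuffleMapTask-completion bookkeeping described above.
object ShuffleStageBookkeepingSketch {
  case class MapStatusSketch(executorId: String)

  class ShuffleStageSketch(val numPartitions: Int) {
    // outputLocs(i) holds the known map outputs for partition i of this shuffle stage
    val outputLocs: Array[List[MapStatusSketch]] = Array.fill(numPartitions)(Nil)
    var pendingPartitions: Set[Int] = (0 until numPartitions).toSet
  }

  def onShuffleMapTaskFinished(stage: ShuffleStageSketch, partition: Int,
                               status: MapStatusSketch): Unit = {
    stage.outputLocs(partition) = status :: stage.outputLocs(partition) // addOutputLoc
    stage.pendingPartitions -= partition
    if (stage.pendingPartitions.isEmpty) {          // no tasks left to wait for: stage is done
      if (stage.outputLocs.exists(_.isEmpty)) {
        println("some map outputs are missing -> resubmit the stage")
      } else {
        println("register outputLocs with the MapOutputTracker and submit waiting child stages")
      }
    }
  }
}

At this point the waiting child stages are submitted, and the whole cycle described in sections 1 through 5 repeats for their tasks.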