Introduction
The previous article, DAGScheduler Source Analysis, examined the important functions and key points of the DAGScheduler source from the perspective of job submission. This article, DAGScheduler Source Analysis 2, draws mainly on fxjwind's Spark source analysis of DAGScheduler and introduces several important functions in the DAGScheduler file that were not covered before.
Event handling
Before the Spark 1.0 release, the DAGScheduler class had a private eventQueue member, and an event-loop thread repeatedly read events from it for processing. In the Spark 1.0 source code, event handling is performed by an actor, with the DAGEventProcessActor class doing the main event-handling work.
Perhaps because Scala no longer supports its native actor model and Akka actors became the official standard, in the Spark 1.4 source DAGScheduler returns to the eventQueue style of event processing. To keep the code logic clear and reduce coupling, the 1.4 source introduces the DAGSchedulerEventProcessLoop class for event handling.
private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {
Here DAGSchedulerEventProcessLoop inherits from the EventLoop class, where:
private[spark] abstract class EventLoop[E](name: String) extends Logging {

  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)

    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take()
          try {
            onReceive(event)
          } catch {
            case NonFatal(e) => {
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
            }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }
  }
  ...
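To see the pattern in isolation, here is a minimal, self-contained sketch of the same idea (hypothetical code, not taken from the Spark source): a daemon thread blocks on a LinkedBlockingDeque and hands each event to a handler, which is the producer/consumer structure EventLoop implements.

import java.util.concurrent.LinkedBlockingDeque
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical, simplified analogue of EventLoop[E]: a daemon thread blocks on a
// queue and hands each event to a handler function.
class SimpleEventLoop[E](name: String, handler: E => Unit) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take()          // blocks until an event arrives
          try {
            handler(event)
          } catch {
            case NonFatal(e) => println(s"Unexpected error in $name: $e")
          }
        }
      } catch {
        case _: InterruptedException => // stop() interrupts the blocking take(); exit quietly
      }
    }
  }

  def start(): Unit = eventThread.start()
  def post(event: E): Unit = eventQueue.put(event)
  def stop(): Unit = { stopped.set(true); eventThread.interrupt() }
}

// Usage: events posted from any thread are processed one at a time on the loop thread.
object SimpleEventLoopDemo extends App {
  val loop = new SimpleEventLoop[String]("demo-loop", event => println(s"received: $event"))
  loop.start()
  loop.post("JobSubmitted")
  loop.post("CompletionEvent")
  Thread.sleep(200)                              // give the loop thread time to drain the queue
  loop.stop()
}

The blocking queue decouples the threads that post events from the single thread that processes them, so events are handled one at a time in arrival order.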
We can see that DAGScheduler sends an event to the eventQueue by posting it to the DAGSchedulerEventProcessLoop object; eventThread continuously takes events from the eventQueue and calls the onReceive function to process them.
override def onReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties)
  ......
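On the posting side, the flow is roughly as follows. This is a hedged sketch of the submitJob path rather than a verbatim excerpt; the JobWaiter created here is the listener discussed in the next section, and names such as func2 are illustrative.

// Sketch of DAGScheduler.submitJob (not verbatim): create a listener that will
// collect results, wrap the job into a JobSubmitted event, and post it to the loop.
// The event thread later dispatches it to handleJobSubmitted via the match above.
val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
eventProcessLoop.post(
  JobSubmitted(jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties))
waiter  // submitJob returns the waiter so runJob can block on awaitResult()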
JobWaiter
JobWaiter first implements the taskSucceeded and jobFailed functions of JobListener. When DAGScheduler receives a task-success or failure event, it calls the corresponding function: taskSucceeded checks whether all tasks have succeeded, which means the job has finished, and awaitResult simply waits until jobFinished is set.
You can see that the JobWaiter instance is created in the submitJob function and passed along inside the event as a parameter, and that JobWaiter's jobFailed function is called when an error occurs in the handleJobSubmitted function.
The following is the code for the JobWaiter class:
private[spark] class JobWaiter[T](
    dagScheduler: DAGScheduler,
    val jobId: Int,
    totalTasks: Int,
    resultHandler: (Int, T) => Unit)
  extends JobListener {

  private var finishedTasks = 0

  // Is the job as a whole finished (succeeded or failed)?
  @volatile
  private var _jobFinished = totalTasks == 0

  def jobFinished = _jobFinished

  // If the job is finished, this will be its result. In the case of 0 task jobs (e.g. zero
  // partition RDDs), we set the jobResult directly to JobSucceeded.
  private var jobResult: JobResult = if (jobFinished) JobSucceeded else null

  /**
   * Sends a signal to the DAGScheduler to cancel the job. The cancellation itself is handled
   * asynchronously. After the low level scheduler cancels all the tasks belonging to this job, it
   * will fail this job with a SparkException.
   */
  def cancel() {
    dagScheduler.cancelJob(jobId)
  }

  override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
    }
    resultHandler(index, result.asInstanceOf[T])
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll()
    }
  }

  override def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): JobResult = synchronized {
    while (!_jobFinished) {
      this.wait()
    }
    return jobResult
  }
}
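To illustrate the wait/notify contract between the result-reporting side and awaitResult(), here is a self-contained, simplified stand-in (hypothetical code, not the Spark JobWaiter): a background thread reports task results while the caller blocks until the job finishes, mirroring how runJob blocks on the waiter returned by submitJob.

import scala.util.{Try, Success, Failure}

// Hypothetical, simplified stand-in for JobWaiter, showing the same synchronization contract.
class SimpleJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
  private var finishedTasks = 0
  @volatile private var jobFinished = totalTasks == 0
  private var failure: Option[Exception] = None

  def taskSucceeded(index: Int, result: T): Unit = synchronized {
    resultHandler(index, result)
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      jobFinished = true
      notifyAll()                       // wake up awaitResult()
    }
  }

  def jobFailed(exception: Exception): Unit = synchronized {
    jobFinished = true
    failure = Some(exception)
    notifyAll()
  }

  // Blocks the calling thread until every task has reported or the job has failed.
  def awaitResult(): Try[Unit] = synchronized {
    while (!jobFinished) { wait() }
    failure.map(Failure(_)).getOrElse(Success(()))
  }
}

// Usage: a background "scheduler" thread reports task results; the caller blocks on awaitResult.
object SimpleJobWaiterDemo extends App {
  val results = new Array[Int](3)
  val waiter = new SimpleJobWaiter[Int](totalTasks = 3, (index, value) => results(index) = value)

  new Thread(new Runnable {
    override def run(): Unit = (0 until 3).foreach(i => waiter.taskSucceeded(i, i * 10))
  }).start()

  println(waiter.awaitResult())         // Success(()) once all three tasks have reported
  println(results.toSeq)                // the collected per-partition results
}

The key point is that taskSucceeded, jobFailed, and awaitResult all synchronize on the same object, so notifyAll() reliably wakes the waiting caller once the last task has reported or the job has failed.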
Summary
This article described a few small details in the DAGScheduler.scala file; in the next article I will analyze stage division and dependencies in DAGScheduler.scala.
When reposting, please credit the author Jason Ding and the original source.
GitCafe blog homepage (http://jasonding1354.gitcafe.io/)
GitHub blog homepage (http://jasonding1354.github.io/)
CSDN blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Search "jasonding1354" on Google to find my blog homepage
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
"Spark" Dagscheduler Source Analysis 2