"Spark" Dagscheduler Source Analysis 2


Introduction

The previous article, DAGScheduler Source Analysis, examined the important functions and key points of the DAGScheduler source from the perspective of the job-submission process. This article, DAGScheduler Source Analysis 2, draws mainly on fxjwind's Spark source analysis of DAGScheduler and introduces several important functions in the DAGScheduler file that were not covered before.

Event handling

Before the Spark 1.0 release, the DAGScheduler class held a private eventQueue member, and an event-loop thread repeatedly read events from it for processing. In the Spark 1.0 source code, event handling was instead performed by an actor, with the DAGEventProcessActor class doing the main event-handling work.
Perhaps because Scala no longer ships a native actor implementation and the Akka actor became the de facto standard, the Spark 1.4 source I am reading returns DAGScheduler to the eventQueue style of event processing. To keep the code logic clear and loosely coupled, the 1.4 source introduces the DAGSchedulerEventProcessLoop class for event processing:

private[spark] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {

Here DAGSchedulerEventProcessLoop inherits from the EventLoop class:

private[spark] abstract class EventLoop[E](name: String) extends Logging {

  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)

    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take()
          try {
            onReceive(event)
          } catch {
            case NonFatal(e) => {
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
            }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }
  }
  ...

We can see that DAGScheduler sends events to the eventQueue by posting them to the DAGSchedulerEventProcessLoop object; eventThread continuously takes events from the eventQueue and calls the onReceive function to process them.
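The pattern described above can be sketched as a minimal, self-contained event loop: a daemon thread blocking on a queue and dispatching each event to an abstract handler. This is an illustration of the pattern only, not Spark's actual EventLoop class; the name SimpleEventLoop and its members are invented for this sketch, and the error-handling hooks (onError, logError) are omitted for brevity.

```scala
import java.util.concurrent.LinkedBlockingDeque
import java.util.concurrent.atomic.AtomicBoolean

// Minimal sketch of the EventLoop pattern: a daemon thread takes events
// from a blocking queue and hands each one to onReceive.
abstract class SimpleEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      while (!stopped.get) {
        try {
          val event = eventQueue.take() // blocks until an event is posted
          onReceive(event)
        } catch {
          case _: InterruptedException => return // stop() interrupts the blocked take()
        }
      }
    }
  }

  def start(): Unit = eventThread.start()

  // Producers (e.g. a scheduler) post events here; the loop thread consumes them.
  def post(event: E): Unit = eventQueue.put(event)

  def stop(): Unit = {
    stopped.set(true)
    eventThread.interrupt()
    eventThread.join()
  }

  // Subclasses implement the actual event handling.
  protected def onReceive(event: E): Unit
}
```

Because the queue decouples producers from the single consumer thread, callers never block on event processing, which is exactly the low-coupling property the 1.4 refactoring was after.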

  override def onReceive(event: DAGSchedulerEvent): Unit = event match {
    case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite,
        listener, properties)
  ......
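The dispatch in onReceive relies on Scala pattern matching over event case classes. A minimal sketch of that dispatch style, with hypothetical event types invented for illustration (these are not Spark's real event classes):

```scala
// Hypothetical event hierarchy illustrating match-based dispatch.
sealed trait SchedulerEvent
case class JobSubmittedEvt(jobId: Int) extends SchedulerEvent
case class JobCancelledEvt(jobId: Int) extends SchedulerEvent

// Each case deconstructs the event and routes it to a handler,
// mirroring how onReceive routes to dagScheduler.handleJobSubmitted etc.
def describe(event: SchedulerEvent): String = event match {
  case JobSubmittedEvt(id) => s"submitted job $id"
  case JobCancelledEvt(id) => s"cancelled job $id"
}
```

Sealing the trait lets the compiler warn about any event type the match forgets to handle.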
JobWaiter

JobWaiter implements the taskSucceeded and jobFailed functions of JobListener. When DAGScheduler receives a task-success or task-failure event, it invokes the corresponding function. In taskSucceeded it checks whether all tasks have succeeded, which means the job is finished; awaitResult, which has been waiting on jobFinished, is then released.
You can see that the JobWaiter instance is created in the submitJob function and passed along inside the event as a parameter; the jobFailed function of JobWaiter is called when an error occurs in the handleJobSubmitted function.

The following is the code for the JobWaiter class:

private[spark] class JobWaiter[T](
    dagScheduler: DAGScheduler,
    val jobId: Int,
    totalTasks: Int,
    resultHandler: (Int, T) => Unit)
  extends JobListener {

  private var finishedTasks = 0

  // Is the job as a whole finished (succeeded or failed)?
  @volatile
  private var _jobFinished = totalTasks == 0

  def jobFinished = _jobFinished

  // If the job is finished, this will be its result. In the case of 0 task jobs (e.g. zero
  // partition RDDs), we set the JobResult directly to JobSucceeded.
  private var jobResult: JobResult = if (jobFinished) JobSucceeded else null

  /**
   * Sends a signal to the DAGScheduler to cancel the job. The cancellation itself is handled
   * asynchronously. After the low level scheduler cancels all the tasks belonging to this job, it
   * will fail this job with a SparkException.
   */
  def cancel() {
    dagScheduler.cancelJob(jobId)
  }

  override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
    }
    resultHandler(index, result.asInstanceOf[T])
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll()
    }
  }

  override def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): JobResult = synchronized {
    while (!_jobFinished) {
      this.wait()
    }
    return jobResult
  }
}
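The latch behavior at the heart of JobWaiter (count completions under a lock, notifyAll when done, wait in awaitResult) can be reproduced in a small standalone sketch. This is a simplified illustration of the pattern, not Spark's actual class; the names SimpleJobWaiter, SimpleJobResult, etc. are invented here, and cancel()/DAGScheduler wiring is omitted.

```scala
// Simplified result type standing in for Spark's JobResult.
sealed trait SimpleJobResult
case object JobSucceeded extends SimpleJobResult
case class JobFailed(e: Exception) extends SimpleJobResult

// A latch that counts task completions and wakes waiters when the job finishes.
class SimpleJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
  private var finishedTasks = 0
  @volatile private var _jobFinished = totalTasks == 0
  private var jobResult: SimpleJobResult = if (_jobFinished) JobSucceeded else null

  def taskSucceeded(index: Int, result: T): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("job already finished")
    }
    resultHandler(index, result) // deliver the partial result to the caller
    finishedTasks += 1
    if (finishedTasks == totalTasks) { // last task completes the job
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll() // release threads blocked in awaitResult()
    }
  }

  def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): SimpleJobResult = synchronized {
    while (!_jobFinished) { // loop guards against spurious wakeups
      this.wait()
    }
    jobResult
  }
}
```

The wait-in-a-loop idiom is the standard Java-monitor discipline: the condition is re-checked after every wakeup, so a spurious notify cannot return a half-finished job.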
Summary

This article has described a few small details in the DAGScheduler.scala file; I'll analyze stage division and dependencies in DAGScheduler.scala in the next article.

When reprinting, please credit the author Jason Ding and cite the source:
Gitcafe Blog Home page (http://jasonding1354.gitcafe.io/)
GitHub Blog Home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Google search jasonding1354 go to my blog homepage

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.

"Spark" Dagscheduler Source Analysis 2

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.