Introduction
The previous article, DAGScheduler Source Analysis, examined the important functions and key points of the DAGScheduler source from the perspective of job submission. This article, DAGScheduler Source Analysis 2, draws mainly on fxjwind's Spark source analysis of DAGScheduler and introduces several important functions in the DAGScheduler file that were not covered before.
Event handling
Before the Spark 1.0 release, the DAGScheduler class had a private eventQueue member, and an event-loop thread repeatedly read events from it for processing. In the Spark 1.0 source code, event handling is performed by an actor, with the DAGEventProcessActor class doing the main event-handling work.
Perhaps because Scala no longer supports its native actor model and Akka actors became the official standard, in the Spark 1.4 source DAGScheduler returns to the eventQueue style of event processing. To keep the code logic clear and reduce coupling, the 1.4 source introduces the DAGSchedulerEventProcessLoop class for event handling.
private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {
Here DAGSchedulerEventProcessLoop inherits from the EventLoop class, where:
private[spark] abstract class EventLoop[E](name: String) extends Logging {

  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)

    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take()
          try {
            onReceive(event)
          } catch {
            case NonFatal(e) => {
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
            }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }
  }
  ...
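To see the pattern in isolation, here is a minimal, self-contained sketch of the same idea (hypothetical code, not taken from the Spark source): a daemon thread blocks on a LinkedBlockingDeque and hands each event to a handler, which is the producer/consumer structure EventLoop implements.

import java.util.concurrent.LinkedBlockingDeque
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical, simplified analogue of EventLoop[E]: a daemon thread blocks on a
// queue and hands each event to a handler function.
class SimpleEventLoop[E](name: String, handler: E => Unit) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = eventQueue.take()          // blocks until an event arrives
          try {
            handler(event)
          } catch {
            case NonFatal(e) => println(s"Unexpected error in $name: $e")
          }
        }
      } catch {
        case _: InterruptedException => // stop() interrupts the blocking take(); exit quietly
      }
    }
  }

  def start(): Unit = eventThread.start()
  def post(event: E): Unit = eventQueue.put(event)
  def stop(): Unit = { stopped.set(true); eventThread.interrupt() }
}

// Usage: events posted from any thread are processed one at a time on the loop thread.
object SimpleEventLoopDemo extends App {
  val loop = new SimpleEventLoop[String]("demo-loop", event => println(s"received: $event"))
  loop.start()
  loop.post("JobSubmitted")
  loop.post("CompletionEvent")
  Thread.sleep(200)                              // give the loop thread time to drain the queue
  loop.stop()
}

The blocking queue decouples the threads that post events from the single thread that processes them, so events are handled one at a time in arrival order.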
We can see that DAGScheduler sends an event to the eventQueue by posting it to the DAGSchedulerEventProcessLoop object; eventThread continuously takes events from the eventQueue and calls the onReceive function to process them.
override def onReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties)
  ......
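On the posting side, the flow is roughly as follows. This is a hedged sketch of the submitJob path rather than a verbatim excerpt; the JobWaiter created here is the listener discussed in the next section, and names such as func2 are illustrative.

// Sketch of DAGScheduler.submitJob (not verbatim): create a listener that will
// collect results, wrap the job into a JobSubmitted event, and post it to the loop.
// The event thread later dispatches it to handleJobSubmitted via the match above.
val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
eventProcessLoop.post(
  JobSubmitted(jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties))
waiter  // submitJob returns the waiter so runJob can block on awaitResult()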
JobWaiter
JobWaiter first implements the taskSucceeded and jobFailed functions of JobListener. When DAGScheduler receives a task-success or failure event, it calls the corresponding function: taskSucceeded checks whether all tasks have succeeded, which means the job has finished, and awaitResult simply waits until jobFinished is set.
You can see that the JobWaiter instance is created in the submitJob function and passed along inside the event as a parameter, and that JobWaiter's jobFailed function is called when an error occurs in the handleJobSubmitted function.
The following is the code for the JobWaiter class:
private[spark] class JobWaiter[T](
    dagScheduler: DAGScheduler,
    val jobId: Int,
    totalTasks: Int,
    resultHandler: (Int, T) => Unit)
  extends JobListener {

  private var finishedTasks = 0

  // Is the job as a whole finished (succeeded or failed)?
  @volatile
  private var _jobFinished = totalTasks == 0

  def jobFinished = _jobFinished

  // If the job is finished, this will be its result. In the case of 0 task jobs (e.g. zero
  // partition RDDs), we set the jobResult directly to JobSucceeded.
  private var jobResult: JobResult = if (jobFinished) JobSucceeded else null

  /**
   * Sends a signal to the DAGScheduler to cancel the job. The cancellation itself is handled
   * asynchronously. After the low level scheduler cancels all the tasks belonging to this job, it
   * will fail this job with a SparkException.
   */
  def cancel() {
    dagScheduler.cancelJob(jobId)
  }

  override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
    }
    resultHandler(index, result.asInstanceOf[T])
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll()
    }
  }

  override def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): JobResult = synchronized {
    while (!_jobFinished) {
      this.wait()
    }
    return jobResult
  }
}
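To illustrate the wait/notify contract between the result-reporting side and awaitResult(), here is a self-contained, simplified stand-in (hypothetical code, not the Spark JobWaiter): a background thread reports task results while the caller blocks until the job finishes, mirroring how runJob blocks on the waiter returned by submitJob.

import scala.util.{Try, Success, Failure}

// Hypothetical, simplified stand-in for JobWaiter, showing the same synchronization contract.
class SimpleJobWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
  private var finishedTasks = 0
  @volatile private var jobFinished = totalTasks == 0
  private var failure: Option[Exception] = None

  def taskSucceeded(index: Int, result: T): Unit = synchronized {
    resultHandler(index, result)
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      jobFinished = true
      notifyAll()                       // wake up awaitResult()
    }
  }

  def jobFailed(exception: Exception): Unit = synchronized {
    jobFinished = true
    failure = Some(exception)
    notifyAll()
  }

  // Blocks the calling thread until every task has reported or the job has failed.
  def awaitResult(): Try[Unit] = synchronized {
    while (!jobFinished) { wait() }
    failure.map(Failure(_)).getOrElse(Success(()))
  }
}

// Usage: a background "scheduler" thread reports task results; the caller blocks on awaitResult.
object SimpleJobWaiterDemo extends App {
  val results = new Array[Int](3)
  val waiter = new SimpleJobWaiter[Int](totalTasks = 3, (index, value) => results(index) = value)

  new Thread(new Runnable {
    override def run(): Unit = (0 until 3).foreach(i => waiter.taskSucceeded(i, i * 10))
  }).start()

  println(waiter.awaitResult())         // Success(()) once all three tasks have reported
  println(results.toSeq)                // the collected per-partition results
}

The key point is that taskSucceeded, jobFailed, and awaitResult all synchronize on the same object, so notifyAll() reliably wakes the waiting caller once the last task has reported or the job has failed.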
Summary
This article described a few small details in the DAGScheduler.scala file; in the next article I will analyze stage division and dependencies in DAGScheduler.scala.
When reposting, please credit the author Jason Ding and the original source.
GitCafe blog homepage (http://jasonding1354.gitcafe.io/)
GitHub blog homepage (http://jasonding1354.github.io/)
CSDN blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Search "jasonding1354" on Google to find my blog homepage
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
"Spark" Dagscheduler Source Analysis 2