In-depth understanding of Spark: TaskScheduler and SchedulerBackend source analysis

Source: Internet
Author: User

Last time I analyzed how the DAGScheduler splits work into jobs, stages, and tasks. That split is only a logical result, saved as a ResultStage object, and nothing has been executed yet.

Actually running the tasks is the job of Spark's TaskScheduler module and SchedulerBackend module.

The TaskScheduler module is responsible for task scheduling, and the SchedulerBackend is responsible for requesting the resources the tasks run on. The two are tightly coupled, and they are implemented together.
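To make the division of labor concrete, here is a minimal toy model. Everything in it (ToyTaskScheduler, ToySchedulerBackend, FifoScheduler, FixedSlotBackend, and the idea of counting "slots") is a hypothetical simplification for illustration, not Spark's real API. It only shows the shape of the cooperation: the backend owns the resources and offers them, and the scheduler decides which pending tasks use them.

```scala
// Toy model only: these types are hypothetical simplifications, not Spark's real API.
object SchedulingSketch {
  final case class Task(id: Int)

  // Decides WHICH tasks run, given the resources it is offered.
  trait ToyTaskScheduler {
    def submit(tasks: Seq[Task]): Unit
    def resourceOffers(freeSlots: Int): Seq[Task]
  }

  // Owns the resources and launches whatever the scheduler picks.
  trait ToySchedulerBackend {
    def reviveOffers(): Seq[Task]
  }

  final class FifoScheduler extends ToyTaskScheduler {
    private val pending = scala.collection.mutable.Queue.empty[Task]

    def submit(tasks: Seq[Task]): Unit = {
      pending ++= tasks
      ()
    }

    // Hand out at most `freeSlots` tasks, oldest first.
    def resourceOffers(freeSlots: Int): Seq[Task] = {
      val n = math.min(freeSlots, pending.size)
      Seq.fill(n)(pending.dequeue())
    }
  }

  final class FixedSlotBackend(scheduler: ToyTaskScheduler, slots: Int)
      extends ToySchedulerBackend {
    // Offer every slot to the scheduler and "launch" what it returns.
    def reviveOffers(): Seq[Task] = scheduler.resourceOffers(slots)
  }
}
```

With two slots and three submitted tasks, the first call to reviveOffers() hands back two tasks and the next call hands back the remaining one. Neither class is useful without the other, which is the coupling described above.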

Looking at the internal fields of SparkContext, you can see that the TaskScheduler (org.apache.spark.scheduler.TaskScheduler) is a trait (a Scala term, roughly comparable to a Java interface). It is a trait because tasks can be submitted in several ways, for example yarn-client mode or yarn-cluster mode, depending on the master parameter set when the Spark application is submitted.

A different master setting leads to a different concrete implementation. In yarn-client mode, the TaskScheduler implementation is YarnScheduler.

SchedulerBackend is likewise a trait whose concrete implementation is decided by spark.master. In yarn-client mode, the implementation is YarnClientSchedulerBackend.
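The idea of "one trait, with the implementation picked from the master string" can be sketched in a few self-contained lines. This is a toy under stated assumptions: ToyScheduler, ToyBackend, NamedScheduler, NamedBackend, and create are hypothetical names, and the "local" branch uses placeholder labels rather than real Spark class names. Only the "yarn-client" pairing comes from the text above.

```scala
// Hypothetical stand-ins: only the "yarn-client" pairing comes from the article.
object MasterDispatchSketch {
  trait ToyScheduler { def name: String }
  trait ToyBackend { def name: String }

  final case class NamedScheduler(name: String) extends ToyScheduler
  final case class NamedBackend(name: String) extends ToyBackend

  // Callers only see the traits; the master string selects the concrete pair.
  def create(master: String): (ToyBackend, ToyScheduler) = master match {
    case "yarn-client" =>
      (NamedBackend("YarnClientSchedulerBackend"), NamedScheduler("YarnScheduler"))
    case "local" =>
      // Placeholder labels, not real class names.
      (NamedBackend("toy local backend"), NamedScheduler("toy local scheduler"))
    case other =>
      throw new IllegalArgumentException(s"Unsupported master in this sketch: $other")
  }
}
```

Calling MasterDispatchSketch.create("yarn-client") yields a pair whose static types are the two traits, so the rest of the program never needs to know which implementation it received.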

Specifically, look at the code implementation:

SparkContext#createTaskScheduler

SparkContext calls createTaskScheduler, which uses master to decide the concrete types of the TaskScheduler and SchedulerBackend it builds.

val (sched, ts) = SparkContext.createTaskScheduler(this, master) // master here is the value of the "spark.master" parameter, a String
_schedulerBackend = sched // the generated SchedulerBackend
_taskScheduler = ts // the generated TaskScheduler
_taskScheduler.start()
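The sequence above (build the pair, keep both halves, then start the scheduler) can be tried out with a toy. All names here (ContextWiringSketch, ToyBackend, ToyScheduler, events) are hypothetical, and having start() on the scheduler also start its backend is an assumption of this sketch, made so that the order of calls is visible.

```scala
// Toy lifecycle model; every name here is hypothetical.
object ContextWiringSketch {
  // Records lifecycle calls so the order can be inspected afterwards.
  val events = scala.collection.mutable.ArrayBuffer.empty[String]

  class ToyBackend {
    def start(): Unit = { events += "backend started"; () }
  }

  class ToyScheduler {
    private var backend: Option[ToyBackend] = None

    // Second-phase wiring: the backend is handed over after construction.
    def initialize(b: ToyBackend): Unit = { backend = Some(b) }

    // Assumption of this sketch: starting the scheduler starts its backend too.
    def start(): Unit = {
      events += "scheduler started"
      backend.foreach(_.start())
    }
  }

  // Returns (backend, scheduler), already wired together.
  def createTaskScheduler(): (ToyBackend, ToyScheduler) = {
    val scheduler = new ToyScheduler
    val backend = new ToyBackend
    scheduler.initialize(backend)
    (backend, scheduler)
  }
}
```

Destructuring the result with val (sched, ts) = ContextWiringSketch.createTaskScheduler() and then calling ts.start() records the scheduler start followed by the backend start, so the caller only has to start one of the two objects.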

Inside the createTaskScheduler method, the concrete implementation is chosen according to master: yarn-client, yarn-cluster, local, and so on.

We only look at yarn-client mode here, since it is the one used most often. In yarn-client mode the driver runs on the client, so its log output can be viewed locally. In yarn-cluster mode the driver runs inside the cluster under YARN's management, so the logs are not as easy to see. You can see that the method is implemented with a match/case expression. In yarn-client mode, the SchedulerBackend implementation is org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend and the TaskScheduler implementation is org.apache.spark.scheduler.cluster.YarnScheduler.

  case "yarn-client" =>
    // Load YarnScheduler by name and build it from the SparkContext
    val scheduler = try {
      val clazz = Utils.classForName("org.apache.spark.scheduler.cluster.YarnScheduler")
      val cons = clazz.getConstructor(classOf[SparkContext])
      cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
    } catch {
      case e: Exception => {
        throw new SparkException("YARN mode not available?", e)
      }
    }

    // Load YarnClientSchedulerBackend by name; it needs both the scheduler and the SparkContext
    val backend = try {
      val clazz = Utils.classForName("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
      val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
      cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
    } catch {
      case e: Exception => {
        throw new SparkException("YARN mode not available?", e)
      }
    }

    // Hand the backend to the scheduler, then return the pair
    scheduler.initialize(backend)
    (backend, scheduler)
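The code above loads both classes by name instead of calling their constructors directly. A plausible reason, suggested by the "YARN mode not available?" message, is that the YARN classes may not be on the classpath, but that is my reading rather than something the source states. The technique itself can be tried in isolation with the sketch below. ToyScheduler, OptionalScheduler, and load are hypothetical names, and plain Class.forName stands in for Spark's Utils.classForName.

```scala
// Reflection sketch; ToyScheduler, OptionalScheduler and load are hypothetical names.
object ReflectiveLoadSketch {
  // Stand-in for the type the caller casts to (TaskSchedulerImpl in the article).
  trait ToyScheduler { def describe: String }

  // Imagine this class lives in an optional module.
  class OptionalScheduler(appName: String) extends ToyScheduler {
    def describe: String = s"scheduler for $appName"
  }

  // Look the class up by name at runtime, pick its one-argument constructor,
  // and wrap any failure in a clearer error, following the pattern in the article.
  def load(className: String, appName: String): ToyScheduler =
    try {
      val clazz = Class.forName(className)
      val cons = clazz.getConstructor(classOf[String])
      cons.newInstance(appName).asInstanceOf[ToyScheduler]
    } catch {
      case e: Exception =>
        throw new IllegalStateException(s"$className not available?", e)
    }
}
```

Passing classOf[ReflectiveLoadSketch.OptionalScheduler].getName as the class name returns a working instance, while a name that does not exist fails with the wrapped exception instead of a bare ClassNotFoundException.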

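After the concrete scheduler types have been obtained according to master, the method does not return immediately: it first calls scheduler.initialize(backend) and only then returns the (backend, scheduler) pair.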
