We already know that a submitted job is split into jobs, stages, and tasks by the DAGScheduler and finally handed to the TaskScheduler. When the TaskScheduler is created, the TaskScheduler and SchedulerBackend are instantiated according to the master URL, and a scheduling pool is initialized.
1. Scheduling Pool Comparison
The scheduling pool is initialized according to the scheduling mode:
```scala
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // temporarily set rootPool name to empty; note that the root pool is
  // created with minShare = 0 and weight = 0
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  schedulableBuilder.buildPools()
}
```
FIFO mode
The mode is chosen by spark.scheduler.mode, which can be FIFO or FAIR; the default is FIFO.
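For example (a sketch; the property must be set before the SparkContext is created), the mode can be switched to FAIR in spark-defaults.conf:

```properties
# conf/spark-defaults.conf (or pass --conf spark.scheduler.mode=FAIR to spark-submit)
spark.scheduler.mode  FAIR
```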
FIFO mode does nothing extra: FIFOSchedulableBuilder keeps the default SchedulableBuilder behavior, its buildPools method is empty, and addTaskSetManager simply adds the manager to the root pool.
The default FIFO mode is easy to understand precisely because it does nothing.
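The behavior can be modeled with a minimal standalone sketch (illustrative names, not Spark's real classes): buildPools is a no-op, and every task set manager goes straight into the root pool in submission order.

```scala
import scala.collection.mutable

// Minimal sketch of what the FIFO builder amounts to: no pool hierarchy is
// built, and task sets are appended to the root pool in FIFO order.
object FifoBuilderSketch {
  val rootPool = mutable.Buffer[String]()

  def buildPools(): Unit = ()  // nothing to build in FIFO mode

  def addTaskSetManager(manager: String): Unit = rootPool += manager
}
```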
Fair mode
Fair mode overrides the buildPools method. It reads $SPARK_HOME/conf/fairscheduler.xml by default, or a user-defined configuration file set via the spark.scheduler.allocation.file parameter.
Each pool in the file is configured with:
name: the pool name
schedulingMode: the scheduling mode inside the pool (only FIFO or FAIR)
minShare: the minimum number of CPU cores the pool is guaranteed
weight: the scheduling weight of the pool relative to other pools
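A typical fairscheduler.xml looks like this (the pool names "production" and "test" are illustrative):

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```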
```scala
override def buildPools() {
  var is: Option[InputStream] = None
  try {
    is = Option {
      schedulerAllocFile.map { f =>
        new FileInputStream(f)
      }.getOrElse {
        Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
      }
    }
    is.foreach { i => buildFairSchedulerPool(i) }
  } finally {
    is.foreach(_.close())
  }
  // finally create "default" pool
  buildDefaultPool()
}
```
It also overrides the addTaskSetManager method:
```scala
override def addTaskSetManager(manager: Schedulable, properties: Properties) {
  var poolName = DEFAULT_POOL_NAME
  var parentPool = rootPool.getSchedulableByName(poolName)
  if (properties != null) {
    poolName = properties.getProperty(FAIR_SCHEDULER_PROPERTIES, DEFAULT_POOL_NAME)
    parentPool = rootPool.getSchedulableByName(poolName)
    if (parentPool == null) {
      // we'll create a new pool that the user has configured in the app
      // instead of being defined in the XML file
      parentPool = new Pool(poolName, DEFAULT_SCHEDULING_MODE,
        DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)
      rootPool.addSchedulable(parentPool)
      logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
    }
  }
  parentPool.addSchedulable(manager)
  logInfo("Added task set " + manager.name + " tasks to pool " + poolName)
}
```
The logic: look up the pool named in the job's properties (or the default pool) under rootPool, create it with default settings if it does not exist yet, and then put the TaskSetManager into the corresponding sub-pool.
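This fallback can be sketched standalone (simplified names, not Spark's real classes): look up the pool named in the job's properties; if it was not defined in the XML file, create it on demand with default settings.

```scala
import java.util.Properties
import scala.collection.mutable

object PoolSelectSketch {
  val DefaultPoolName = "default"
  val FairSchedulerProperties = "spark.scheduler.pool"

  case class Pool(name: String, members: mutable.Buffer[String] = mutable.Buffer())

  // the root pool starts out holding only the default pool
  val rootPool = mutable.Map(DefaultPoolName -> Pool(DefaultPoolName))

  def addTaskSetManager(manager: String, properties: Properties): Pool = {
    val poolName =
      if (properties == null) DefaultPoolName
      else properties.getProperty(FairSchedulerProperties, DefaultPoolName)
    // create the pool on demand, as FairSchedulableBuilder does
    val pool = rootPool.getOrElseUpdate(poolName, Pool(poolName))
    pool.members += manager
    pool
  }
}
```

In a real application, a job is routed to a pool by setting the per-thread property sc.setLocalProperty("spark.scheduler.pool", "production") before submitting it.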
2. Scheduling Algorithm Comparison
Besides initializing the scheduling pools differently, the two modes also implement different scheduling algorithms.
Inside the pool implementation, the comparator is likewise chosen according to the mode:
```scala
var taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
  schedulingMode match {
    case SchedulingMode.FAIR =>
      new FairSchedulingAlgorithm()
    case SchedulingMode.FIFO =>
      new FIFOSchedulingAlgorithm()
  }
}
```
FIFO mode
The FIFO comparator is easy to understand: it compares priority (the job ID) first, then the stage ID; the smaller one runs first.
This also makes sense intuitively: stages with small IDs are usually at the deepest level of the recursion and are the first to be submitted to the scheduling pool.
```scala
private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    if (res < 0) {
      true
    } else {
      false
    }
  }
}
```
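The same comparison can be exercised standalone (simplified fields instead of Spark's Schedulable trait): priority, i.e. the job ID, is compared first, then the stage ID; true means s1 should run before s2.

```scala
case class FifoEntry(priority: Int, stageId: Int)

// returns true when s1 should run before s2
def fifoBefore(s1: FifoEntry, s2: FifoEntry): Boolean = {
  var res = math.signum(s1.priority - s2.priority)
  if (res == 0) {
    res = math.signum(s1.stageId - s2.stageId)
  }
  res < 0
}
```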
Fair mode
Fair mode is a little more complicated, but still easy to read:
1. A pool whose number of running tasks is below its minShare is "needy", and a needy pool runs before a non-needy one.
2. If both pools are needy, the one with the smaller runningTasks/minShare ratio runs first.
3. If neither is needy, the one with the smaller runningTasks/weight ratio runs first.
4. If the ratios are equal, the names are compared (string comparison), and the smaller name runs first.
```scala
private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
    var compare: Int = 0

    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }

    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }
}
```
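A standalone sketch of the same comparison (simplified fields instead of Spark's Schedulable trait) makes the three cases easy to test: a needy pool wins outright, and otherwise the ratios break the tie.

```scala
case class FairEntry(name: String, minShare: Int, weight: Int, runningTasks: Int)

// returns true when s1 should run before s2
def fairBefore(s1: FairEntry, s2: FairEntry): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  val minShareRatio1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1)
  val minShareRatio2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1)
  val taskToWeightRatio1 = s1.runningTasks.toDouble / s1.weight
  val taskToWeightRatio2 = s2.runningTasks.toDouble / s2.weight
  if (s1Needy && !s2Needy) true
  else if (!s1Needy && s2Needy) false
  else {
    val compare =
      if (s1Needy) minShareRatio1.compareTo(minShareRatio2)
      else taskToWeightRatio1.compareTo(taskToWeightRatio2)
    if (compare != 0) compare < 0 else s1.name < s2.name
  }
}
```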
Summary: although I have now worked through Spark's scheduling modes, I had basically never used them in practice before; I did not realize Spark had this hidden feature...
In-depth understanding of Spark: the two scheduling modes, FIFO and FAIR