In-depth understanding of Spark: the two scheduling modes, FIFO and FAIR

As covered earlier, a submitted application is split by the DAGScheduler into jobs, stages, and tasks, and the resulting task sets are finally submitted to the TaskScheduler. When the TaskScheduler is created, the TaskScheduler and SchedulerBackend classes are instantiated according to the master setting, and a scheduling pool is initialized.

1. Scheduling Pool Comparison

The scheduling pool is initialized according to the scheduling mode:

    def initialize(backend: SchedulerBackend) {
      this.backend = backend
      // temporarily set rootPool name to empty; note that the root pool's
      // minShare and weight are both initialized to 0
      rootPool = new Pool("", schedulingMode, 0, 0)
      schedulableBuilder = {
        schedulingMode match {
          case SchedulingMode.FIFO =>
            new FIFOSchedulableBuilder(rootPool)
          case SchedulingMode.FAIR =>
            new FairSchedulableBuilder(rootPool, conf)
        }
      }
      schedulableBuilder.buildPools()
    }

FIFO mode

The mode is selected by the spark.scheduler.mode setting, which is either FIFO or FAIR; the default is FIFO.

In FIFO mode the builder does almost nothing: FIFOSchedulableBuilder keeps the default, empty buildPools method, so no sub-pools are created, and addTaskSetManager simply adds the TaskSetManager directly to rootPool.

In short, the default FIFO mode sets up nothing beyond the root pool.

FAIR mode

FAIR mode overrides the buildPools method. By default it reads the file $SPARK_HOME/conf/fairscheduler.xml; a user-defined file can be specified instead through the spark.scheduler.allocation.file parameter.

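The file configures the following for each pool:

poolName: the name of the scheduling pool

schedulingMode: the scheduling mode used inside the pool (only two values, FIFO or FAIR)

minShare: the minimum number of CPU cores the pool should receive

weight: the weight of the scheduling pool relative to the other pools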
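To make these four settings concrete, here is a minimal sketch that parses an allocation file shaped like fairscheduler.xml using only the JDK XML parser. The pool names and values are invented for illustration, PoolConfig and parse are hypothetical helpers rather than Spark classes, and the fallback values (FIFO, minShare 0, weight 1) are assumed to match Spark's defaults for settings a pool omits.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object PoolConfigSketch {
  // Hypothetical holder for one <pool> entry; not a Spark class.
  final case class PoolConfig(name: String, schedulingMode: String, minShare: Int, weight: Int)

  // Sample allocation file; pool names and numbers are made up.
  val sampleXml: String =
    """<?xml version="1.0"?>
      |<allocations>
      |  <pool name="production">
      |    <schedulingMode>FAIR</schedulingMode>
      |    <weight>2</weight>
      |    <minShare>3</minShare>
      |  </pool>
      |  <pool name="adhoc">
      |    <schedulingMode>FIFO</schedulingMode>
      |  </pool>
      |</allocations>""".stripMargin

  def parse(xml: String): List[PoolConfig] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    val pools = doc.getElementsByTagName("pool")
    (0 until pools.getLength).toList.map { i =>
      val pool = pools.item(i).asInstanceOf[Element]
      // Text of the first child element with this tag, if the pool defines it.
      def child(tag: String): Option[String] = {
        val nodes = pool.getElementsByTagName(tag)
        if (nodes.getLength > 0) Some(nodes.item(0).getTextContent.trim) else None
      }
      PoolConfig(
        name = pool.getAttribute("name"),
        // Assumed defaults for omitted settings: FIFO, minShare 0, weight 1.
        schedulingMode = child("schedulingMode").getOrElse("FIFO"),
        minShare = child("minShare").map(_.toInt).getOrElse(0),
        weight = child("weight").map(_.toInt).getOrElse(1))
    }
  }
}
```

Calling PoolConfigSketch.parse(PoolConfigSketch.sampleXml) yields two entries: "production" with FAIR, minShare 3, weight 2, and "adhoc" with FIFO and the fallback values.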

    override def buildPools() {
      var is: Option[InputStream] = None
      try {
        is = Option {
          schedulerAllocFile.map { f =>
            new FileInputStream(f)
          }.getOrElse {
            Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
          }
        }
        is.foreach { i => buildFairSchedulerPool(i) }
      } finally {
        is.foreach(_.close())
      }
      // finally create "default" pool
      buildDefaultPool()
    }

It also overrides the addTaskSetManager method:

    override def addTaskSetManager(manager: Schedulable, properties: Properties) {
      var poolName = DEFAULT_POOL_NAME
      var parentPool = rootPool.getSchedulableByName(poolName)
      if (properties != null) {
        poolName = properties.getProperty(FAIR_SCHEDULER_PROPERTIES, DEFAULT_POOL_NAME)
        parentPool = rootPool.getSchedulableByName(poolName)
        if (parentPool == null) {
          // we will create a new pool that the user has configured in the app
          // instead of being defined in the xml file
          parentPool = new Pool(poolName, DEFAULT_SCHEDULING_MODE,
            DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)
          rootPool.addSchedulable(parentPool)
          logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
            poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
        }
      }
      parentPool.addSchedulable(manager)
      logInfo("Added task set " + manager.name + " tasks to pool " + poolName)
    }

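The logic is: the pools defined in the configuration file, plus the default pool, are added to rootPool, and each TaskSetManager is then placed into the matching sub-pool of rootPool. If the pool named in the job's properties does not exist yet, a new pool with the default settings is created for it.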
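The routing described here can be sketched without Spark. In the following simplified model, Pool, buildRoot, and addTaskSet are hypothetical stand-ins (task sets are represented only by their names), and the default settings for a pool created on the fly (FIFO, minShare 0, weight 1) are an assumption about the values behind DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, and DEFAULT_WEIGHT.

```scala
import scala.collection.mutable

object PoolRoutingSketch {
  // Simplified stand-in for a scheduling pool; not Spark's Pool class.
  final class Pool(val name: String, val mode: String, val minShare: Int, val weight: Int) {
    val children = mutable.LinkedHashMap.empty[String, Pool]
    val taskSets = mutable.ArrayBuffer.empty[String]
  }

  val DefaultPoolName = "default"

  // Mirrors buildPools(): register the configured pools, then make sure
  // a "default" pool exists.
  def buildRoot(configured: Seq[Pool]): Pool = {
    val root = new Pool("", "FAIR", 0, 0)
    configured.foreach(p => root.children.update(p.name, p))
    root.children.getOrElseUpdate(DefaultPoolName, new Pool(DefaultPoolName, "FIFO", 0, 1))
    root
  }

  // Mirrors addTaskSetManager(): pick the requested pool, fall back to
  // "default", and create an unknown pool with default settings.
  // Returns the name of the pool the task set ended up in.
  def addTaskSet(root: Pool, taskSet: String, requestedPool: Option[String]): String = {
    val poolName = requestedPool.getOrElse(DefaultPoolName)
    val pool = root.children.getOrElseUpdate(poolName, new Pool(poolName, "FIFO", 0, 1))
    pool.taskSets += taskSet
    poolName
  }
}
```

With a root built from a single configured pool named "production", a task set requesting "production" lands there, one requesting the unknown pool "adhoc" causes that pool to be created with the default settings, and one with no request goes to "default".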

2. Scheduling Algorithm Comparison

Besides initializing the scheduling pool differently, the two modes also use different scheduling algorithms.

Inside the Pool implementation, the scheduling algorithm is likewise chosen according to the mode:

    var taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
      schedulingMode match {
        case SchedulingMode.FAIR =>
          new FairSchedulingAlgorithm()
        case SchedulingMode.FIFO =>
          new FIFOSchedulingAlgorithm()
      }
    }

FIFO mode

FIFO scheduling is easy to understand: it compares the priority first and, if the priorities are equal, the stageId; whichever is smaller runs first.

This also makes sense: a task set with a small stageId usually belongs to the deepest level of the recursive stage submission, so it is the first to be submitted to the scheduling pool.

    private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
      override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
        val priority1 = s1.priority
        val priority2 = s2.priority
        var res = math.signum(priority1 - priority2)
        if (res == 0) {
          val stageId1 = s1.stageId
          val stageId2 = s2.stageId
          res = math.signum(stageId1 - stageId2)
        }
        if (res < 0) {
          true
        } else {
          false
        }
      }
    }
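As a small worked example of this ordering, the sketch below applies the same rule (lower priority value first, then lower stageId) to a few made-up task sets. TaskSetInfo and schedule are hypothetical names, and sorting the queue with sortWith is an assumption about how the comparator is put to use.

```scala
object FifoOrderSketch {
  // Hypothetical stand-in for a task set: only the two fields the comparator reads.
  final case class TaskSetInfo(priority: Int, stageId: Int)

  // Same rule as the comparator above: priority first, stageId as tie-breaker.
  def fifoLessThan(s1: TaskSetInfo, s2: TaskSetInfo): Boolean =
    if (s1.priority != s2.priority) s1.priority < s2.priority
    else s1.stageId < s2.stageId

  // Order a queue of task sets the way FIFO mode would run them.
  def schedule(queue: Seq[TaskSetInfo]): Seq[TaskSetInfo] =
    queue.sortWith(fifoLessThan)
}
```

For the queue (priority 1, stage 3), (priority 0, stage 2), (priority 0, stage 1), schedule returns (0, 1), (0, 2), (1, 3): both task sets with priority 0 go first, ordered by stageId.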

FAIR mode

FAIR mode is a little more complicated, but it is still easy to read:

1. A schedulable whose number of running tasks is below its minShare counts as "needy". If only one of the two is needy, the needy one runs first.

2. If both are needy, compare the ratio runningTasks / minShare; the one with the smaller ratio runs first.

3. If neither is needy, compare the ratio runningTasks / weight; the one with the smaller ratio runs first, so a larger weight means a higher priority.

4. If the two are still tied, compare the names (string comparison); the smaller name runs first.

    private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
      override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
        val minShare1 = s1.minShare
        val minShare2 = s2.minShare
        val runningTasks1 = s1.runningTasks
        val runningTasks2 = s2.runningTasks
        val s1Needy = runningTasks1 < minShare1
        val s2Needy = runningTasks2 < minShare2
        val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
        val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
        val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
        val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
        var compare: Int = 0
        if (s1Needy && !s2Needy) {
          return true
        } else if (!s1Needy && s2Needy) {
          return false
        } else if (s1Needy && s2Needy) {
          compare = minShareRatio1.compareTo(minShareRatio2)
        } else {
          compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
        }
        if (compare < 0) {
          true
        } else if (compare > 0) {
          false
        } else {
          s1.name < s2.name
        }
      }
    }
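A worked example may help here. The sketch below re-implements the same comparison on a plain case class so it runs without Spark; PoolInfo, fairLessThan, and schedule are hypothetical names, and the pool values are invented for illustration.

```scala
object FairOrderSketch {
  // Hypothetical stand-in for a schedulable: only the fields the comparator reads.
  final case class PoolInfo(name: String, runningTasks: Int, minShare: Int, weight: Int)

  def fairLessThan(s1: PoolInfo, s2: PoolInfo): Boolean = {
    val s1Needy = s1.runningTasks < s1.minShare
    val s2Needy = s2.runningTasks < s2.minShare
    val compare: Int =
      if (s1Needy && !s2Needy) -1        // only s1 is below its minShare
      else if (!s1Needy && s2Needy) 1    // only s2 is below its minShare
      else if (s1Needy && s2Needy) {
        // Both needy: the smaller runningTasks / minShare ratio wins.
        val r1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1).toDouble
        val r2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1).toDouble
        java.lang.Double.compare(r1, r2)
      } else {
        // Neither needy: the smaller runningTasks / weight ratio wins.
        val r1 = s1.runningTasks.toDouble / s1.weight.toDouble
        val r2 = s2.runningTasks.toDouble / s2.weight.toDouble
        java.lang.Double.compare(r1, r2)
      }
    // Remaining ties are broken by name.
    if (compare != 0) compare < 0 else s1.name < s2.name
  }

  def schedule(pools: Seq[PoolInfo]): Seq[PoolInfo] = pools.sortWith(fairLessThan)

  // Example pools: a and b are needy, c and d are not.
  val a = PoolInfo("a", runningTasks = 2, minShare = 3, weight = 1) // 2/3 of minShare
  val b = PoolInfo("b", runningTasks = 1, minShare = 4, weight = 1) // 1/4 of minShare
  val c = PoolInfo("c", runningTasks = 6, minShare = 2, weight = 3) // 2.0 tasks per weight
  val d = PoolInfo("d", runningTasks = 4, minShare = 2, weight = 1) // 4.0 tasks per weight
}
```

Sorting Seq(d, c, a, b) with schedule gives the order b, a, c, d: the two needy pools come first, with b ahead because it holds the smaller fraction of its minShare, and c precedes d because its higher weight gives it fewer running tasks per unit of weight.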

Summary: although I had studied Spark's scheduling modes, I had basically never used them in practice, and I did not expect Spark to have such a hidden feature...
