We already know that a submitted job is split into jobs, stages, and tasks by the DAGScheduler and finally handed to the TaskScheduler. When the TaskScheduler is created, the TaskScheduler and SchedulerBackend are instantiated according to the master URL, and a scheduling pool is initialized.
1. Scheduling Pool Comparison
The scheduling pool is initialized according to the scheduling mode:
```scala
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // temporarily set rootPool name to empty; note that the root pool is
  // created with minShare = 0 and weight = 0
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  schedulableBuilder.buildPools()
}
```
FIFO mode
The mode is chosen by spark.scheduler.mode, which can be FIFO or FAIR; the default is FIFO.
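For example (a sketch; the property must be set before the SparkContext is created), the mode can be switched to FAIR in spark-defaults.conf:

```properties
# conf/spark-defaults.conf (or pass --conf spark.scheduler.mode=FAIR to spark-submit)
spark.scheduler.mode  FAIR
```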
FIFO mode does nothing extra: FIFOSchedulableBuilder keeps the default SchedulableBuilder behavior, its buildPools method is empty, and addTaskSetManager simply adds the manager to the root pool.
The default FIFO mode is easy to understand precisely because it does nothing.
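The behavior can be modeled with a minimal standalone sketch (illustrative names, not Spark's real classes): buildPools is a no-op, and every task set manager goes straight into the root pool in submission order.

```scala
import scala.collection.mutable

// Minimal sketch of what the FIFO builder amounts to: no pool hierarchy is
// built, and task sets are appended to the root pool in FIFO order.
object FifoBuilderSketch {
  val rootPool = mutable.Buffer[String]()

  def buildPools(): Unit = ()  // nothing to build in FIFO mode

  def addTaskSetManager(manager: String): Unit = rootPool += manager
}
```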
Fair mode
Fair mode overrides the buildPools method. It reads $SPARK_HOME/conf/fairscheduler.xml by default, or a user-defined configuration file set via the spark.scheduler.allocation.file parameter.
Each pool in the file is configured with:
name: the pool name
schedulingMode: the scheduling mode inside the pool (only FIFO or FAIR)
minShare: the minimum number of CPU cores the pool is guaranteed
weight: the scheduling weight of the pool relative to other pools
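A typical fairscheduler.xml looks like this (the pool names "production" and "test" are illustrative):

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```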
```scala
override def buildPools() {
  var is: Option[InputStream] = None
  try {
    is = Option {
      schedulerAllocFile.map { f =>
        new FileInputStream(f)
      }.getOrElse {
        Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
      }
    }
    is.foreach { i => buildFairSchedulerPool(i) }
  } finally {
    is.foreach(_.close())
  }
  // finally create "default" pool
  buildDefaultPool()
}
```
It also overrides the addTaskSetManager method:
```scala
override def addTaskSetManager(manager: Schedulable, properties: Properties) {
  var poolName = DEFAULT_POOL_NAME
  var parentPool = rootPool.getSchedulableByName(poolName)
  if (properties != null) {
    poolName = properties.getProperty(FAIR_SCHEDULER_PROPERTIES, DEFAULT_POOL_NAME)
    parentPool = rootPool.getSchedulableByName(poolName)
    if (parentPool == null) {
      // we'll create a new pool that the user has configured in the app
      // instead of being defined in the XML file
      parentPool = new Pool(poolName, DEFAULT_SCHEDULING_MODE,
        DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT)
      rootPool.addSchedulable(parentPool)
      logInfo("Created pool %s, schedulingMode: %s, minShare: %d, weight: %d".format(
        poolName, DEFAULT_SCHEDULING_MODE, DEFAULT_MINIMUM_SHARE, DEFAULT_WEIGHT))
    }
  }
  parentPool.addSchedulable(manager)
  logInfo("Added task set " + manager.name + " tasks to pool " + poolName)
}
```
The logic: look up the pool named in the job's properties (or the default pool) under rootPool, create it with default settings if it does not exist yet, and then put the TaskSetManager into the corresponding sub-pool.
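This fallback can be sketched standalone (simplified names, not Spark's real classes): look up the pool named in the job's properties; if it was not defined in the XML file, create it on demand with default settings.

```scala
import java.util.Properties
import scala.collection.mutable

object PoolSelectSketch {
  val DefaultPoolName = "default"
  val FairSchedulerProperties = "spark.scheduler.pool"

  case class Pool(name: String, members: mutable.Buffer[String] = mutable.Buffer())

  // the root pool starts out holding only the default pool
  val rootPool = mutable.Map(DefaultPoolName -> Pool(DefaultPoolName))

  def addTaskSetManager(manager: String, properties: Properties): Pool = {
    val poolName =
      if (properties == null) DefaultPoolName
      else properties.getProperty(FairSchedulerProperties, DefaultPoolName)
    // create the pool on demand, as FairSchedulableBuilder does
    val pool = rootPool.getOrElseUpdate(poolName, Pool(poolName))
    pool.members += manager
    pool
  }
}
```

In a real application, a job is routed to a pool by setting the per-thread property sc.setLocalProperty("spark.scheduler.pool", "production") before submitting it.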
2. Scheduling Algorithm Comparison
Besides initializing the scheduling pools differently, the two modes also implement different scheduling algorithms.
Inside the pool implementation, the comparator is likewise chosen according to the mode:
```scala
var taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
  schedulingMode match {
    case SchedulingMode.FAIR =>
      new FairSchedulingAlgorithm()
    case SchedulingMode.FIFO =>
      new FIFOSchedulingAlgorithm()
  }
}
```
FIFO mode
The FIFO comparator is easy to understand: it compares priority (the job ID) first, then the stage ID; the smaller one runs first.
This also makes sense intuitively: stages with small IDs are usually at the deepest level of the recursion and are the first to be submitted to the scheduling pool.
```scala
private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority
    val priority2 = s2.priority
    var res = math.signum(priority1 - priority2)
    if (res == 0) {
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    if (res < 0) {
      true
    } else {
      false
    }
  }
}
```
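The same comparison can be exercised standalone (simplified fields instead of Spark's Schedulable trait): priority, i.e. the job ID, is compared first, then the stage ID; true means s1 should run before s2.

```scala
case class FifoEntry(priority: Int, stageId: Int)

// returns true when s1 should run before s2
def fifoBefore(s1: FifoEntry, s2: FifoEntry): Boolean = {
  var res = math.signum(s1.priority - s2.priority)
  if (res == 0) {
    res = math.signum(s1.stageId - s2.stageId)
  }
  res < 0
}
```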
Fair mode
Fair mode is a little more complicated, but still easy to read:
1. A pool whose number of running tasks is below its minShare is "needy", and a needy pool runs before a non-needy one.
2. If both pools are needy, the one with the smaller runningTasks/minShare ratio runs first.
3. If neither is needy, the one with the smaller runningTasks/weight ratio runs first.
4. If the ratios are equal, the names are compared (string comparison), and the smaller name runs first.
```scala
private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
    var compare: Int = 0

    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }

    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      s1.name < s2.name
    }
  }
}
```
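A standalone sketch of the same comparison (simplified fields instead of Spark's Schedulable trait) makes the three cases easy to test: a needy pool wins outright, and otherwise the ratios break the tie.

```scala
case class FairEntry(name: String, minShare: Int, weight: Int, runningTasks: Int)

// returns true when s1 should run before s2
def fairBefore(s1: FairEntry, s2: FairEntry): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  val minShareRatio1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1)
  val minShareRatio2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1)
  val taskToWeightRatio1 = s1.runningTasks.toDouble / s1.weight
  val taskToWeightRatio2 = s2.runningTasks.toDouble / s2.weight
  if (s1Needy && !s2Needy) true
  else if (!s1Needy && s2Needy) false
  else {
    val compare =
      if (s1Needy) minShareRatio1.compareTo(minShareRatio2)
      else taskToWeightRatio1.compareTo(taskToWeightRatio2)
    if (compare != 0) compare < 0 else s1.name < s2.name
  }
}
```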
Summary: although I have now worked through Spark's scheduling modes, I had basically never used them in practice before; I did not realize Spark had this hidden feature...
In-depth understanding of Spark: the two scheduling modes, FIFO and FAIR