DT Big Data Dream Factory Lesson 35: Spark system run cycle flow


The contents of this lesson:

1. How TaskScheduler Works

2. TaskScheduler Source Code

First, how TaskScheduler works

Overall scheduling diagram:


In the previous lectures, RDD, DAGScheduler, and the Workers were explained in depth. In this lesson we mainly explain how TaskScheduler works.

Review:

DAGScheduler divides the entire job into multiple stages. The division proceeds from the last stage backward, while execution proceeds from the earliest stage forward. Each stage contains many tasks, and those tasks can be executed in parallel: their execution logic is exactly the same, only the data they process differs. DAGScheduler submits all the tasks it constructs to the underlying scheduler, TaskScheduler, in the form of a TaskSet.
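To make the review concrete, here is a minimal, hypothetical word-count job (the HDFS paths are invented). The reduceByKey step introduces a shuffle, so DAGScheduler splits the job into two stages; the tasks within each stage share the same logic and differ only in the partition they process:

val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
sc.textFile("hdfs://namenode/input")          // hypothetical input path
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)                         // shuffle boundary: stage 0 ends here
  .saveAsTextFile("hdfs://namenode/output")   // action: triggers DAGScheduler to submit the job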

TaskScheduler is a trait, decoupled from any concrete resource scheduling. This conforms to the depend-on-abstraction principle of object-oriented design and makes the underlying resource scheduler pluggable, which is why Spark can run under many resource-scheduling modes, such as Standalone, YARN, Mesos, Local, EC2, or a custom resource scheduler.
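For orientation, the following is an abridged sketch of the trait's key members, paraphrased from Spark 1.6 (several members and comments are omitted); any concrete scheduler only has to honor this contract:

// Abridged from org.apache.spark.scheduler.TaskScheduler (Spark 1.6); members omitted
private[spark] trait TaskScheduler {
  def rootPool: Pool                                     // the root schedule pool
  def schedulingMode: SchedulingMode                     // FIFO or FAIR
  def start(): Unit
  def stop(): Unit
  def submitTasks(taskSet: TaskSet): Unit                // called by DAGScheduler
  def setDAGScheduler(dagScheduler: DAGScheduler): Unit  // for reporting results back
  def defaultParallelism(): Int
}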

In standalone mode, let's look at one implementation of TaskScheduler: TaskSchedulerImpl.

1. The core task of TaskScheduler

TaskScheduler's core task is to submit TaskSets to the cluster and report the results; it is primarily responsible for scheduling among the different jobs of an application.

Specifically, this includes the following points:

(1) Creates and maintains a TaskSetManager for each TaskSet and tracks the tasks' locality and error information (a heavily abridged sketch of the entry point follows this list);

(2) Starts the retry mechanism when a task fails, and launches a speculative backup task on another node for straggler tasks;

(3) Reports execution status to DAGScheduler, including FetchFailed errors when shuffle output is lost.
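Point (1) can be seen in TaskSchedulerImpl#submitTasks. The following is a heavily abridged sketch based on Spark 1.6 (synchronization details, error handling, and bookkeeping are omitted):

// Abridged sketch of TaskSchedulerImpl#submitTasks (Spark 1.6, simplified)
override def submitTasks(taskSet: TaskSet) {
  this.synchronized {
    // wrap the TaskSet in a TaskSetManager that tracks locality and failures
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    // hand the manager to the schedulable builder (FIFO or FAIR pool)
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
  }
  backend.reviveOffers()  // ask the backend to offer resources so tasks can launch
}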

2. Core features of TaskScheduler

(1) Register the current program.

TaskScheduler internally holds a reference to SchedulerBackend. SchedulerBackend is a trait mainly responsible for managing executor resources; in standalone mode its concrete implementation is SparkDeploySchedulerBackend. SparkDeploySchedulerBackend constructs an AppClient instance at startup, and that instance starts a ClientEndpoint (a message loop body) when it starts. ClientEndpoint registers the current program with Master at startup.
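The registration step itself is small. The following is a simplified paraphrase of the relevant fragment of Spark 1.6's AppClient (retry handling and thread pools are omitted):

// Simplified paraphrase from AppClient's ClientEndpoint (Spark 1.6)
override def onStart(): Unit = {
  registerWithMaster(1)  // kicks off registration, retrying until a Master responds
}
// ...deep inside the registration logic, each known Master is sent:
masterRef.send(RegisterApplication(appDescription, self))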

(2) Register executor information.

SparkDeploySchedulerBackend's parent class, CoarseGrainedSchedulerBackend, instantiates a DriverEndpoint message loop body at startup. DriverEndpoint is the driver object while our program runs. SparkDeploySchedulerBackend is specifically responsible for collecting resource information on the Workers. When an ExecutorBackend starts, it sends a RegisterExecutor message to DriverEndpoint in the driver to register itself (you can refer to the Master registration section of the previous lecture). At this point SparkDeploySchedulerBackend holds the compute resources owned by the current application, and TaskScheduler runs tasks through the compute resources held by SparkDeploySchedulerBackend.
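On the driver side, the effect of a successful registration is that DriverEndpoint records the executor and its cores. A simplified paraphrase from Spark 1.6's CoarseGrainedSchedulerBackend (duplicate-registration checks and the reply message are omitted):

// Simplified paraphrase of DriverEndpoint's handling of RegisterExecutor (Spark 1.6)
case RegisterExecutor(executorId, executorRef, hostPort, cores, logUrls) =>
  // remember this executor's endpoint and core count so tasks can be offered to it
  executorDataMap.put(executorId, new ExecutorData(
    executorRef, executorRef.address, executorRef.address.host, cores, cores, logUrls))
  totalCoreCount.addAndGet(cores)
  makeOffers()  // try to schedule pending tasks on the newly registered resources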

Note: SparkContext, DAGScheduler, TaskSchedulerImpl, and SparkDeploySchedulerBackend are each instantiated once when the application starts, and these objects exist for as long as the application does. SparkDeploySchedulerBackend is an auxiliary class that helps TaskSchedulerImpl obtain compute resources and send tasks to the cluster.

3. When TaskScheduler is instantiated

TaskScheduler is instantiated when SparkContext is instantiated, for example as TaskSchedulerImpl (Spark 1.6.0, SparkContext.scala, lines 521-526):

// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

TaskScheduler and SparkDeploySchedulerBackend are created in the SparkContext#createTaskScheduler method:
private def createTaskScheduler(
    sc: SparkContext,
    master: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._
  // ... part of the code omitted: the master URL is pattern-matched ...
  case SPARK_REGEX(sparkUrl) =>
    val scheduler = new TaskSchedulerImpl(sc)
    val masterUrls = sparkUrl.split(",").map("spark://" + _)
    val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
    // Use SparkDeploySchedulerBackend to initialize TaskScheduler
    scheduler.initialize(backend)  // 1
    (backend, scheduler)
}
4. TaskScheduler initialization

// 1: initialize is called
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // Temporarily set rootPool's name to empty
  rootPool = new Pool("", schedulingMode, 0, 0)  // 2
  // Create the schedulable builder that matches the scheduling mode of rootPool
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>  // FIFO mode
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>  // FAIR mode
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  // Create the schedule pool(s)
  schedulableBuilder.buildPools()
}


Create a Schedule Pool

(1) Create rootPool (which instantiates the scheduling algorithm)
// 2: the Pool constructor
private[spark] class Pool(
    val poolName: String,
    val schedulingMode: SchedulingMode,
    initMinShare: Int,
    initWeight: Int)
  extends Schedulable with Logging {

  val schedulableQueue = new ConcurrentLinkedQueue[Schedulable]
  val schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable]
  var weight = initWeight
  var minShare = initMinShare
  var runningTasks = 0
  var priority = 0
  // A pool's stage id is used to break the tie in scheduling.
  var stageId = -1
  var name = poolName
  var parent: Pool = null
  // Create an instance of the scheduling algorithm that matches the scheduling mode
  var taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
    schedulingMode match {
      case SchedulingMode.FAIR =>
        new FairSchedulingAlgorithm()
      case SchedulingMode.FIFO =>
        new FIFOSchedulingAlgorithm()
    }
  }
  // ... remainder of the class omitted ...

(2) The schedulable objects

org.apache.spark.scheduler.Pool contains a set of entities that can be scheduled. For FIFO, rootPool contains a set of TaskSetManagers; for FAIR, rootPool contains a set of Pools that form a scheduling tree, where the leaf nodes of the tree are TaskSetManagers.
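A minimal sketch of the two tree shapes, built with the Pool class above (taskSetManagerA and taskSetManagerB are assumed to exist; in real code the SchedulableBuilders construct these objects):

// FIFO: two levels — rootPool directly holds TaskSetManagers
val fifoRoot = new Pool("", SchedulingMode.FIFO, 0, 0)
fifoRoot.addSchedulable(taskSetManagerA)

// FAIR: three levels — rootPool holds sub-pools, which hold TaskSetManagers
val fairRoot = new Pool("", SchedulingMode.FAIR, 0, 0)
val userPool = new Pool("production", SchedulingMode.FAIR, 2, 1)  // minShare = 2, weight = 1
fairRoot.addSchedulable(userPool)
userPool.addSchedulable(taskSetManagerB)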

(3) Create a schedule pool

schedulableBuilder.buildPools() varies by scheduling mode. For FIFO its implementation is empty, as shown below:

private[spark] class FIFOSchedulableBuilder(val rootPool: Pool)
  extends SchedulableBuilder with Logging {

  override def buildPools() {
    // nothing
  }

  // Defines how a TaskSetManager is added to the schedule pool
  override def addTaskSetManager(manager: Schedulable, properties: Properties) {
    rootPool.addSchedulable(manager)  // 3
  }
}
Because in FIFO mode rootPool contains no sub-pools, it holds TaskSetManagers directly: submitTasks adds each TaskSetManager straight into rootPool's scheduling queue (the queue is first-in, first-out by default).
// 3: add a schedulable object to the scheduling queue
override def addSchedulable(schedulable: Schedulable) {
  require(schedulable != null)
  schedulableQueue.add(schedulable)
  schedulableNameToSchedulable.put(schedulable.name, schedulable)
  schedulable.parent = this
}

The FAIR mode requires some configuration before running: it builds the scheduling tree under rootPool based on a configuration file.

See the following code for implementation:
override def buildPools() {
  var is: Option[InputStream] = None
  try {
    is = Option {
      // Create a FileInputStream from the file name set in spark.scheduler.allocation.file
      schedulerAllocFile.map { f =>
        new FileInputStream(f)
      }.getOrElse {
        // If spark.scheduler.allocation.file is not set, read fairscheduler.xml
        // from the classpath instead
        Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
      }
    }
    // Create the pools described by the stream's contents
    is.foreach { i => buildFairSchedulerPool(i) }
  } finally {
    is.foreach(_.close())
  }
  // Create a pool named "default"
  buildDefaultPool()
}
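For reference, an allocation file in the format buildFairSchedulerPool parses looks like the following (the pool names and values are illustrative; the format matches Spark's documented fairscheduler.xml):

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

It is picked up by setting spark.scheduler.allocation.file (together with spark.scheduler.mode=FAIR) in the SparkConf.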

(4) Scheduling algorithm

private[spark] trait SchedulingAlgorithm {
  def comparator(s1: Schedulable, s2: Schedulable): Boolean
}

As the code shows, the scheduling algorithm is a trait that subclasses must implement. In essence it encapsulates a comparison function, and a subclass only needs to implement that comparator.
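The comparator is consumed when the pool is asked for its schedulables in scheduling order. A lightly paraphrased version of Pool#getSortedTaskSetQueue from Spark 1.6 shows this: the queue is sorted with the comparator, and sub-pools are flattened recursively:

// Paraphrased from Pool#getSortedTaskSetQueue (Spark 1.6)
override def getSortedTaskSetQueue: ArrayBuffer[TaskSetManager] = {
  var sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]
  val sortedSchedulableQueue =
    schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator)
  for (schedulable <- sortedSchedulableQueue) {
    sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue  // recurse into sub-pools
  }
  sortedTaskSetQueue
}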

(1) FIFO

FIFO task scheduling uses the following order:

First, the job with the smaller jobId is scheduled first; within the same job, the stage with the smaller stageId is scheduled first (within one job, several stages may be executable in parallel, such as stage0 and stage1 produced during stage division).


Scheduling algorithm:

private[spark] class FIFOSchedulingAlgorithm extends SchedulingAlgorithm {
  // Compare schedulable objects s1 and s2; here s1 and s2 are actually TaskSetManagers
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    val priority1 = s1.priority  // this priority is actually the job id
    val priority2 = s2.priority  // ditto
    var res = math.signum(priority1 - priority2)  // compare the job ids first
    if (res == 0) {
      // if the job ids are the same, compare the stage ids
      val stageId1 = s1.stageId
      val stageId2 = s2.stageId
      res = math.signum(stageId1 - stageId2)
    }
    if (res < 0) {
      true
    } else {
      false
    }
  }
}
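A self-contained sketch (with a hypothetical Stub standing in for Schedulable) shows the resulting order: job 1 is scheduled before job 2, and within job 1, stage 0 precedes stage 1:

// Runnable sketch of the FIFO ordering; Stub is a hypothetical stand-in for Schedulable
case class Stub(priority: Int, stageId: Int)  // priority is the job id

def fifoLess(s1: Stub, s2: Stub): Boolean = {
  var res = math.signum(s1.priority - s2.priority)
  if (res == 0) res = math.signum(s1.stageId - s2.stageId)
  res < 0
}

val queue = Seq(Stub(2, 0), Stub(1, 1), Stub(1, 0))
println(queue.sortWith(fifoLess))
// List(Stub(1,0), Stub(1,1), Stub(2,0))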
(2) FAIR

For FAIR, the Pools hanging under rootPool first determine the scheduling order among themselves, and then the same algorithm is applied within each Pool to determine the scheduling order of its TaskSetManagers.


Algorithm implementation:

private[spark] class FairSchedulingAlgorithm extends SchedulingAlgorithm {
  override def comparator(s1: Schedulable, s2: Schedulable): Boolean = {
    // minShare: the minimum amount of resources (CPU cores) needed to run;
    // a schedulable running below its minShare is prioritized
    val minShare1 = s1.minShare
    val minShare2 = s2.minShare
    // number of running tasks
    val runningTasks1 = s1.runningTasks
    val runningTasks2 = s2.runningTasks
    // a schedulable is "starved" if it runs fewer tasks than its minimum share of cores
    val s1Needy = runningTasks1 < minShare1
    val s2Needy = runningTasks2 < minShare2
    // compute the minimum-share (occupancy) ratio; the smaller the ratio, the lighter the load
    val minShareRatio1 = runningTasks1.toDouble / math.max(minShare1, 1.0).toDouble
    val minShareRatio2 = runningTasks2.toDouble / math.max(minShare2, 1.0).toDouble
    // compute the task-to-weight ratio; with the same number of tasks,
    // the schedulable with the higher weight is prioritized
    val taskToWeightRatio1 = runningTasks1.toDouble / s1.weight.toDouble
    val taskToWeightRatio2 = runningTasks2.toDouble / s2.weight.toDouble
    var compare: Int = 0
    // a starved schedulable goes first
    if (s1Needy && !s2Needy) {
      return true
    } else if (!s1Needy && s2Needy) {
      return false
    } else if (s1Needy && s2Needy) {
      // both are starved: the one with the smaller occupancy ratio goes first
      compare = minShareRatio1.compareTo(minShareRatio2)
    } else {
      // neither is starved: compare the weight ratios; the lower ratio goes first
      compare = taskToWeightRatio1.compareTo(taskToWeightRatio2)
    }
    if (compare < 0) {
      true
    } else if (compare > 0) {
      false
    } else {
      // if everything else is equal, compare names alphabetically — so the name matters
      s1.name < s2.name
    }
  }
}

Note:

The fairness principle is to give resources to whoever needs them most, so starved schedulables are prioritized;

The resource-occupancy ratio is a bit confusing, but it is easy to understand if you think of it as a greedy problem. Among schedulables that are starved, granting resources to a heavily loaded one will not necessarily relieve it effectively, so the scheduler grants them to the lightly loaded one, lets it finish quickly, and thereby frees more resources sooner; this is a greedy strategy. For example, if jobs A and B each have 10 tasks, and A's minShare is 2 while B's is 5, their occupancy ratios are 5 and 2; B's is clearly smaller, so the greedy strategy schedules B first (reproduced in the runnable sketch after these notes);

For schedulables that are all satisfied, the weight is decisive: the lower the task-to-weight ratio, the higher the priority (biased toward the heavier weight);

If all of the above comparisons tie, the name that sorts first in dictionary order wins (so the name matters!): a name like "aaa" is preferred over "abc". Keep this in mind when naming a Pool or TaskSetManager.
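The following self-contained sketch reproduces the comparison from the notes above. The numbers are hypothetical and chosen so that both jobs sit in the starved branch (each currently runs 1 task, below its minShare): A's occupancy ratio is 1/2 = 0.5, B's is 1/5 = 0.2, so B wins:

// Runnable sketch of the FAIR comparison; Stub is a hypothetical stand-in for Schedulable
case class Stub(name: String, runningTasks: Int, minShare: Int, weight: Int)

def fairLess(s1: Stub, s2: Stub): Boolean = {
  val s1Needy = s1.runningTasks < s1.minShare
  val s2Needy = s2.runningTasks < s2.minShare
  val ratio1 = s1.runningTasks.toDouble / math.max(s1.minShare, 1).toDouble
  val ratio2 = s2.runningTasks.toDouble / math.max(s2.minShare, 1).toDouble
  val w1 = s1.runningTasks.toDouble / s1.weight.toDouble
  val w2 = s2.runningTasks.toDouble / s2.weight.toDouble
  if (s1Needy && !s2Needy) true
  else if (!s1Needy && s2Needy) false
  else if (s1Needy && s2Needy) {
    if (ratio1 != ratio2) ratio1 < ratio2 else s1.name < s2.name  // occupancy ratio, then name
  } else {
    if (w1 != w2) w1 < w2 else s1.name < s2.name                  // weight ratio, then name
  }
}

val a = Stub("A", runningTasks = 1, minShare = 2, weight = 1)  // occupancy ratio 0.5
val b = Stub("B", runningTasks = 1, minShare = 5, weight = 1)  // occupancy ratio 0.2
println(fairLess(b, a))  // true: B's occupancy ratio is lower, so B is scheduled first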

(Note Source: https://yq.aliyun.com/articles/6041)

A further note: the objects compared by these two scheduling algorithms are Schedulable objects, so let us briefly explain Schedulable.

As mentioned earlier, there are two kinds of Schedulable in Spark: Pool and TaskSetManager. A Pool is a schedule pool and may contain sub-pools; rootPool in Spark, i.e. the root node, is by default a Pool with an empty name. The hierarchy differs between FIFO and FAIR.

For FIFO scheduling, rootPool directly manages TaskSetManagers; there is no sub-pool concept. There are only two levels: rootPool and the leaf TaskSetManagers, as shown below.


For FAIR mode, however, there are three levels: the root node rootPool is a nameless Pool; the next level is the user-defined Pools (if no name is specified, the default name "default" is used); and the level below that is the TaskSetManagers. That is, the root schedule pool manages a set of schedule pools, each of which manages its own TaskSetManagers. Its implementation is shown below.



The scheduling order here refers to scheduling within a single SparkContext. In general we do not need the Pool concept, because there is no competition between pools. But suppose we offer a Spark application as a service, where anyone can submit tasks and the server holds a long-lived SparkContext that executes each user's submitted task through a proxy. Then a Pool can be set up for each user's tasks according to user level and task priority, so that a competitive relationship exists between different users' pools; Pool priorities can be used to rank both tasks and users, as the sketch below shows. One more point worth emphasizing: under the FAIR mechanism, if no other comparison can decide, names are compared in dictionary order. (Supplementary source: https://yq.aliyun.com/articles/6041)
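In practice this routing is done by submitting each user's jobs into a named pool. A minimal sketch (the pool name, file path, and job are illustrative; spark.scheduler.pool is the standard property Spark consults):

val conf = new SparkConf()
  .setAppName("SharedSparkServer")
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // hypothetical path
val sc = new SparkContext(conf)

// inside the thread serving user A's requests:
sc.setLocalProperty("spark.scheduler.pool", "user_a")
sc.textFile("hdfs://namenode/userA/data").count()  // this job is scheduled in pool "user_a"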

5. Create DAGScheduler and call the TaskScheduler#start method (during SparkContext initialization)
// SparkContext.scala (lines 525-530)
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()


// TaskSchedulerImpl.scala
override def start() {
  // start SparkDeploySchedulerBackend
  backend.start()

  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
        checkSpeculatableTasks()
      }
    }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
}
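The speculative branch above only runs when speculation is switched on. An illustrative configuration (these are standard Spark settings; the values shown match the documented defaults):

val conf = new SparkConf()
  .set("spark.speculation", "true")            // enable the checkSpeculatableTasks polling above
  .set("spark.speculation.interval", "100ms")  // how often to check for stragglers
  .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as a straggler
  .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking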


6. Start Executor
override def start() {
  super.start()
  launcherBackend.connect()

  // The endpoint for executors to talk to us
  val driverUrl = rpcEnv.uriOf(SparkEnv.driverActorSystemName,
    RpcAddress(sc.conf.get("spark.driver.host"), sc.conf.get("spark.driver.port").toInt),
    CoarseGrainedSchedulerBackend.ENDPOINT_NAME)
  val args = Seq(
    "--driver-url", driverUrl,
    "--executor-id", "{{EXECUTOR_ID}}",
    "--hostname", "{{HOSTNAME}}",
    "--cores", "{{CORES}}",
    "--app-id", "{{APP_ID}}",
    "--worker-url", "{{WORKER_URL}}")
  val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
    .map(Utils.splitCommandString).getOrElse(Seq.empty)
  val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
  val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
    .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

  // Start executors with a few necessary configs for registering with the scheduler
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  // The command defines the entry class of the executor process to be launched
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
  // ... part of the code omitted; see the previous sections for the executor registration process ...
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  // Register the application
  client.start()
  launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
  waitForRegistration()
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
The application registration is ultimately kicked off in the AppClient#start method.

Second, summary

When SparkContext is instantiated, createTaskScheduler is called to create TaskSchedulerImpl and SparkDeploySchedulerBackend, and TaskSchedulerImpl#start is called during SparkContext instantiation. That method in turn calls SparkDeploySchedulerBackend#start. In this start method the AppClient object is created and AppClient#start is called, which creates a ClientEndpoint. When ClientEndpoint is created, it is passed the name of the entry class of the executor process to be launched for the current application: CoarseGrainedExecutorBackend. ClientEndpoint then starts and registers the current application with Master via tryRegisterMaster. When Master receives the registration information, if it can run the program, it generates a job id for the program and allocates compute resources through schedule; the allocation is determined by configuration information such as the application's run mode, memory, and cores. Finally, Master sends instructions to the Workers. When a Worker allocates compute resources for the current application, it first allocates an ExecutorRunner. ExecutorRunner internally uses a thread to construct a ProcessBuilder, which starts another JVM process; the class whose main method is loaded when that JVM process starts is the entry class specified in the Command passed in when ClientEndpoint was created, namely CoarseGrainedExecutorBackend. When the JVM booted through ProcessBuilder loads CoarseGrainedExecutorBackend, it calls its main method. In the main method, CoarseGrainedExecutorBackend itself is instantiated as a message loop body. When instantiated, it sends a RegisterExecutor message to DriverEndpoint via the onStart callback to register the current CoarseGrainedExecutorBackend. DriverEndpoint receives the registration information and saves it in an in-memory data structure of the SparkDeploySchedulerBackend instance; at this point the driver has obtained the compute resources!


The interaction process at each phase of a task run:





Figure 35-6 Resource allocation process





Note: The analysis of the FIFO and FAIR scheduling algorithms partly references Zhang Anzhan's book "Spark Technology Insider".


Description

This article is based on notes from Lesson 35 of the IFM course at DT Big Data Dream Factory.


