Storm source code analysis: Scheduler

Last Update:2018-12-05 Source: Internet

Author: User

Tags map vector



First, let's look at the definition of the IScheduler interface, which declares two methods: prepare and schedule.

    package backtype.storm.scheduler;

    import java.util.Map;

    public interface IScheduler {

        void prepare(Map conf);

        /**
         * Set assignments for the topologies which needs scheduling. The new assignments is available
         * through <code>cluster.getAssignments()</code>
         *
         * @param topologies all the topologies in the cluster, some of them need schedule. Topologies object here
         *       only contain static information about topologies. Information like assignments, slots are all in
         *       the <code>cluster</code> object.
         * @param cluster the cluster these topologies are running in. <code>cluster</code> contains everything user
         *       need to develop a new scheduling logic. e.g. supervisors information, available slots, current
         *       assignments for all the topologies etc. User can set the new assignment for topologies using
         *       <code>cluster.setAssignmentById</code>
         */
        void schedule(Topologies topologies, Cluster cluster);
    }

DefaultScheduler

DefaultScheduler implements the backtype.storm.scheduler.IScheduler interface.

    (ns backtype.storm.scheduler.DefaultScheduler
      (:gen-class
        :implements [backtype.storm.scheduler.IScheduler]))

    (defn -prepare [this conf]
      )

    (defn -schedule [this ^Topologies topologies ^Cluster cluster]
      (default-schedule topologies cluster))

Let's look at what default-schedule does.

    (defn default-schedule [^Topologies topologies ^Cluster cluster]
      (let [needs-scheduling-topologies (.needsSchedulingTopologies cluster topologies)]
        (doseq [^TopologyDetails topology needs-scheduling-topologies
                :let [topology-id (.getId topology)
                      available-slots (->> (.getAvailableSlots cluster)
                                           (map #(vector (.getNodeId %) (.getPort %))))
                      all-executors (->> topology
                                         .getExecutors
                                         (map #(vector (.getStartTask %) (.getEndTask %)))
                                         set)
                      alive-assigned (EvenScheduler/get-alive-assigned-node+port->executors cluster topology-id)
                      alive-executors (->> alive-assigned vals (apply concat) set)
                      can-reassign-slots (slots-can-reassign cluster (keys alive-assigned))
                      total-slots-to-use (min (.getNumWorkers topology)
                                              (+ (count can-reassign-slots) (count available-slots)))
                      bad-slots (if (or (> total-slots-to-use (count alive-assigned))
                                        (not= alive-executors all-executors))
                                  (bad-slots alive-assigned (count all-executors) total-slots-to-use)
                                  [])]]
          (.freeSlots cluster bad-slots)
          (EvenScheduler/schedule-topologies-evenly (Topologies. {topology-id topology}) cluster))))

1. Retrieve the topologies that need scheduling

A topology needs scheduling if either of the following holds:
The number of workers already assigned is smaller than the number of workers configured
The topology has unassigned executors

    public boolean needsScheduling(TopologyDetails topology) {
        int desiredNumWorkers = topology.getNumWorkers();
        int assignedNumWorkers = this.getAssignedNumWorkers(topology);
        if (desiredNumWorkers > assignedNumWorkers) {
            return true;
        }
        return this.getUnassignedExecutors(topology).size() > 0;
    }

2. Find the bad slots

2.1 Read the available slots from each SupervisorDetails of the cluster (i.e. the ports that have not been assigned)
available-slots, e.g. ([node1 port2] [node2 port2])

2.2 Read all executors from the topology
all-executors, e.g. ([1 3] [4 4] [5 7])

2.3 Read the executor assignments from the SchedulerAssignmentImpl of the cluster and find the alive executors
alive-assigned, node+port -> executors, e.g. {[node1 port1] ([1 3]), [node2 port1] ([5 7])}
alive-executors, e.g. ([1 3] [5 7])

2.4 Among the slots in alive-assigned, find those that can be reassigned (slots-can-reassign)
A slot can be reassigned if both of the following hold:
Its node is not in the blacklist of the cluster, and its port is in the supervisor's list of all ports (dead ports have already been filtered out of that list)
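
The check in 2.4 can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name, the method name, and the use of a Set of blacklisted node ids plus a Map from node id to ports are hypothetical stand-ins, not Storm's Cluster or SupervisorDetails API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the slots-can-reassign filter; not Storm's real classes.
final class ReassignableSlots {

    /** A slot is reassignable when its node is not blacklisted and its port is known to the supervisor. */
    static boolean canReassign(String node, int port,
                               Set<String> blacklistedNodes,
                               Map<String, Set<Integer>> allPortsBySupervisor) {
        if (blacklistedNodes.contains(node)) {
            return false;
        }
        Set<Integer> ports = allPortsBySupervisor.get(node);
        // An unknown supervisor means the slot cannot be reused.
        return ports != null && ports.contains(port);
    }

    public static void main(String[] args) {
        Set<String> blacklist = new HashSet<>(Arrays.asList("node2"));
        Map<String, Set<Integer>> ports = new HashMap<>();
        ports.put("node1", new HashSet<>(Arrays.asList(6700, 6701)));
        ports.put("node2", new HashSet<>(Arrays.asList(6700)));

        System.out.println(canReassign("node1", 6700, blacklist, ports)); // true
        System.out.println(canReassign("node2", 6700, blacklist, ports)); // false, node is blacklisted
        System.out.println(canReassign("node1", 6702, blacklist, ports)); // false, port not listed
    }
}
```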

2.5 total-slots-to-use is the number of available-slots plus the number of can-reassign-slots
The result is capped at the number of workers configured for the topology, so it is min(num-workers, available-slots + can-reassign-slots).
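
As a small worked example of 2.5 and of the condition that guards the bad-slots computation in default-schedule, here is a Java sketch. The class and method names are hypothetical, executors are written as plain strings such as "1-3", and the worker count of 3 is an assumed value for the running example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative arithmetic only; names are hypothetical, not part of Storm.
final class SlotBudget {

    /** Mirrors total-slots-to-use: usable slots, capped by the configured worker count. */
    static int totalSlotsToUse(int numWorkers, int canReassignSlots, int availableSlots) {
        return Math.min(numWorkers, canReassignSlots + availableSlots);
    }

    /** Mirrors the (or ...) test: look for bad slots if more slots can be used or some executor is unassigned. */
    static boolean shouldLookForBadSlots(int totalSlotsToUse, int aliveAssignedSlots,
                                         Set<String> aliveExecutors, Set<String> allExecutors) {
        return totalSlotsToUse > aliveAssignedSlots || !aliveExecutors.equals(allExecutors);
    }

    public static void main(String[] args) {
        // Running example: 2 available slots, 2 reassignable slots, 3 workers assumed.
        int total = totalSlotsToUse(3, 2, 2);
        Set<String> alive = new HashSet<>(Arrays.asList("1-3", "5-7"));
        Set<String> all = new HashSet<>(Arrays.asList("1-3", "4-4", "5-7"));
        System.out.println(total);                                      // 3
        System.out.println(shouldLookForBadSlots(total, 2, alive, all)); // true
    }
}
```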

2.6 Locate the bad slots
For example, with 7 executors and 3 workers, an even distribution gives each worker either 2 or 2 + 1 executors.
The helper integer-divided expresses this as a map, which looks odd at first: the keys are base and base + 1, and each value is the number of slots that should hold that many executors. For 7 executors and 3 workers it returns {2 2, 3 1}.

    (defn integer-divided [sum num-pieces]
      (let [base (int (/ sum num-pieces))
            num-inc (mod sum num-pieces)
            num-bases (- num-pieces num-inc)]
        (if (= num-inc 0)
          {base num-bases}
          {base num-bases (inc base) num-inc}
          )))

Therefore, for each slot in alive-assigned, if the number of executors allocated to the slot is neither base nor base + 1, it is a bad slot.
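
The same calculation, transcribed into Java as a sketch for readers less familiar with Clojure. The class name and the isBadSlot helper are hypothetical; isBadSlot implements only the simplified rule stated above (executor count is neither base nor base + 1) and is not a port of Storm's bad-slots function. It assumes non-negative inputs and numPieces greater than 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of integer-divided plus the simplified bad-slot rule from the text.
final class IntegerDivided {

    /** Returns {base -> number of slots with base executors, base + 1 -> number of slots with one extra}. */
    static Map<Integer, Integer> integerDivided(int sum, int numPieces) {
        int base = sum / numPieces;
        int numInc = sum % numPieces;          // slots that take one extra executor
        int numBases = numPieces - numInc;     // slots that take exactly base executors
        Map<Integer, Integer> result = new LinkedHashMap<>();
        result.put(base, numBases);
        if (numInc != 0) {
            result.put(base + 1, numInc);
        }
        return result;
    }

    /** Simplified rule: a slot is bad when its executor count is not one of the expected sizes. */
    static boolean isBadSlot(int executorsOnSlot, Map<Integer, Integer> distribution) {
        return !distribution.containsKey(executorsOnSlot);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> distribution = integerDivided(7, 3);
        System.out.println(distribution);               // {2=2, 3=1}
        System.out.println(isBadSlot(4, distribution)); // true
        System.out.println(isBadSlot(3, distribution)); // false
    }
}
```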
3. Free the bad slots

Freeing a slot means removing every executor on that bad slot from executorToSlot in SchedulerAssignmentImpl.

    Map<ExecutorDetails, WorkerSlot> executorToSlot;

    /**
     * Release the slot occupied by this assignment.
     * @param slot
     */
    public void unassignBySlot(WorkerSlot slot) {
        List<ExecutorDetails> executors = new ArrayList<ExecutorDetails>();
        for (ExecutorDetails executor : this.executorToSlot.keySet()) {
            WorkerSlot ws = this.executorToSlot.get(executor);
            if (ws.equals(slot)) {
                executors.add(executor);
            }
        }

        // remove
        for (ExecutorDetails executor : executors) {
            this.executorToSlot.remove(executor);
        }
    }
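
For comparison, the collect-then-remove loop above has a more compact equivalent on Java 8 or later. The sketch below is not Storm's code: it uses plain strings in place of ExecutorDetails and WorkerSlot, and the class name is hypothetical. It relies on the fact that removing from a map's values() view removes the corresponding entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: executors and slots are plain strings here.
final class FreeSlotSketch {

    /** Drops every executor currently mapped to the given slot. */
    static void unassignBySlot(Map<String, String> executorToSlot, String slot) {
        // values() is a live view, so removing a value removes its map entry.
        executorToSlot.values().removeIf(slot::equals);
    }

    public static void main(String[] args) {
        Map<String, String> executorToSlot = new HashMap<>();
        executorToSlot.put("1-3", "node1:6700");
        executorToSlot.put("4-4", "node1:6701");
        executorToSlot.put("5-7", "node1:6700");

        unassignBySlot(executorToSlot, "node1:6700");
        System.out.println(executorToSlot); // {4-4=node1:6701}
    }
}
```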

4. EvenScheduler/schedule-topologies-evenly

This function is a typical use of doseq, with two nested levels.
The body of the outer doseq is itself a doseq.
The body of the inner doseq is the call to .assign.

    (defn schedule-topologies-evenly [^Topologies topologies ^Cluster cluster]
      (let [needs-scheduling-topologies (.needsSchedulingTopologies cluster topologies)]
        (doseq [^TopologyDetails topology needs-scheduling-topologies
                :let [topology-id (.getId topology)
                      new-assignment (schedule-topology topology cluster)
                      node+port->executors (reverse-map new-assignment)]]
          (doseq [[node+port executors] node+port->executors
                  :let [^WorkerSlot slot (WorkerSlot. (first node+port) (last node+port))
                        executors (for [[start-task end-task] executors]
                                    (ExecutorDetails. start-task end-task))]]
            (.assign cluster slot topology-id executors)))))
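
The reverse-map helper is not shown in this article. Judging from how its result is bound to node+port->executors, it presumably turns the executor -> slot map into a slot -> executors map. Under that assumption, a Java sketch of the inversion could look like this (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Assumed behaviour of reverse-map: group the keys of a map by their value.
final class ReverseMap {

    static <K, V> Map<V, List<K>> reverseMap(Map<K, V> original) {
        Map<V, List<K>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : original.entrySet()) {
            grouped.computeIfAbsent(entry.getValue(), v -> new ArrayList<>()).add(entry.getKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, String> executorToSlot = new LinkedHashMap<>();
        executorToSlot.put("1-3", "n1:p1");
        executorToSlot.put("4-4", "n1:p2");
        executorToSlot.put("5-6", "n1:p1");

        System.out.println(reverseMap(executorToSlot)); // {n1:p1=[1-3, 5-6], n1:p2=[4-4]}
    }
}
```

Each entry of the grouped map then corresponds to one iteration of the inner doseq: one slot and the list of executors to assign to it.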

4.1 Call schedule-topology

    (defn- schedule-topology [^TopologyDetails topology ^Cluster cluster]
      (let [topology-id (.getId topology)
            available-slots (->> (.getAvailableSlots cluster)
                                 (map #(vector (.getNodeId %) (.getPort %))))
            all-executors (->> topology
                               .getExecutors
                               (map #(vector (.getStartTask %) (.getEndTask %)))
                               set)
            alive-assigned (get-alive-assigned-node+port->executors cluster topology-id)
            total-slots-to-use (min (.getNumWorkers topology)
                                    (+ (count available-slots) (count alive-assigned)))
            reassign-slots (take (- total-slots-to-use (count alive-assigned))
                                 (sort-slots available-slots))
            reassign-executors (sort (set/difference all-executors (set (apply concat (vals alive-assigned)))))
            reassignment (into {}
                               (map vector
                                    reassign-executors
                                    ;; for some reason it goes into infinite loop without limiting the repeat-seq
                                    (repeat-seq (count reassign-executors) reassign-slots)))]
        (when-not (empty? reassignment)
          (log-message "Available slots: " (pr-str available-slots))
          )
        reassignment))

The first bindings repeat the logic of default-schedule. The values computed there are not passed along between the two functions, so schedule-topology computes them again.

reassign-slots:
a. Calculate how many slots can be used for the new assignment. available-slots is not used directly because the topology's worker limit may allow fewer slots than are available.
b. sort-slots: group the slots by node and interleave the groups, so that the list is effectively ordered by port position across nodes

    (defn sort-slots [all-slots] ;'(["n1" "p1"] ["n1" "p2"] ["n1" "p3"] ["n2" "p1"] ["n3" "p1"] ["n3" "p2"])
      (let [split-up (vals (group-by first all-slots))]
        (apply interleave-all split-up) ;'(["n1" "p1"] ["n2" "p1"] ["n3" "p1"] ["n1" "p2"] ["n3" "p2"] ["n1" "p3"])
        ))

c. Take the first n slots from the sorted list, where n is the count from step a. The slots are ordered this way so that executors are spread across different nodes as much as possible.
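
The ordering produced by sort-slots can be reproduced in Java as a sketch. This is an illustration, not Storm's implementation: the class name is hypothetical, slots are modelled as two-element lists of strings, and nodes are visited in first-seen order, which is an assumption (the Clojure version depends on the iteration order of the map returned by group-by).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Group slots by node, then take one slot per node per round until all are used.
final class SortSlots {

    static List<List<String>> sortSlots(List<List<String>> allSlots) {
        Map<String, List<List<String>>> byNode = new LinkedHashMap<>();
        for (List<String> slot : allSlots) {
            byNode.computeIfAbsent(slot.get(0), node -> new ArrayList<>()).add(slot);
        }
        List<List<String>> result = new ArrayList<>();
        for (int round = 0; result.size() < allSlots.size(); round++) {
            for (List<List<String>> nodeSlots : byNode.values()) {
                if (round < nodeSlots.size()) {
                    result.add(nodeSlots.get(round));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> slots = Arrays.asList(
                Arrays.asList("n1", "p1"), Arrays.asList("n1", "p2"), Arrays.asList("n1", "p3"),
                Arrays.asList("n2", "p1"), Arrays.asList("n3", "p1"), Arrays.asList("n3", "p2"));
        // [[n1, p1], [n2, p1], [n3, p1], [n1, p2], [n3, p2], [n1, p3]]
        System.out.println(sortSlots(slots));
    }
}
```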

reassign-executors: the executors that have not been assigned yet, i.e. all-executors minus the executors in alive-assigned.

The assignment itself is simple: map pairs each element of reassign-executors with an element of reassign-slots. The code comment explains why a count is passed to repeat-seq. In theory the limit should not be needed, because map stops as soon as one collection ends, but for some reason it does not seem to stop without it.
repeat-seq is required because there are usually more executors than slots, so the slots have to be cycled.

    (map vector #{[1,3] [4,4] [5,6]} (repeat-seq 3 '(["n1" "p1"] ["n1" "p2"])))
    ;; => ([[4 4] ["n1" "p1"]] [[5 6] ["n1" "p2"]] [[1 3] ["n1" "p1"]])
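
The pairing of executors with a cycled list of slots is a round-robin assignment. A minimal Java sketch, with hypothetical names and plain strings for executors and slots, is shown below. Unlike the set literal in the example above, it iterates the executors in list order, which matches schedule-topology, where reassign-executors is sorted first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Round-robin pairing: executor i goes to slot (i mod number of slots).
final class RoundRobinAssign {

    static Map<String, String> assign(List<String> executors, List<String> slots) {
        Map<String, String> assignment = new LinkedHashMap<>();
        if (slots.isEmpty()) {
            return assignment; // nothing to assign to, same as pairing with an empty sequence
        }
        for (int i = 0; i < executors.size(); i++) {
            assignment.put(executors.get(i), slots.get(i % slots.size()));
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, String> result = assign(
                Arrays.asList("1-3", "4-4", "5-6"),
                Arrays.asList("n1:p1", "n1:p2"));
        System.out.println(result); // {1-3=n1:p1, 4-4=n1:p2, 5-6=n1:p1}
    }
}
```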

4.2 Wrap the new assignment in WorkerSlot and ExecutorDetails objects

4.3 Add the new assignment to executorToSlot in SchedulerAssignmentImpl

    /**
     * Assign the slot to executors.
     * @param slot
     * @param executors
     */
    public void assign(WorkerSlot slot, Collection<ExecutorDetails> executors) {
        for (ExecutorDetails executor : executors) {
            this.executorToSlot.put(executor, slot);
        }
    }

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
