1. Foreword
The scheduler is the core component of Mesos. It is mainly responsible for allocating slave resources to the frameworks. Common scheduling mechanisms include FIFO, the Fair Scheduler, the Capacity Scheduler, Quincy, and Condor. To support multiple frameworks, Mesos uses a two-tier scheduling mechanism: first, the allocator in Mesos allocates resources to the frameworks; then each framework's own scheduler assigns those resources to its tasks. This article focuses on the allocator module in Mesos.
(For an introduction to Apache Mesos, see "Unified resource management and scheduling platform (System) Introduction". The analysis in this article is based on Mesos SVN revision 1327410.)
2. Mesos scheduling mechanism
The scheduling mechanism in Mesos is called "resource offer". It is based on resource quantities, unlike the slot-based mechanism in Hadoop. In Mesos, each slave reports its amount of resources (CPU and memory) directly to the master, and the master allocates those resources to the frameworks according to a policy, namely Dominant Resource Fairness (DRF).
A system with two-tier scheduling, such as Mesos, has to answer the following question at design time: "How can Mesos meet each framework's needs without knowing its resource requirements?" More specifically: "How can Mesos achieve data locality when it does not know where each framework's data is stored?" To address this, Mesos provides a "reject offer" mechanism that lets a framework temporarily reject offers from slaves that do not meet its needs. In this respect Mesos uses a "delay scheduling" mechanism similar to the one in Hadoop.
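The reject-offer idea can be sketched from the framework's side. The following is a minimal illustration of delay scheduling, not the Mesos API: the `Offer` and `LocalityAwareScheduler` classes, the `max_skips` parameter, and the slave names are all hypothetical. The scheduler rejects offers from slaves that do not hold its data, but only a bounded number of times, so that it does not starve waiting for locality.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    slave: str
    cpus: float
    mem_gb: float


@dataclass
class LocalityAwareScheduler:
    """Hypothetical framework-side scheduler (not the real Mesos interface)."""
    preferred_slaves: set   # slaves that hold this framework's input data
    max_skips: int = 3      # offers to reject before giving up on locality
    skips: int = 0

    def on_offer(self, offer: Offer) -> bool:
        """Return True to accept the offer, False to reject it."""
        if offer.slave in self.preferred_slaves:
            self.skips = 0
            return True
        if self.skips >= self.max_skips:
            # Waited long enough: run non-locally rather than keep waiting.
            self.skips = 0
            return True
        self.skips += 1
        return False


sched = LocalityAwareScheduler(preferred_slaves={"slave-3"})
offers = [Offer("slave-1", 4, 8), Offer("slave-2", 4, 8), Offer("slave-3", 4, 8)]
decisions = [sched.on_offer(o) for o in offers]
print(decisions)
```

Here the first two offers are rejected and the third, from a slave that holds the data, is accepted. Counting rejected offers is only one way to bound the wait; a time-based limit would serve the same purpose.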
In Mesos, job scheduling is a distributed process, and it must remain efficient and robust when failures occur. To this end, Mesos provides the following mechanisms:
(1) Filters mechanism. Each scheduling round requires the Mesos master to communicate with the framework scheduler. If a framework always rejects certain slaves, the extra communication overhead makes scheduling inefficient. To avoid unnecessary communication, Mesos provides filters that let a framework declare that it accepts only "slaves with more than L free resources" or "only slaves in a given node list".
(2) Rescind mechanism. If a framework does not return tasks for the resources offered to it within a certain amount of time, Mesos reclaims those resources and offers them to other frameworks.
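The two kinds of filter described in (1) can be illustrated with a small predicate that the master would evaluate before sending an offer. This is a sketch under assumptions: `SlaveInfo`, `OfferFilter`, `slaves_to_offer`, and the field names are hypothetical and do not correspond to the actual Mesos data structures, and the threshold is treated as "at least" for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SlaveInfo:
    name: str
    free_cpus: float
    free_mem_gb: float


@dataclass
class OfferFilter:
    """Hypothetical filter a framework registers with the master."""
    min_cpus: float = 0.0
    min_mem_gb: float = 0.0
    allowed_slaves: Optional[Set[str]] = None  # None means "any slave"

    def admits(self, slave: SlaveInfo) -> bool:
        # Node-list filter: skip slaves outside the allowed set.
        if self.allowed_slaves is not None and slave.name not in self.allowed_slaves:
            return False
        # Threshold filter: skip slaves with too few free resources.
        return slave.free_cpus >= self.min_cpus and slave.free_mem_gb >= self.min_mem_gb


def slaves_to_offer(slaves: List[SlaveInfo], flt: OfferFilter) -> List[str]:
    """Names of slaves worth offering; the rest cost no round trip."""
    return [s.name for s in slaves if flt.admits(s)]


slaves = [SlaveInfo("s1", 1, 2), SlaveInfo("s2", 4, 8), SlaveInfo("s3", 8, 16)]
by_size = slaves_to_offer(slaves, OfferFilter(min_cpus=2, min_mem_gb=4))
by_list = slaves_to_offer(slaves, OfferFilter(allowed_slaves={"s1", "s3"}))
print(by_size, by_list)
```

Evaluating the filter on the master side is what saves the communication: an offer the framework would certainly reject is never sent.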
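3. Dominant Resource Fairness (DRF)
DRF is a max-min fair resource allocation mechanism that supports multiple resource types. Here "max" refers to max{cpu, mem}, the larger of a user's shares of the total CPU and total memory (its dominant share), and "min" refers to min{user1, user2, ...} = min{max{cpu1, mem1}, max{cpu2, mem2}, ...}, where a user corresponds to a framework in Mesos. In other words, DRF tries to raise the smallest dominant share among all users. The pseudocode of the algorithm is shown in the following illustration: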
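The max and min above can also be written compactly. The notation here is an assumption introduced for illustration and does not come from the original: $u_{i,j}$ is the amount of resource $j$ allocated to user $i$, and $R_j$ is the total amount of resource $j$ in the cluster.

```latex
s_i = \max_{j \in \{\mathrm{cpu},\, \mathrm{mem}\}} \frac{u_{i,j}}{R_j}
\quad \text{(dominant share of user } i\text{)},
\qquad
\text{DRF chooses the allocation that maximizes } \min_{i} s_i .
```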
For example, assume the system has 9 CPUs and 18 GB of RAM, and two users (frameworks), A and B, each run tasks that require <1 CPU, 4 GB> and <3 CPUs, 1 GB> respectively. For user A, each task consumes 1/9 of the total CPU and 2/9 of the total memory, so A's dominant resource is memory. For user B, each task consumes 1/3 of the total CPU and 1/18 of the total memory, so B's dominant resource is CPU. DRF equalizes the dominant shares of all users: A obtains <3 CPUs, 12 GB>, which is enough to run 3 tasks, while B obtains <6 CPUs, 2 GB> and runs 2 tasks. With this allocation each user gets the same share of its dominant resource: A gets 2/3 of the RAM and B gets 2/3 of the CPUs.
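The numbers in this example can be reproduced with a short simulation. The sketch below takes the cluster as 9 CPUs and 18 GB of RAM (the memory size implied by a 4 GB task being 2/9 of the total) and repeatedly launches one task for the user with the smallest dominant share. It is a simplified illustration, not the Mesos allocator: the function name `drf_allocate` is hypothetical, ties are broken by user name, and the loop simply stops as soon as the chosen user's next task no longer fits.

```python
from fractions import Fraction


def drf_allocate(total, demands):
    """Launch tasks one at a time for the user with the lowest dominant share."""
    alloc = {u: {r: 0 for r in total} for u in demands}
    used = {r: 0 for r in total}
    order = []  # sequence in which users were granted a task

    def dominant_share(user):
        # Largest fraction of any single resource the user currently holds.
        return max(Fraction(alloc[user][r], total[r]) for r in total)

    while True:
        user = min(sorted(demands), key=dominant_share)
        need = demands[user]
        if any(used[r] + need[r] > total[r] for r in total):
            break  # simplification: stop when the next task does not fit
        for r in total:
            used[r] += need[r]
            alloc[user][r] += need[r]
        order.append(user)
    return alloc, order


total = {"cpus": 9, "mem_gb": 18}
demands = {"A": {"cpus": 1, "mem_gb": 4}, "B": {"cpus": 3, "mem_gb": 1}}
alloc, order = drf_allocate(total, demands)
print(alloc)
print(order)
```

With these inputs A ends up with 3 CPUs and 12 GB (3 tasks) and B with 6 CPUs and 2 GB (2 tasks), so both hold 2/3 of their dominant resource. The order in which tasks are granted depends on how ties are broken, so it is only one possible sequence.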
A possible scheduling sequence for the DRF algorithm is shown in the following illustration:
The advantage of DRF is that it satisfies four properties: sharing incentive, strategy-proofness, envy-freeness, and Pareto efficiency. For details, see the references given later.
4. Mesos scheduling problem
The DRF scheduling algorithm in Mesos is fair to a fault and does not take actual application requirements into account. Production environments often need a mechanism similar to the Hadoop Capacity Scheduler: all resources are divided into several queues, each queue is allocated a certain amount of resources, and each user has an upper limit on the resources it may use. Beyond that, each queue should be able to configure its own scheduling policy, such as FIFO or priority.
Because Mesos uses two-tier scheduling, a practical design decision arises: which scheduling mechanisms should the first-level and second-level schedulers implement? That is, should most of the scheduling logic go into the first-level scheduler, or should the first-level scheduler support only simple resource allocation (with the allocation ratios set by the administrator)?
Mesos uses the resource offer mechanism (unlike the slot-based scheduling mechanism in Hadoop), which leads to a resource fragmentation problem: the resources on a node may not be fully allocated, and what remains may be too small for any task to run. This is similar to the memory fragmentation problem in operating systems.
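A small arithmetic example makes the fragmentation problem concrete. The node size, task size, and the helper `leftover_after_packing` below are hypothetical values chosen for illustration; they do not come from Mesos.

```python
def leftover_after_packing(node, task):
    """Pack as many identical tasks as fit on one node; return (count, leftover)."""
    count = min(node[r] // task[r] for r in task if task[r] > 0)
    leftover = {r: node[r] - count * task.get(r, 0) for r in node}
    return count, leftover


node = {"cpus": 8, "mem_gb": 16}   # assumed node capacity
task = {"cpus": 3, "mem_gb": 4}    # assumed per-task demand
count, leftover = leftover_after_packing(node, task)
fits_one_more = all(leftover[r] >= task[r] for r in task)
print(count, leftover, fits_one_more)
```

Two tasks fit, leaving 2 CPUs and 8 GB idle. That remainder cannot host another task of the same shape, so a quarter of the node's CPU and half of its memory stay unused unless some framework has smaller tasks to place there.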