A key part of Apache Mesos's ability to be the best data center resource manager is its capacity to direct many different types of applications the way a traffic officer directs traffic. This article digs into Mesos's internal resource allocation and discusses how Mesos balances fair resource sharing against the needs of client applications. If you have not yet read the earlier articles in this series, I recommend reading them first: the first is an overview of Mesos, the second describes its two-level architecture, and the third covers data storage and fault tolerance.
We'll explore Mesos's resource allocation module to see how it decides which resource offers to send to which frameworks, and how resources are reclaimed when necessary. Let's start with a look at Mesos's task scheduling process:
From the earlier description of the two-level architecture, we know that for task scheduling, the Mesos master first collects information about available resources from the slave nodes, and then presents those resources, in the form of resource offers, to the frameworks registered with it.
A framework can accept or decline a resource offer, depending on whether the offer satisfies its tasks' resource constraints. Once an offer is accepted, the framework coordinates with the master to schedule tasks and run them on the corresponding slave nodes in the data center.
The decision of which resource offers to make is implemented by the resource allocation module, which lives in the master. The allocation module determines the order in which frameworks receive resource offers and must ensure that resources are shared fairly among inherently greedy frameworks. In homogeneous environments, such as a Hadoop cluster, one of the most commonly used fair-share allocation algorithms is max-min fairness, which maximizes the minimum allocation so that every user receives a fair share of the resources it needs; a simple example of how it works can be found in Example 1 of the max-min fair share algorithm page. As mentioned, this usually works well in homogeneous environments, where resource demands vary little across the resource types involved: CPU, memory, network bandwidth, and I/O. Resource allocation becomes much harder, however, when scheduling across a data center with heterogeneous resource demands. For example, if each of user A's tasks requires 1 CPU and 4 GB of memory while each of user B's tasks requires 3 CPUs and 1 GB of memory, what is the right fair-share allocation policy? And if user A's tasks are memory-intensive while user B's are CPU-intensive, how can a bundle of resources be divided fairly between them?
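For a single resource type, max-min fairness can be sketched with the classic progressive-filling approach: satisfy the smallest demands first and keep splitting what remains evenly among the unsatisfied users. This is a minimal illustration with made-up numbers, not code from Mesos:

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation of a single resource.

    Users are processed in order of increasing demand; each receives the
    smaller of its demand and an equal split of what is still available,
    so unused headroom from small users flows to larger ones.
    """
    n = len(demands)
    allocation = [0.0] * n
    remaining = capacity
    for k, i in enumerate(sorted(range(n), key=lambda i: demands[i])):
        fair_share = remaining / (n - k)      # equal split of what's left
        allocation[i] = min(demands[i], fair_share)
        remaining -= allocation[i]
    return allocation

# With capacity 10 and demands [2, 2.6, 4, 5], the first two users are
# fully satisfied and the last two split the remainder equally.
print(max_min_fair(10, [2, 2.6, 4, 5]))
```

Notice that the two large users end up with equal allocations: that equalizing of the worst-off users is exactly the property DRF generalizes to multiple resource types.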
Because Mesos is dedicated to managing resources in heterogeneous environments, it implements a pluggable resource allocation module architecture that lets users choose the allocation policy and algorithm best suited to their deployment. A user could, for example, implement weighted max-min fairness, allowing a designated framework to obtain more resources than the others. Out of the box, Mesos includes a strict-priority resource allocation module and a modified fair-share allocation module. The strict-priority module gives designated frameworks priority, so that they always receive and can accept offers sufficient to meet their task requirements. This guarantees that a critical application can bound the overhead of dynamic resource sharing in Mesos, but it can potentially starve the other frameworks.
For these reasons, most users default to DRF (Dominant Resource Fairness), a modified fair-share algorithm better suited to the heterogeneous environments Mesos targets.
DRF, like Mesos itself, came out of Berkeley's AMPLab and is implemented as the default resource allocation policy for Mesos.
Readers who want the full details can consult the original DRF papers. In this article I'll summarize the key points and provide some examples that I believe explain DRF more clearly. Let's get started.
The goal of DRF is to ensure that each user (each framework, in Mesos's case) receives a fair share of the resource it needs most in a heterogeneous environment. To understand DRF, we need two concepts: dominant resource and dominant share. A framework's dominant resource is the resource type (CPU, memory, and so on) that it needs most, measured as a percentage of the resources available in the offer. For example, the dominant resource of a framework running compute-heavy tasks is CPU, while the dominant resource of a framework whose tasks work on large in-memory datasets is memory. As resources are allocated, DRF tracks each framework's share of each resource type; the highest of those shares is the framework's dominant share. The DRF algorithm uses the dominant shares of all registered frameworks to ensure that each framework receives a fair share of its dominant resource.
Too abstract? Let's illustrate with an example. Suppose we have a resource offer consisting of 9 CPUs and 18 GB of memory. Each of framework 1's tasks requires (1 CPU, 4 GB memory), and each of framework 2's tasks requires (3 CPUs, 1 GB memory).
Each task of framework 1 consumes 1/9 of the total CPU and 2/9 of the total memory, so framework 1's dominant resource is memory. Likewise, each task of framework 2 consumes 1/3 of the total CPU and 1/18 of the total memory, so framework 2's dominant resource is CPU. DRF tries to give each framework an equal amount of its dominant resource, as measured by its dominant share. In this example, DRF works with the frameworks to allocate as follows: framework 1 runs three tasks, for a total allocation of (3 CPUs, 12 GB memory), and framework 2 runs two tasks, for a total allocation of (6 CPUs, 2 GB memory).
At this point each framework's dominant resource (memory for framework 1, CPU for framework 2) ends up with the same dominant share (2/3, or about 67%), and after these allocations there are not enough free resources left to run any further tasks. Note that if framework 1 only had two tasks to run, framework 2 and any other registered frameworks would receive all of the remaining resources.
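The dominant-share arithmetic above is easy to check directly. This sketch just encodes the cluster totals and the final allocations from the example, using exact fractions to avoid rounding:

```python
from fractions import Fraction

# Cluster totals from the example: 9 CPUs, 18 GB memory.
TOTAL = {"cpu": Fraction(9), "mem": Fraction(18)}

def dominant_share(allocated):
    # A framework's dominant share is its highest share across resource types.
    return max(allocated[r] / TOTAL[r] for r in TOTAL)

fw1 = {"cpu": Fraction(3), "mem": Fraction(12)}   # 3 tasks of (1 CPU, 4 GB)
fw2 = {"cpu": Fraction(6), "mem": Fraction(2)}    # 2 tasks of (3 CPUs, 1 GB)

print(dominant_share(fw1))   # memory share 12/18 = 2/3
print(dominant_share(fw2))   # CPU share 6/9 = 2/3
```

Both frameworks land on a dominant share of exactly 2/3, which is the equalized outcome the article describes.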
So how does DRF compute the allocation above? As mentioned earlier, the DRF allocation module tracks the resources allocated to each framework and each framework's dominant share. At each step, DRF sends a resource offer to the framework with the lowest dominant share among all frameworks with tasks to run, and the framework accepts the offer if there are enough resources available to run one of its tasks. Using the example from the DRF paper cited earlier, let's walk through each step of the DRF algorithm. For simplicity, the example ignores the fact that resources are returned to the pool when short tasks complete, and assumes that each framework has an unbounded number of tasks to run and that every resource offer is accepted.
Recalling the example above: the offer consists of 9 CPUs and 18 GB of memory, each of framework 1's tasks requires (1 CPU, 4 GB memory), and each of framework 2's tasks requires (3 CPUs, 1 GB memory). Each framework 1 task consumes 1/9 of the total CPU and 2/9 of the total memory, so framework 1's dominant resource is memory. Likewise, each framework 2 task consumes 1/3 of the total CPU and 1/18 of the total memory, so framework 2's dominant resource is CPU.
Each row in the table above records one step of the algorithm: the framework that received the offer and, for each framework, its resulting resource shares and dominant share. The lowest dominant share in each row, which determines who receives the next offer, is shown in bold so it is easy to spot.
Initially, both frameworks' dominant shares are 0%, and we'll assume DRF chooses framework 2 first; we could just as well start with framework 1, and the end result would be the same.
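The walk-through can be reproduced with a small simulation. This is a simplified sketch under the article's stated assumptions (unlimited tasks, every offer accepted, nothing freed); a real allocator would move on to the next framework when the lowest-share one cannot use an offer, but in this example the loop ends when CPU runs out:

```python
from fractions import Fraction

TOTAL = {"cpu": Fraction(9), "mem": Fraction(18)}
DEMAND = {  # per-task demand of each framework
    "framework 1": {"cpu": Fraction(1), "mem": Fraction(4)},
    "framework 2": {"cpu": Fraction(3), "mem": Fraction(1)},
}
used = {fw: {"cpu": Fraction(0), "mem": Fraction(0)} for fw in DEMAND}
free = dict(TOTAL)

def dominant_share(fw):
    return max(used[fw][r] / TOTAL[r] for r in TOTAL)

while True:
    fw = min(DEMAND, key=dominant_share)      # lowest dominant share goes first
    task = DEMAND[fw]
    if any(task[r] > free[r] for r in TOTAL):
        break                                 # next task no longer fits (CPU is exhausted here)
    for r in TOTAL:
        used[fw][r] += task[r]
        free[r] -= task[r]
    print(f"{fw} launches a task; dominant share now {dominant_share(fw)}")

# Final allocation: framework 1 -> (3 CPUs, 12 GB), framework 2 -> (6 CPUs, 2 GB)
```

Running it prints one line per scheduling round and ends with exactly the (3 CPUs, 12 GB) / (6 CPUs, 2 GB) split and the matching 2/3 dominant shares from the example.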
It is worth noting that you can create a resource allocation module that uses weighted DRF to favor one framework or a set of frameworks. And, as mentioned earlier, you can create custom modules to provide organization-specific allocation policies.
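One way such a weighted variant could work (a hypothetical sketch, not Mesos code): divide each framework's dominant share by its weight before choosing who gets the next offer, so a framework with weight 2 can hold roughly twice the dominant share before losing priority.

```python
def next_offer_target(dominant_shares, weights):
    """Pick the framework with the lowest weighted dominant share.

    Hypothetical weighted-DRF selection rule: rank frameworks by
    dominant_share / weight instead of the raw dominant share.
    """
    return min(dominant_shares, key=lambda fw: dominant_shares[fw] / weights[fw])

shares = {"analytics": 0.5, "web": 0.3}     # current dominant shares
weights = {"analytics": 2.0, "web": 1.0}    # analytics is entitled to 2x
# analytics: 0.5 / 2 = 0.25; web: 0.3 / 1 = 0.3 -> analytics is offered next
print(next_offer_target(shares, weights))
```

Even though analytics already holds a larger raw share, its weight keeps it first in line, which is the favoring effect described above.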
Generally speaking, because most tasks are short-lived, Mesos can simply wait for tasks to finish and reallocate their resources. However, a cluster can also fill up with long-running tasks, whether from a hung job or from a badly behaving framework.
It is worth noting that the resource allocation module has the ability to revoke tasks when resources are not freed quickly enough. Mesos attempts to revoke a task as follows: it sends a request to the executor to kill the specified task and gives the executor a grace period to clean up. If the executor does not respond to the request, the allocation module kills the executor along with all of its tasks.
An allocation policy can be implemented to prevent specified tasks from being revoked, by granting a framework a guaranteed allocation. If a framework is below its guaranteed allocation, Mesos cannot kill that framework's tasks.
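The guarantee check could be as simple as the following sketch (hypothetical logic, not the Mesos API): before revoking, verify the framework is at or above its guaranteed allocation for every resource type.

```python
def may_revoke(usage, guarantee):
    """Allow revocation only if the framework sits at or above its
    guaranteed allocation for every resource type (hypothetical rule)."""
    return all(usage[r] >= guarantee[r] for r in guarantee)

# A framework above its guarantee is fair game; one below it is protected.
print(may_revoke({"cpu": 4, "mem": 8}, {"cpu": 2, "mem": 4}))   # above guarantee
print(may_revoke({"cpu": 1, "mem": 8}, {"cpu": 2, "mem": 4}))   # below on CPU
```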
There is more to learn about Mesos resource allocation, but I'll stop here. Next, I'm going to do something different and talk about the Mesos community. I believe this is an important topic to consider, because open source is about community as well as technology.
After the community article, I'll write some tutorials on installing Mesos and on creating and using frameworks. Once those hands-on articles are done, I'll come back to more in-depth topics, such as how frameworks interact with the master and how Mesos works across multiple data centers.
As always, I encourage reader feedback; in particular, if you find something I've gotten wrong, please let me know. I'm not all-knowing, so I very much look forward to readers' corrections and insights. We can also talk on Twitter: follow me at @hui_kenneth.
Resource allocation for Mesos