Ideally, our requests for YARN resources would be satisfied immediately, but in the real world resources are often limited, especially in a busy cluster, where an application's request for resources often has to wait before the appropriate resources become available. In YARN, the scheduler is responsible for allocating resources to applications. Scheduling is a hard problem in its own right: it is difficult to find a single perfect strategy that fits every application scenario. For this reason, YARN provides several schedulers and configurable policies for us to choose from.

One, the choice of scheduler
There are three types of scheduler in YARN that can be selected:
FIFO Scheduler,
Capacity Scheduler,
Fair Scheduler.
The FIFO Scheduler places applications in a queue in the order they are submitted: a first-in, first-out queue. When allocating resources, the application at the head of the queue is served first; only after its needs are met does allocation move on to the next application, and so on.
The FIFO Scheduler is the simplest and easiest scheduler to understand and requires no configuration, but it is not suitable for shared clusters. A large application can consume all cluster resources, causing other applications to be blocked. In a shared cluster it is more appropriate to adopt the Capacity Scheduler or the Fair Scheduler; both of these schedulers allow large and small tasks submitted at the same time to each obtain a share of system resources.
The "Yarn Scheduler comparison Diagram" below shows the differences between these schedulers. It can be seen that under the FIFO Scheduler, small tasks are blocked by large tasks.
For the Capacity Scheduler, a dedicated queue is kept for running small tasks, but setting up a queue specifically for small tasks pre-occupies a certain amount of cluster resources, which causes the execution time of large tasks to lag behind what it would be under the FIFO Scheduler.
In the Fair Scheduler, there is no need to pre-occupy any system resources: the scheduler dynamically adjusts the share of resources among all running jobs. As shown, when the first (big) job is submitted, it is the only job running and receives all cluster resources; when the second (small) job is submitted, the Fair Scheduler allocates half of the resources to it, so the two jobs share the cluster fairly.
Note that in the Fair Scheduler there is a delay between submitting the second task and it obtaining resources, because it must wait for the first task to release the containers it occupies. When the small task completes, it frees the resources it occupied and the big task again receives all system resources. The net effect is that the Fair Scheduler achieves both high resource utilization and timely completion of small tasks.
Yarn Scheduler Comparison chart:
Two, Capacity Scheduler
The Capacity Scheduler allows multiple organizations to share the whole cluster, with each organization receiving a portion of the cluster's computing power. By assigning a dedicated queue to each organization and allocating a certain share of cluster resources to each queue, the whole cluster can serve multiple organizations through multiple queues. In addition, a queue can be subdivided further, so that multiple members within an organization share the queue's resources; within a single queue, resources are scheduled using a first-in, first-out (FIFO) policy.
From the diagram above, we already know that a single job may not be able to use the resources of an entire queue. However, if multiple jobs are running in a queue and the queue has enough resources, those resources are allocated to the jobs. And what if the queue's resources are insufficient? In fact, the Capacity Scheduler may still allocate additional resources to the queue; this is the concept of an "elastic queue" (queue elasticity).
In normal operation, the Capacity Scheduler does not forcibly release containers, so when a queue's resources are insufficient, the queue can only obtain container resources released by other queues. Of course, we can set a maximum resource usage for a queue so that it does not consume too many idle resources and leave other queues unable to use them; this is the trade-off an elastic queue must weigh.

2.2 Configuration of the Capacity Scheduler (container scheduling)
Suppose we have a queue hierarchy as follows:
root
├── prod
└── dev
    ├── eng
    └── science
Here is a simple configuration file for the Capacity Scheduler, with the file name capacity-scheduler.xml. In this configuration, two sub-queues, prod and dev, are defined under the root queue, accounting for 40% and 60% of the capacity respectively. Note that a queue is configured through properties of the form yarn.scheduler.capacity.<queue-path>.<sub-property>, where <queue-path> represents the queue's place in the inheritance tree, such as root.prod, and <sub-property> generally refers to capacity or maximum-capacity.
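As a minimal sketch, a capacity-scheduler.xml consistent with the capacities described here (prod 40%, dev 60% with maximum-capacity 75%, eng and science of equal capacity; the 50/50 split for eng and science follows from "equal capacity", all other details are assumptions) could look like:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sub-queues of root and of dev -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <!-- prod gets 40% of the cluster, dev gets 60% -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <!-- Cap dev at 75% even when prod is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <!-- eng and science share dev equally -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>
```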
As we can see, the dev queue is further divided into two sub-queues, eng and science, of equal capacity. dev's maximum-capacity property is set to 75%, so even if the prod queue is completely idle, dev will not occupy all cluster resources; in other words, the prod queue still has 25% of the cluster's resources available for emergencies. Note that no maximum-capacity property is set for the eng and science queues, which means a job in either of them may use all of the dev queue's resources (up to 75% of the cluster). Similarly, because prod's maximum-capacity property is not set, it may occupy all the resources of the cluster.
Besides configuring queues and their capacities, the Capacity Scheduler also lets us configure the maximum amount of resources a single user or application can be allocated, how many applications can run concurrently, ACL authentication for queues, and so on.

2.3 Setting of the queue
Which queue to use depends on the specific application. For example, in MapReduce we can specify the queue to use through the mapreduce.job.queuename property. If the queue does not exist, we receive an error when submitting the task. If we do not define any queues, all applications are placed in a default queue.
Note: for the Capacity Scheduler, the queue name must be the last part of the queue hierarchy; a full queue path will not be recognized. For example, in the configuration above, using eng as the queue name is valid, but using dev.eng is invalid.
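As a sketch, the queue could be specified in the job configuration as below (equivalent to passing -Dmapreduce.job.queuename=eng on the command line when submitting the job):

```xml
<!-- Job configuration fragment: submit this MapReduce job
     to the "eng" leaf queue from the example above. -->
<property>
  <name>mapreduce.job.queuename</name>
  <value>eng</value>
</property>
```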
Three, Fair Scheduler
The goal of the Fair Scheduler is to allocate resources fairly to all applications (the definition of fairness can be set through parameters). The "Yarn Scheduler comparison Diagram" above shows fair scheduling of two applications within one queue; of course, fair scheduling also works across multiple queues. For example, suppose two users A and B each have their own queue. When A starts a job and B has no task, A gets all of the cluster resources; when B then starts a job, A's job continues to run, but after a while the two jobs each receive half of the cluster resources. If B now starts a second job while the first is still running, it shares B's queue resources with B's first job; that is, B's two jobs each use one-fourth of the cluster's resources, while A's job still uses half. The result is that resources are ultimately shared equally between the two users. The process is as follows:
Which scheduler to use is configured through the yarn.resourcemanager.scheduler.class parameter in the yarn-site.xml configuration file; the Capacity Scheduler is used by default. If we want to use the Fair Scheduler, we need to set this parameter to the fully qualified name of the FairScheduler class:
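For example, the following yarn-site.xml fragment switches the ResourceManager to the Fair Scheduler:

```xml
<!-- yarn-site.xml: use the Fair Scheduler instead of the default -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```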
The configuration file for the Fair Scheduler is located on the classpath and is named fair-scheduler.xml by default; the name can be changed through the yarn.scheduler.fair.allocation.file property. Without this configuration file, the Fair Scheduler uses an allocation strategy similar to the one described in section 3.1: the scheduler automatically creates a queue for a user when the user submits their first application; the queue's name is the user name, and all of that user's applications are assigned to the corresponding user queue.
We can configure each queue in the configuration file, and queues can be configured hierarchically just as with the Capacity Scheduler. For example, a fair-scheduler.xml analogous to the capacity-scheduler.xml above:
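A sketch of such a fair-scheduler.xml, consistent with the description that follows (weights 40 and 60, FIFO policy inside prod, eng and science nested under dev, and the specified/primaryGroup/default placement rules; the dev.eng target of the default rule is an assumption for illustration):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Queues use fair scheduling internally unless overridden -->
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>

  <queue name="prod">
    <weight>40</weight>
    <!-- Jobs inside prod run in submission order -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>

  <queue name="dev">
    <weight>60</weight>
    <!-- No weights given: eng and science share dev evenly -->
    <queue name="eng" />
    <queue name="science" />
  </queue>

  <!-- Rules are tried in order until one matches -->
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="dev.eng" />
  </queuePlacementPolicy>
</allocations>
```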
The queue hierarchy is built through nested <queue> elements. All queues are children of the root queue, even if they are not actually nested inside a <root> element. In this configuration, the dev queue is again divided into two queues, eng and science.
Queues in the Fair Scheduler have a weight attribute (this attribute is the definition of fairness), which is used as the basis for fair scheduling. In this example, the allocation is considered fair when the scheduler assigns cluster resources to prod and dev in a 40:60 ratio. Since the eng and science queues have no weight defined, resources are divided evenly between them. Note that the weights are not percentages: replacing 40 and 60 with 2 and 3 respectively has exactly the same effect. Also note that queues created automatically for users when there is no configuration file still have a weight, with a value of 1.
Each queue can still use a different scheduling policy internally. The default scheduling policy for queues can be configured through the top-level <defaultQueueSchedulingPolicy> element; if it is not configured, fair scheduling is used by default.
Despite being the Fair Scheduler, it still supports FIFO scheduling at the queue level. Each queue's scheduling policy can be overridden by a <schedulingPolicy> element inside that queue; in this case, the prod queue is specified to use FIFO scheduling, so tasks submitted to the prod queue are executed in FIFO order. Note that scheduling between prod and dev is still fair, as is scheduling between eng and science.
Although not shown in the configuration above, each queue can also be configured with maximum and minimum resource usage and the maximum number of runnable applications.

3.4 Setting of the queue
The Fair Scheduler uses a rule-based system to determine which queue an application is placed in. In the example above, the <queuePlacementPolicy> element defines a list of rules; each rule is tried in order until one matches. For example, the first rule, specified, places the application in the queue it explicitly specifies; if the application specifies no queue name, or the named queue does not exist, the rule does not match and the next rule is tried. The primaryGroup rule tries to place the application in a queue named after the user's UNIX group; if no such queue exists, rather than creating it, the next rule is tried. When none of the preceding rules match, the default rule is triggered, placing the application in the queue that rule specifies.
Of course, we can also omit the queuePlacementPolicy element entirely, in which case the scheduler uses the following rules by default:
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="user" />
</queuePlacementPolicy>
The rules above can be summed up in one sentence: unless a queue is explicitly specified, a queue named after the user is used (and created if necessary).
There is also a simple configuration policy that places all applications in the same (default) queue, so that the cluster is shared fairly among applications rather than among users. This configuration is defined as follows:
<queuePlacementPolicy>
  <rule name="default" />
</queuePlacementPolicy>
We can also achieve the above behavior without using the configuration file: setting yarn.scheduler.fair.user-as-default-queue=false places applications in the default queue instead of per-user-name queues. In addition, setting yarn.scheduler.fair.allow-undeclared-pools=false prevents users from creating queues on the fly.
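A sketch of the two yarn-site.xml properties just described:

```xml
<!-- yarn-site.xml: place apps in the default queue rather than
     per-user queues, and forbid creating queues at submission time -->
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
```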
When a job is submitted to an empty queue on a busy cluster, the job does not execute immediately; instead, it blocks until running jobs release system resources. To make the execution time of a submitted job more predictable (a waiting timeout can be set), the Fair Scheduler supports preemption.
Preemption allows the scheduler to kill containers belonging to queues that occupy more than their share of resources, so that those resources can be allocated to the queues entitled to them. Note that preemption reduces the execution efficiency of the cluster, because the killed containers need to be re-executed.
The preemption feature is enabled by setting the global parameter yarn.scheduler.fair.preemption=true. In addition, there are two parameters that control preemption timeouts (neither is configured by default; at least one must be configured to allow preemption of containers):
- minimum share preemption timeout
- fair share preemption timeout
If a queue does not receive its guaranteed minimum share of resources within the time specified by minimum share preemption timeout, the scheduler preempts containers. This timeout can be configured for all queues through the top-level <defaultMinSharePreemptionTimeout> element in the configuration file; a <minSharePreemptionTimeout> element inside a queue's element specifies a timeout for that queue alone.
Similarly, if a queue does not receive half of its fair share of resources (this ratio is configurable) within the time specified by fair share preemption timeout, the scheduler preempts containers. This timeout can be configured for all queues through the top-level <defaultFairSharePreemptionTimeout> element, and for an individual queue through a <fairSharePreemptionTimeout> element. The ratio mentioned above can be configured through <defaultFairSharePreemptionThreshold> (for all queues) and <fairSharePreemptionThreshold> (for an individual queue); it defaults to 0.5.
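Putting the preemption settings together, a sketch (the timeout values here are illustrative assumptions, in seconds):

```xml
<!-- yarn-site.xml: turn preemption on globally -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml fragment: defaults for all queues,
     with a per-queue override for prod -->
<defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
<defaultFairSharePreemptionTimeout>120</defaultFairSharePreemptionTimeout>
<defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
<queue name="prod">
  <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
</queue>
```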
Other articles related to yarn:
Hadoop Yarn Detailed
Yarn memory allocation management mechanism and related parameter configuration
Copyright notice: this is the blogger's original article; do not reproduce without the blogger's permission.
Yarn Scheduler Detailed