Ideally, every request for YARN resources would be satisfied immediately. In the real world, however, resources are limited, and on a busy cluster an application often has to wait before it gets the resources it needs. In YARN, the scheduler is responsible for allocating resources to applications. Scheduling is an inherently hard problem: there is no single strategy that is ideal for every scenario. For this reason, YARN provides several schedulers and configurable policies to choose from.
I. Choosing a scheduler
There are three schedulers to choose from in YARN: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler.
The FIFO Scheduler places applications in a single queue in the order they are submitted (first in, first out). When allocating resources, it serves the application at the head of the queue first; only once that application's requests are satisfied does it serve the next one, and so on.

The FIFO Scheduler is the simplest scheduler to understand and requires no configuration, but it is not suitable for shared clusters: a large application can consume all of the cluster's resources, blocking every other application. On a shared cluster it is better to use the Capacity Scheduler or the Fair Scheduler, both of which allow large and small jobs to be submitted at the same time while each receives some share of system resources.
The following "Yarn Scheduler comparison Diagram" shows the differences between these schedulers, it can be seen that in the FIFO scheduler, small tasks will be blocked by large tasks.
Under the Capacity Scheduler, a dedicated queue is set aside for running small jobs. However, reserving a queue for small jobs pre-allocates part of the cluster's resources, so large jobs finish later than they would under the FIFO Scheduler.
The Fair Scheduler requires no resources to be reserved in advance; it dynamically rebalances the cluster's resources across all running jobs. As the diagram shows, when the first (large) job is submitted and is the only one running, it receives all of the cluster's resources; when the second (small) job is submitted, the Fair Scheduler gives half of the resources to it, so the two jobs share the cluster fairly.
Note that under the Fair Scheduler there is a lag between the submission of the second job and the moment it receives resources, because it must wait for the first job to release the containers it holds. Once the small job completes and releases its resources, the large job again receives the full cluster. The net effect is that the Fair Scheduler achieves high overall resource utilization while still ensuring that small jobs finish promptly.
(Figure: YARN scheduler comparison diagram)
II. Configuring the Capacity Scheduler

2.1 Introduction to capacity scheduling
The Capacity Scheduler lets multiple organizations share a whole cluster, with each organization receiving a portion of the cluster's computing capacity. Each organization is given a dedicated queue, and each queue is allocated a share of the cluster's resources, so by setting up multiple queues one cluster can serve many organizations. Queues can in turn be subdivided so that members within an organization share the queue's resources. Within a single queue, resources are scheduled with a FIFO policy.
From the diagram above we know that a single job cannot use more resources than its queue provides. But what if several jobs are running in the queue and the queue's resources are insufficient? In that case the Capacity Scheduler may still allocate additional resources to the queue; this is the idea behind an "elastic queue" (queue elasticity).
In normal operation, the Capacity Scheduler does not forcibly release containers, so when a queue runs short of resources it can only obtain containers as other queues release them. We can, however, set a maximum resource usage for a queue to stop it from consuming so many idle resources that other queues cannot use them. This is the trade-off an elastic queue has to strike.
2.2 Configuring capacity scheduling
Suppose we have a queue hierarchy like the following:
root
├── prod
└── dev
    ├── eng
    └── science
Below is a simple Capacity Scheduler configuration, stored in a file named capacity-scheduler.xml. It defines two sub-queues under the root queue, prod and dev, with 40% and 60% of the capacity respectively. Note that a queue is configured through properties of the form yarn.scheduler.capacity.<queue-path>.<sub-property>, where <queue-path> is the queue's path in the hierarchy, such as root.prod, and <sub-property> is typically capacity or maximum-capacity.
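A minimal capacity-scheduler.xml matching this layout might look like the following sketch (property names follow the standard yarn.scheduler.capacity.* scheme; the capacity values are the ones described in this section):

<?xml version="1.0"?>
<configuration>
  <!-- sub-queues of root and of dev -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <!-- prod and dev share the cluster 40% / 60% -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <!-- dev may grow to at most 75% of the cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <!-- eng and science split dev's share evenly -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>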
As we can see, the dev queue is itself divided into two sub-queues of equal capacity, eng and science. dev's maximum-capacity property is set to 75%, so even when the prod queue is completely idle, dev cannot occupy all of the cluster's resources; in other words, prod always keeps 25% of the cluster available for emergency use. Note that eng and science have no maximum-capacity set, so jobs in either queue may consume all of the dev queue's resources (up to 75% of the cluster). Similarly, since prod has no maximum-capacity set, it may occupy all of the cluster's resources.
Besides queues and their capacities, the Capacity Scheduler also lets us configure the maximum resources a single user or application may be allocated, how many applications may run concurrently, per-queue ACLs, and so on.
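As a hedged illustration, a few such properties might be added to capacity-scheduler.xml like this (the queue name prod is reused from the example above, and the values and user names are invented):

<!-- a single user in prod may acquire at most 1x the queue's configured capacity -->
<property>
  <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
  <value>1</value>
</property>
<!-- cap the number of active applications in prod -->
<property>
  <name>yarn.scheduler.capacity.root.prod.maximum-applications</name>
  <value>100</value>
</property>
<!-- only these users may submit to prod -->
<property>
  <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
  <value>alice,bob</value>
</property>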
2.3 Specifying a queue
How the queue is specified depends on the application. In MapReduce, for example, we can specify the queue through the mapreduce.job.queuename property. If the queue does not exist, we get an error when we submit the job. If we do not specify any queue, all applications are placed in a queue named default.
Note: with the Capacity Scheduler, the queue name must be the last component of the queue path; full queue paths are not recognized. For example, in the configuration above, prod and eng are valid queue names, whereas root.dev.eng and dev.eng are invalid.
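For example, assuming the job's driver uses ToolRunner (so that -D options are picked up), a MapReduce job can be submitted to the eng queue with a command along the lines of hadoop jar my-app.jar com.example.MyJob -D mapreduce.job.queuename=eng input output, where the jar and class names are placeholders.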
III. Configuring the Fair Scheduler

3.1 Fair scheduling
The goal of the Fair Scheduler is to give all applications a fair share of resources (what "fair" means can be tuned through parameters). The YARN scheduler comparison diagram above shows fair scheduling between two applications in the same queue; fair scheduling also works across multiple queues. For example, suppose two users, A and B, each have their own queue. When A starts a job and B has none, A receives all of the cluster's resources; when B then starts a job, A's job keeps running, but after a while each job holds half of the cluster's resources. If B now starts a second job while the first is still running, it shares B's queue with B's first job, so B's two jobs each take one quarter of the cluster while A's job still holds one half. The end result is that resources are shared fairly between the two users. The process is illustrated below:
3.2 Enabling the Fair Scheduler
The scheduler to use is configured through the yarn.resourcemanager.scheduler.class parameter in yarn-site.xml; the Capacity Scheduler is used by default. To use the Fair Scheduler, set this parameter to the fully qualified class name of the FairScheduler class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
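For example, in yarn-site.xml:

<property>
  <!-- switch the ResourceManager to the Fair Scheduler -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>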
3.3 Configuring queues
The Fair Scheduler is configured through a file named fair-scheduler.xml on the classpath; the file name can be changed with the yarn.scheduler.fair.allocation.file property. Without this configuration file, the Fair Scheduler follows the allocation strategy described in section 3.1: it automatically creates a queue for each user when that user submits their first application, names the queue after the user, and places all of a user's applications in that user's queue.
We can define each queue in this configuration file, and queues can be arranged hierarchically just as with the Capacity Scheduler. For example, here is a fair-scheduler.xml analogous to the capacity-scheduler.xml above:
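A sketch of such a file, matching what the remainder of this section describes (40/60 weights, FIFO within prod, and the placement rules discussed in section 3.4), might look like this:

<?xml version="1.0"?>
<allocations>
  <!-- queues use fair scheduling unless overridden below -->
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>

  <queue name="prod">
    <weight>40</weight>
    <!-- jobs inside prod run in FIFO order -->
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>

  <queue name="dev">
    <weight>60</weight>
    <!-- eng and science define no weights, so they share dev evenly -->
    <queue name="eng" />
    <queue name="science" />
  </queue>

  <!-- rules tried in order to choose an application's queue -->
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="dev.eng" />
  </queuePlacementPolicy>
</allocations>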
The queue hierarchy is expressed with nested <queue> elements. All queues are children of the root queue, even though they are not wrapped in a <root> element. Here the dev queue is subdivided into two queues, eng and science.
Queues in the Fair Scheduler have a weight attribute (this is the configurable definition of fairness), which the scheduler uses as the basis for fair allocation. In this example, the cluster's resources are considered fairly allocated when prod and dev receive them in a 40:60 ratio; eng and science define no weights, so they split their share evenly. Note that weights are not percentages: replacing 40 and 60 with 2 and 3 would have exactly the same effect. Also note that queues created automatically for users when no configuration file exists still have weights, each with a weight of 1.
Each queue can still use a different scheduling policy internally. The default policy for queues is set with the top-level <defaultQueueSchedulingPolicy> element; if it is omitted, fair scheduling is used.
Despite its name, the Fair Scheduler also supports FIFO scheduling at the queue level. A queue's policy can be overridden with its own <schedulingPolicy> element. In this case the prod queue is set to FIFO scheduling, so jobs submitted to prod execute in first-in, first-out order. Note that scheduling between prod and dev is still fair, as it is between eng and science.
Although not shown in the configuration above, each queue can also be given a minimum and maximum resource allocation and a maximum number of running applications.
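For instance (the values here are invented for illustration), such limits are expressed with the minResources, maxResources, and maxRunningApps elements inside a <queue>:

<queue name="prod">
  <!-- guaranteed minimum and allowed maximum share -->
  <minResources>10000 mb,10 vcores</minResources>
  <maxResources>90000 mb,100 vcores</maxResources>
  <!-- at most 50 applications running at once -->
  <maxRunningApps>50</maxRunningApps>
</queue>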
3.4 Queue placement
The Fair Scheduler uses a rule-based system to determine which queue an application is placed in. In the example above, the <queuePlacementPolicy> element defines an ordered list of rules; each rule is tried in turn until one matches. The first rule, specified, places the application in the queue it asked for; if the application named no queue, or the named queue does not exist, the rule does not match and the next one is tried. The primaryGroup rule tries to place the application in a queue named after the user's Unix primary group; if no such queue exists, the queue is not created and the next rule is tried instead. If none of the preceding rules match, the default rule fires and the application is placed in the dev.eng queue.
Of course, we can also omit the queuePlacementPolicy element entirely, in which case the scheduler falls back to the following rules:
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="user" />
</queuePlacementPolicy>
These rules boil down to one sentence: unless a queue is explicitly specified, a queue named after the user is used (and created if necessary).
There is also a simpler placement policy that puts all applications in the same queue (default), so that resources are shared fairly between applications rather than between users. It is defined as follows:
<queuePlacementPolicy>
  <rule name="default" />
</queuePlacementPolicy>
We can achieve the same effect without the configuration file: setting yarn.scheduler.fair.user-as-default-queue=false makes applications go into the default queue rather than per-user queues. In addition, setting yarn.scheduler.fair.allow-undeclared-pools=false prevents users from creating queues on the fly.
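In yarn-site.xml these two settings look like this:

<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>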
3.5 Preemption
When a job is submitted to an empty queue on a busy cluster, it does not start immediately; it blocks until already-running jobs release resources. To make job start times more predictable (a wait timeout can be set), the Fair Scheduler supports preemption.
Preemption allows the scheduler to kill containers belonging to queues that are using more than their fair share, and to hand those resources to the queues entitled to them. Note that preemption lowers the cluster's overall efficiency, because killed containers must be re-executed.
Preemption is enabled globally by setting yarn.scheduler.fair.preemption=true. Two further parameters control the preemption timeouts (neither is set by default, and at least one must be configured for containers to be preempted):
- minimum share preemption timeout
- fair share preemption timeout
If a queue has not received its guaranteed minimum share for the duration of the minimum share preemption timeout, the scheduler preempts containers. A default timeout for all queues can be set with the top-level <defaultMinSharePreemptionTimeout> element in the configuration file; a per-queue timeout can be set with a <minSharePreemptionTimeout> element inside a <queue> element.
Similarly, if a queue has received less than half of its fair share (the fraction is configurable) for the duration of the fair share preemption timeout, the scheduler preempts containers. This timeout can be set for all queues with the top-level <defaultFairSharePreemptionTimeout> element, or for a single queue with <fairSharePreemptionTimeout>. The fraction mentioned above is set with <defaultFairSharePreemptionThreshold> (for all queues) or <fairSharePreemptionThreshold> (for one queue) and defaults to 0.5.
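Putting the pieces together, preemption might be configured as follows; the timeout values (in seconds) are invented for illustration. Enable it in yarn-site.xml:

<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

and set the defaults in the allocation file:

<allocations>
  <!-- preempt if a queue sits below its minimum share for 60 seconds -->
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
  <!-- preempt if a queue sits below half its fair share for 120 seconds -->
  <defaultFairSharePreemptionTimeout>120</defaultFairSharePreemptionTimeout>
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>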