Chapter 4, Part 3: Scheduling in YARN


In an ideal world, the requests that a YARN application makes would be granted immediately. In the real world, resources are limited, and on a busy cluster an application will often need to wait to have some of its requests fulfilled. It is the job of the YARN scheduler to allocate resources to applications according to some defined policy. Scheduling in general is a difficult problem, and there is no one "best" policy, which is why YARN provides a choice of schedulers and configurable policies. We look at these next.


Scheduler options

Three schedulers are available in YARN: the FIFO, Capacity, and Fair Schedulers. The FIFO Scheduler places applications in a queue and runs them in the order in which they were submitted (first in, first out). Requests for the first application in the queue are allocated first; once its requests have been satisfied, the next application in the queue is served, and so on.

The FIFO Scheduler has the merit of being simple to understand and needing no configuration, but it's not suitable for shared clusters: a large application will use all the resources in the cluster, so every other application has to wait its turn. On a shared cluster it is better to use the Capacity Scheduler or the Fair Scheduler. Both of these allow long-running jobs to complete in a timely manner, while still allowing users running smaller concurrent queries to get results back in a reasonable time.

The differences between the three schedulers are illustrated in Figure 4-3. With the FIFO Scheduler (i), the small job is blocked until the large job completes.

With the Capacity Scheduler (ii), a separate dedicated queue allows the small job to start as soon as it is submitted, although this comes at the cost of overall cluster utilization, since the queue capacity is reserved for jobs in that queue. This means the large job finishes later than it would with the FIFO Scheduler.

With the Fair Scheduler (iii), there is no need to reserve capacity, since it dynamically balances resources between all the running jobs. Just after the first (large) job starts, it is the only job running, so it gets all the resources in the cluster. When the second (small) job starts, it is allocated half of the cluster's resources, so that each job is using its fair share.

Note that there is a lag between the time the second job starts and when it receives its fair share, since it has to wait for resources to free up as containers used by the first job complete. After the small job finishes and no longer requires resources, the large job goes back to using the full cluster capacity again. The overall effect is both high cluster utilization and timely small job completion.

Figure 4-3 contrasts the basic operation of the three schedulers. In the next two sections, we examine some of the more advanced configuration options for the Capacity and Fair Schedulers.

Capacity Scheduler Configuration

The Capacity Scheduler allows sharing of a Hadoop cluster along organizational lines, whereby each organization is allocated a certain share of the overall cluster capacity. Each organization is given a dedicated queue that is configured to use a given fraction of the cluster capacity. Queues may be further divided in a hierarchical fashion, allowing an organization to share its cluster allowance between different groups of users within it. Within a queue, applications are scheduled using FIFO.

As we saw in Figure 4-3, a single job does not use more resources than its queue's capacity. However, if there is more than one job in the queue and there are idle resources available, then the Capacity Scheduler may allocate the spare resources to jobs in the queue, even if that causes the queue's capacity to be exceeded. This behavior is known as queue elasticity.

In normal operation, the Capacity Scheduler does not preempt containers by forcibly killing them, so if a queue is under capacity due to lack of demand and then demand increases, the queue will only return to capacity as resources are released by other queues as their containers complete. It is possible to mitigate this by configuring queues with a maximum capacity so that they don't eat too much into other queues' capacities. This is at the cost of queue elasticity, of course, so a reasonable trade-off should be found through trial and error.

Imagine a queue hierarchy that looks like this:

root
├── prod
└── dev
    ├── eng
    └── science

Example 4-1 shows a sample Capacity Scheduler configuration file, called capacity-scheduler.xml, for this hierarchy. It defines two queues under the root queue, prod and dev, which have 40% and 60% of the capacity, respectively. Notice that a particular queue is configured by setting configuration properties of the form yarn.scheduler.capacity.<queue-path>.<sub-property>, where <queue-path> is the hierarchical (dotted) path of the queue, such as root.prod.

Example 4-1. A basic configuration file for the Capacity Scheduler

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>
As you can see, the dev queue is further divided into eng and science queues of equal capacity. dev has a maximum capacity of 75%, so even when the prod queue is idle, the dev queue will not use up all the cluster resources. In other words, the prod queue always has 25% of the cluster available for immediate use. Since no maximum capacities have been set on the other queues, it's possible for jobs in the eng or science queues to use all of the dev queue's capacity (up to 75% of the cluster), or indeed for the prod queue to use the entire cluster.

Beyond configuring queue hierarchies and capacities, there are settings to control the maximum number of resources a single user or application can be allocated, how many applications can be running at any one time, and ACLs on queues. See the reference pages for details.
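As an illustration, here is a minimal sketch of how such limits might look in capacity-scheduler.xml. The values are made up for the example; the properties shown (maximum-applications, user-limit-factor, and acl_submit_applications on a queue path) are standard Capacity Scheduler settings, but check your Hadoop version's reference documentation for the full list.

<property>
  <!-- Cap the number of concurrently active (pending plus running)
       applications in the dev queue; the value is illustrative. -->
  <name>yarn.scheduler.capacity.root.dev.maximum-applications</name>
  <value>100</value>
</property>
<property>
  <!-- Allow a single user to take up to twice the queue's configured
       capacity when resources are free. -->
  <name>yarn.scheduler.capacity.root.dev.user-limit-factor</name>
  <value>2</value>
</property>
<property>
  <!-- Restrict submission to the prod queue to these (hypothetical) users. -->
  <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
  <value>alice,bob</value>
</property>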


Queue Placement

The way that you specify which queue an application is placed in depends on the application. For example, in MapReduce, you set the property mapreduce.job.queuename to the name of the queue you want to use. If the queue does not exist, you'll get an error at submission time. If no queue is specified, applications are placed in a queue called default.
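For instance, a MapReduce job could be routed to the prod queue from our example hierarchy by setting the property in its job configuration:

<property>
  <!-- Submit this MapReduce job to the prod queue. -->
  <name>mapreduce.job.queuename</name>
  <value>prod</value>
</property>

The same property can also be passed on the command line when submitting a job, for example with -D mapreduce.job.queuename=prod.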


Fair Scheduler Configuration

The Fair Scheduler attempts to allocate resources so that all running applications get the same share of resources. Figure 4-3 showed how fair sharing works for applications in a single queue; however, fair sharing actually works between queues too, as we'll see next.

To understand how resources are shared between queues, imagine two users, A and B, each with their own queue (Figure 4-4). A starts a job, and it is allocated all the resources available, since there is no demand from B. Then, while A's job is still running, B starts a job, and after a while each job is using half of the resources, in the way we saw earlier. Now, if B starts a second job while the other jobs are still running, it will share resources with B's first job, so each of B's jobs will have one-fourth of the cluster's resources, while A's job continues to have half. The result is that resources are shared fairly between users.




Enabling the Fair Scheduler

Which scheduler to use is determined by the setting yarn.resourcemanager.scheduler.class. The Capacity Scheduler is used by default (although the Fair Scheduler is the default in some Hadoop distributions, such as CDH), but this can be changed by setting yarn.resourcemanager.scheduler.class in yarn-site.xml to the fully qualified classname of the scheduler, org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
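Concretely, enabling the Fair Scheduler amounts to a single property in yarn-site.xml, using the classname given above:

<property>
  <!-- Use the Fair Scheduler instead of the default Capacity Scheduler. -->
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

Note that the resource manager needs to be restarted to pick up the change.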


Queue configuration

The Fair Scheduler is configured using an allocation file named fair-scheduler.xml that is loaded from the classpath. (The name can be changed by setting the property yarn.scheduler.fair.allocation.file.) In the absence of an allocation file, the Fair Scheduler operates as described earlier: each application is placed in a queue named after the user, and the queue is created dynamically when the user submits their first application.
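If you want to keep the allocation file somewhere other than the default, the property mentioned above can point to it; the path below is a hypothetical placeholder:

<property>
  <!-- Load the Fair Scheduler allocation file from an explicit location
       (hypothetical path, for illustration only). -->
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>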

Per-queue configuration goes in the allocation file. This allows a hierarchy of queues to be defined, just as the Capacity Scheduler supports. For example, we can define prod and dev queues, as we did for the Capacity Scheduler, as shown in Example 4-2:

Example 4-2. An allocation file for the Fair Scheduler

<?xml version="1.0"?>
<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>

  <queue name="prod">
    <weight>40</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>

  <queue name="dev">
    <weight>60</weight>
    <queue name="eng" />
    <queue name="science" />
  </queue>

  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="default" queue="dev.eng" />
  </queuePlacementPolicy>
</allocations>
The queue hierarchy is defined using nested queue elements. All queues are children of the root queue, even though they are not actually nested within a root queue element. Here we subdivide the dev queue into two queues, eng and science.

Queues can have weights, which are used in the fair share calculation. In this example, the cluster allocation is considered fair when it is divided between prod and dev in a 40:60 ratio. The eng and science queues do not have weights specified, so resources are divided evenly between them. Weights are not quite the same as percentages, even though this example uses numbers that add up to 100 for the sake of simplicity: we could have specified weights of 2 and 3 for the prod and dev queues to achieve the same effect.

Queues can have different scheduling policies. The default policy for queues can be set in the top-level defaultQueueSchedulingPolicy element; if it is omitted, fair scheduling is used. The Fair Scheduler also supports a FIFO policy, as well as Dominant Resource Fairness, which is described later in this chapter. The policy for a particular queue can be overridden using the schedulingPolicy element for that queue. In this case, the prod queue uses FIFO scheduling, since we want each production job to run serially and complete in the shortest possible time. Note that fair sharing is still used to divide resources between the prod and dev queues, as well as between (and within) the eng and science queues.

Although not shown in this allocation file, queues can be configured with minimum and maximum resources and a maximum number of running applications. The minimum resource setting is not a hard limit; rather, the scheduler uses it to prioritize resource allocations: if two queues are below their fair share, the one that is furthest below its minimum is allocated resources first. The minimum resource setting is also used for preemption, discussed next.


Queue Placement

The Fair Scheduler uses a rule-based system to determine which queue an application is placed in. In Example 4-2, the queuePlacementPolicy element contains a list of rules, each of which is tried in turn until a match occurs. The first rule, specified, places an application in the queue it requested; if no queue is specified, or if the specified queue does not exist, the rule does not match and the next rule is tried. The primaryGroup rule tries to place an application in a queue named after the user's primary Unix group; if there is no such queue, then rather than creating it, the next rule is tried. The default rule is a catch-all that places all remaining applications in the dev.eng queue.

The queuePlacementPolicy element can be omitted entirely, in which case the default behavior is as if it had been specified with:
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="user" />
</queuePlacementPolicy>
In other words, unless a queue is explicitly specified, the user's name is used as the queue, with the queue being created if necessary.

Another simple queue placement policy is one where all applications are placed in the same (default) queue. This allows resources to be shared fairly between applications, rather than between users. The definition is equivalent to:

<queuePlacementPolicy>
  <rule name="default" />
</queuePlacementPolicy>
It's also possible to place applications in the default queue, rather than per-user queues, without using an allocation file, by setting yarn.scheduler.fair.user-as-default-queue to false. In addition, yarn.scheduler.fair.allow-undeclared-pools should be set to false so that users can't create queues on the fly.
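As a sketch, the two properties just mentioned would be set in yarn-site.xml like so:

<property>
  <!-- Do not derive a per-user queue from the submitting user's name;
       fall back to the default queue instead. -->
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>false</value>
</property>
<property>
  <!-- Prevent applications from creating queues dynamically at submit time. -->
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>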


Preemption

When a job is submitted to an empty queue on a busy cluster, the job cannot start until resources free up from jobs that are already running on the cluster. To make the time taken for a job to start more predictable, the Fair Scheduler supports preemption.

Preemption allows the scheduler to kill containers for queues that are running with more than their fair share of resources, so that the resources can be allocated to a queue that is under its fair share. Note that preemption reduces overall cluster efficiency, since the terminated containers need to be re-executed.

Preemption is enabled globally by setting yarn.scheduler.fair.preemption to true. There are two relevant preemption timeout settings, one for minimum share and one for fair share, both specified in seconds. By default, neither timeout is set, so you need to set at least one to allow containers to be preempted.

If a queue waits for as long as its minimum share preemption timeout without receiving its minimum guaranteed share, then the scheduler may preempt other containers. The default timeout for all queues is set via the top-level defaultMinSharePreemptionTimeout element in the allocation file, and on a per-queue basis by setting the minSharePreemptionTimeout element for a queue.

Likewise, if a queue remains below half of its fair share for as long as the fair share preemption timeout, then the scheduler may preempt other containers. The default timeout for all queues is set via the top-level defaultFairSharePreemptionTimeout element in the allocation file, and on a per-queue basis by setting fairSharePreemptionTimeout for a queue. The threshold (half of the fair share, by default) can also be changed by setting defaultFairSharePreemptionThreshold (globally) and fairSharePreemptionThreshold (per-queue).
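Putting these together, here is a minimal sketch of how preemption might be configured, using only the settings named above. The global switch lives in yarn-site.xml, while the timeouts (the values below are illustrative, in seconds) go in the allocation file:

<!-- yarn-site.xml: enable preemption globally -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>

<!-- fair-scheduler.xml: illustrative timeouts and threshold -->
<allocations>
  <!-- Allow preemption if a queue has been below its minimum share for 60s. -->
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
  <!-- Allow preemption if a queue has been below its threshold for 300s. -->
  <defaultFairSharePreemptionTimeout>300</defaultFairSharePreemptionTimeout>
  <!-- Trigger fair share preemption below 50% of fair share (the default). -->
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>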


Delay scheduling

All the YARN schedulers try to honor locality requests. On a busy cluster, if an application requests a particular node, there is a good chance that other containers are running on it at the time of the request. The obvious course of action is to immediately loosen the locality requirement and allocate a container on the same rack. However, it has been observed in practice that waiting a short time (no more than a few seconds) can dramatically increase the chance of being allocated a container on the requested node, and therefore increase the efficiency of the cluster. This feature is called delay scheduling, and it is supported by both the Capacity Scheduler and the Fair Scheduler.

Every node manager in a YARN cluster periodically sends a heartbeat request to the resource manager (by default, once per second). Heartbeats carry information about the node manager's running containers and the resources available for new containers, so each heartbeat is a potential scheduling opportunity for an application to run a container.
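As an aside, the heartbeat interval itself is configurable on the resource manager. A sketch, assuming the standard property name yarn.resourcemanager.nodemanagers.heartbeat-interval-ms (verify against your Hadoop version):

<property>
  <!-- Node manager heartbeat interval in milliseconds (1 second default). -->
  <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
  <value>1000</value>
</property>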

When using delay scheduling, the scheduler doesn't simply use the first scheduling opportunity it receives, but waits for up to a given maximum number of scheduling opportunities to occur before loosening the locality constraint and taking the next scheduling opportunity.

For the Capacity Scheduler, delay scheduling is configured by setting yarn.scheduler.capacity.node-locality-delay to a positive integer representing the number of scheduling opportunities that the scheduler is prepared to miss before loosening the node constraint to allow any node in the same rack.

The Fair Scheduler also uses the number of scheduling opportunities to determine the delay, although it is expressed as a proportion of the cluster size. For example, setting yarn.scheduler.fair.locality.threshold.node to 0.5 means that the scheduler should wait until half of the nodes in the cluster have presented scheduling opportunities before accepting another node on the same rack. There is a corresponding property, yarn.scheduler.fair.locality.threshold.rack, for setting the threshold before another rack is accepted instead of the requested one.
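A sketch of how the two Fair Scheduler thresholds might be set in yarn-site.xml; the rack value is illustrative (for the Capacity Scheduler, yarn.scheduler.capacity.node-locality-delay goes in capacity-scheduler.xml instead):

<property>
  <!-- Wait until half the cluster's nodes have offered scheduling
       opportunities before relaxing to rack locality. -->
  <name>yarn.scheduler.fair.locality.threshold.node</name>
  <value>0.5</value>
</property>
<property>
  <!-- Wait for this fraction of nodes before relaxing to off-rack
       (illustrative value). -->
  <name>yarn.scheduler.fair.locality.threshold.rack</name>
  <value>0.8</value>
</property>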


Dominant Resource Fairness

When there is only a single type of resource to schedule, such as memory, then the concept of capacity or fairness is easy to determine. If two users are running applications, you can measure the amount of memory each is using and compare the two applications. However, when there are multiple resource types in play, things get more complicated. If one user's application requires lots of CPU and little memory and the other's requires little CPU and lots of memory, how are these two applications compared?

The way that the schedulers in YARN address this problem is to look at each user's dominant resource and use it as a measure of cluster usage. This approach is called Dominant Resource Fairness, or DRF for short. It is best illustrated with a simple example.

Imagine a cluster with a total of 100 CPUs and 10 TB of memory. Application A requests containers of (2 CPUs, 300 GB), and application B requests containers of (6 CPUs, 100 GB). A's request is (2%, 3%) of the cluster, so memory is its dominant resource, since its proportion (3%) is larger than the CPU proportion (2%). B's request is (6%, 1%), so CPU is its dominant resource. Since B's container requests are twice as large in the dominant resource (6% versus 3%), it is allocated half as many containers under fair sharing.

By default DRF is not used, so during resource calculations only memory is considered and CPU is ignored. The Capacity Scheduler can be configured to use DRF by setting yarn.scheduler.capacity.resource-calculator to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in capacity-scheduler.xml.

For the Fair Scheduler, DRF can be enabled by setting the top-level element defaultQueueSchedulingPolicy in the allocation file to drf.
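For reference, enabling DRF for each scheduler looks roughly like this, using only the settings named above:

<!-- capacity-scheduler.xml: use DRF with the Capacity Scheduler -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<!-- fair-scheduler.xml: use DRF with the Fair Scheduler -->
<allocations>
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
</allocations>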

