Understanding Yarn Scheduler

Introduction

In YARN, the resource scheduler is a core component of the ResourceManager, responsible for allocating and scheduling the resources of the entire cluster (CPU, memory). Resources are handed out in the form of containers to individual applications (such as MapReduce jobs); an application then works with the NodeManager on the node where the resources reside to run specific tasks, such as a reduce task, inside the container.

The scheduler is pluggable, and the framework provides three schedulers by default:

1) FIFO Scheduler:
First come, first served: the order in which jobs are submitted determines the priority with which they are given resources.

2) Capacity Scheduler:
Resources are divided by capacity into a number of queues, and each queue allocates its own share according to its own logic. For example, queue A might be entitled to 80% of the cluster's resources and queue B to the remaining 20%; each queue accepts its own job submissions and allocates resources from its own share.

3) Fair Scheduler:
Follows the principle of fairness and tries to give every job an equal share of resources. After job 2 is submitted, some of the resources originally held by job 1 are gradually handed over to job 2, so that "fairness" is achieved.

Capacity Scheduler Configuration

In a cluster using the Capacity Scheduler, resources are divided into a series of queues, each of which manages a portion of the cluster's resources. Queues can be nested, forming a hierarchy. Within a queue, resources are assigned to jobs in FIFO order.

Normally, a job cannot use more resources than its queue's capacity. However, if a queue has more than one job and the cluster has idle resources, the scheduler may assign those resources to the jobs even if this pushes the queue beyond its capacity limit. This feature is called queue elasticity. To keep a queue from consuming too much of other queues' resources, you can configure a maximum capacity; the queue may then only use resources up to that limit, at the expense of some elasticity.

Scheduler configuration

Let's say we have the following queue hierarchy:

    • root
      • prod
      • dev
        • eng
        • science

In this hierarchy, two queues, prod and dev, are defined under the root queue, with 40% and 60% of the cluster's capacity respectively. dev is further divided between the eng and science teams, each taking 50% of dev's capacity. A configuration file (capacity-scheduler.xml) for this structure is shown below:

<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>

These properties form a nested structure, with dots separating the levels of the queue hierarchy. Note that the dev queue is configured with a maximum-capacity of 75%. In other words, even when prod is idle, dev can use at most 75% of the cluster's resources, so prod always has at least 25% available immediately.

In addition to capacities, you can configure the maximum resources a single user or application may use, the number of applications that may run at the same time, queue ACLs, and so on.
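
For example, a hedged sketch of a few such per-queue settings for the prod queue; the values and the user list are purely illustrative, and the exact property names should be checked against the Capacity Scheduler documentation for your Hadoop version:

<property>
  <!-- a single user may use at most 1x the queue's configured capacity -->
  <name>yarn.scheduler.capacity.root.prod.user-limit-factor</name>
  <value>1</value>
</property>
<property>
  <!-- cap the number of concurrently pending and running applications in the queue -->
  <name>yarn.scheduler.capacity.root.prod.maximum-applications</name>
  <value>100</value>
</property>
<property>
  <!-- ACL listing which users may submit applications to the queue -->
  <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
  <value>alice,bob</value>
</property>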

Queue Placement

Once the queues have been defined, an application must specify which queue its resources should come from. For a MapReduce job, for example, the mapreduce.job.queuename property sets the queue the job uses. If that queue does not exist, an error is reported when the job is submitted. If no queue is specified, the job is placed in a queue named default. Note that only the last segment of the queue name is used here: the value must be eng, not root.dev.eng.
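
As an illustration, pointing a MapReduce job at the eng queue from the hierarchy above could look like the following snippet in the job's configuration (the same property can also be passed with -D when submitting the job):

<property>
  <name>mapreduce.job.queuename</name>
  <value>eng</value>
</property>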

Fair Scheduler Configuration

The Fair Scheduler tries to give every running application an equal share of resources, where the share is measured in terms of resources such as memory. In fact, the Fair Scheduler also works with queues. An example helps to explain the allocation process:

Suppose there are two users, A and B, each with their own queue (queue A and queue B). Initially, A starts job1; since B has nothing running, job1 occupies all of the cluster's resources. After a while, B starts job2, and after some time (waiting for job1 to release resources) job1 and job2 each hold half of the resources. If user B then starts job3, job2 and job3 share queue B, each holding 25% of the cluster's resources. When job2 finishes, job3 occupies all of queue B (50%).

The conclusion is that fairness here means fairness between users, not between applications. In the example above, users A and B each hold 50% of the resources; when B runs two jobs, the three jobs do not each get one third. Instead, A and B still get 50% each, and B's half is then split between its two jobs.

Which scheduler is used is determined by the yarn.resourcemanager.scheduler.class property in yarn-site.xml. The default is the Capacity Scheduler, although Cloudera's CDH distribution uses the Fair Scheduler by default. To use the Fair Scheduler, set this property to:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

The Fair Scheduler's configuration goes in a file named fair-scheduler.xml, which the ResourceManager loads from the classpath; the file name can be changed with the yarn.scheduler.fair.allocation.file property.
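
Concretely, these two settings in yarn-site.xml might look like the following minimal sketch (the file name shown is simply the default):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>fair-scheduler.xml</value>
</property>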

Here is an example allocation file:

<?xml version="1.0"?>
<allocations>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
  <queue name="prod">
    <weight>40</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
  <queue name="dev">
    <weight>60</weight>
    <queue name="eng"/>
    <queue name="science"/>
  </queue>
  <queuePlacementPolicy>
    <rule name="specified" create="false"/>
    <rule name="primaryGroup" create="false"/>
    <rule name="default" queue="dev.eng"/>
  </queuePlacementPolicy>
</allocations>

With this configuration, prod and dev are treated fairly when they hold 40% and 60% of the resources respectively, and eng and science are each treated fairly at 50% of dev. Note that the weights are proportions, not percentages: 40 and 60 could just as well be written as 2 and 3, since only the ratio matters.

prod is configured to use the FIFO policy; dev has no policy of its own, so it uses the file-level default, the fair policy set by the defaultQueueSchedulingPolicy element. Although each queue can be given its own scheduling policy, fair sharing is still applied between prod and dev, and between eng and science.

Application Queue Configuration

The queue an application is placed in is determined by the queuePlacementPolicy configuration, whose rules are tried from top to bottom until one succeeds. In the example above, the scheduler first tries to place the application in the queue it explicitly specified; if no queue is specified, or the specified queue does not exist, the second rule, primaryGroup, tries a queue named after the user's primary Unix group; if that queue does not exist either, the default rule places the application in dev.eng.

The placement rules can also be left out entirely, in which case the following defaults are used:

<queuePlacementPolicy>
  <rule name="specified"/>
  <rule name="user"/>
</queuePlacementPolicy>

You can also use the following configuration to place all applications in the default queue:

<queuePlacementPolicy>
  <rule name="default"/>
</queuePlacementPolicy>

Preemption

When the cluster is busy, a job submitted to an empty queue has to wait for other jobs to release resources before it can start. To keep that waiting time within a predictable range, the Fair Scheduler supports preemption.

With preemption enabled, the scheduler kills containers belonging to queues that hold more than their fair share, freeing resources for the applications in the under-served queue. Preemption can reduce overall cluster efficiency, because killed containers have to be re-executed. The waiting time before containers are killed is configured with the fairSharePreemptionTimeout element inside a queue element of the allocation file; if a queue does not set it, the global defaultFairSharePreemptionTimeout element is used. The fairSharePreemptionThreshold and defaultFairSharePreemptionThreshold elements define how far below its fair share a queue must be before preemption kicks in: for example, a value of 0.5 means that if, once the timeout has expired, the queue still holds less than half of its fair share, the scheduler starts killing containers of other applications.
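
A hedged sketch of how these elements might appear in the allocation file; the timeout (in seconds) and threshold values are illustrative only, and preemption itself is normally switched on separately by setting yarn.scheduler.fair.preemption to true in yarn-site.xml:

<allocations>
  <!-- global defaults, used when a queue does not override them -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
  <queue name="prod">
    <!-- per-queue overrides: wait 30 seconds, preempt if below 80% of fair share -->
    <fairSharePreemptionTimeout>30</fairSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
  </queue>
</allocations>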

Delay scheduling

When an application asks to run on a particular node and that node happens to be busy, one option is to fall back immediately to another node in the same rack. In practice, however, waiting a short time often allows the container to be allocated on the requested node, which improves cluster efficiency. This feature is called delay scheduling, and both the Capacity Scheduler and the Fair Scheduler support it.

By default, every NodeManager in the cluster sends a heartbeat to the ResourceManager once per second; the heartbeat reports the containers currently running on the node and the resources available for new containers. Each heartbeat is therefore a scheduling opportunity. When delay scheduling is enabled, for a resource request with a node locality constraint the scheduler does not simply take the first scheduling opportunity (sacrificing data locality); instead it waits, up to a configured maximum number of opportunities, for one on the requested node, and allocates it to the application as soon as it appears. If no suitable node becomes available within that limit, the locality requirement is relaxed by one level and a node on the same rack is accepted.

The maximum number of opportunities is configured in the Capacity Scheduler with yarn.scheduler.capacity.node-locality-delay, which takes a positive integer. The Fair Scheduler is slightly different: the yarn.scheduler.fair.locality.threshold.node property is a fraction; setting it to 0.5, for example, means the scheduler waits until half of the nodes in the cluster have offered a scheduling opportunity before relaxing the requirement and accepting another node on the same rack. The corresponding yarn.scheduler.fair.locality.threshold.rack property controls how many nodes must offer an opportunity before the same-rack requirement is dropped in favour of a node on a different rack.
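
A sketch of both variants with illustrative values; the Capacity Scheduler property would normally go in capacity-scheduler.xml, while the Fair Scheduler thresholds go in yarn-site.xml:

<!-- capacity-scheduler.xml: wait up to 40 scheduling opportunities for a node-local container -->
<property>
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <value>40</value>
</property>

<!-- yarn-site.xml: wait until half the cluster's nodes have offered an opportunity before relaxing locality -->
<property>
  <name>yarn.scheduler.fair.locality.threshold.node</name>
  <value>0.5</value>
</property>
<property>
  <name>yarn.scheduler.fair.locality.threshold.rack</name>
  <value>0.5</value>
</property>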

Fairness with Multiple Resources

Fairness is easy to define when there is only one resource to schedule: if memory is the only resource, comparing memory allocations is straightforward. Things get more complicated when several resources have to be scheduled. For example, one application may request a lot of memory and very little CPU while another requests a lot of CPU and little memory; how are these to be compared?

YARN's approach is to measure each application by its dominant resource usage, a scheme called Dominant Resource Fairness (DRF). A simple example illustrates it. Suppose a cluster has 100 CPUs and 10 TB of memory. Application A requests containers of (2 CPUs, 300 GB), which is (2%, 3%) of the cluster, so memory is A's dominant resource (3% > 2%). Application B's container requests amount to (6%, 1%) of the cluster, so CPU is B's dominant resource. Since the dominant shares are in the ratio 3% to 6%, fair sharing gives application B half as many containers as application A.

DRF is not enabled by default, so only memory is considered when computing fairness and CPU is ignored. To enable DRF in the Capacity Scheduler, set the yarn.scheduler.capacity.resource-calculator property in capacity-scheduler.xml to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator. For the Fair Scheduler, set defaultQueueSchedulingPolicy to drf in the allocation file.
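
For illustration, the Capacity Scheduler setting in capacity-scheduler.xml and the corresponding allocation-file fragment (the queue definitions are assumed to be the same as in the earlier example):

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

<allocations>
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
  <!-- queue definitions as in the earlier fair-scheduler.xml example -->
</allocations>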
