Analysis of Hadoop YARN configuration parameters (4): Fair Scheduler parameters


First, in yarn-site.xml, set the configuration parameter yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.
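
As a minimal sketch, the setting above could be written as the following property entry, placed inside the top-level configuration element of yarn-site.xml:

```xml
<!-- yarn-site.xml: select the Fair Scheduler as the ResourceManager scheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
```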

The Fair Scheduler's configuration options fall into two parts. One part is set in yarn-site.xml and configures scheduler-level parameters. The other is set in a custom configuration file (fair-scheduler.xml by default) and mainly configures per-queue information such as resource amounts and weights.

To find out what Fair Scheduler is, read my article "Hadoop Fair Scheduler Analysis."

1. Configuration file yarn-site.xml

(1) yarn.scheduler.fair.allocation.file: The location of the custom XML configuration file, which mainly describes the attributes of each queue, such as resource amounts and weights. The configuration format is described later.

(2) yarn.scheduler.fair.user-as-default-queue: Whether to use the user name as the queue name when an application does not specify a queue. If set to false, all applications that do not specify a queue are submitted to the default queue. The default value is true.

(3) yarn.scheduler.fair.preemption: Whether to enable preemption. The default value is false.

(4) yarn.scheduler.fair.sizebasedweight: When allocating resources within a queue, the scheduler by default shares them fairly among the applications in round-robin fashion. This parameter enables an alternative in which resources are allocated according to the size of each application's resource demand: the more resources an application requests, the more it is allocated. The default value is false.

(5) yarn.scheduler.fair.assignmultiple: Whether to enable batch allocation. When a node has a large amount of free resources, the scheduler can hand them out in a single allocation or over several allocations. The default value is false.

(6) yarn.scheduler.fair.max.assign: If batch allocation is enabled, the maximum number of containers that can be assigned at one time. The default value is -1, which means no limit.

(7) yarn.scheduler.fair.locality.threshold.node: When an application requests resources on a particular node, the maximum number of scheduling opportunities it is willing to pass up. When the allocation policy offers the application resources on a node that is not the one it wants, the application can skip that opportunity, and the resources go temporarily to other applications, until resources appear on a node that meets its needs. Typically one heartbeat represents one scheduling opportunity. The parameter is expressed as a fraction of the total number of nodes in the cluster (a float between 0 and 1). The default value is -1.0, which means that no scheduling opportunities are skipped.

(8) yarn.scheduler.fair.locality.threshold.rack: When an application requests resources on a particular rack, the maximum number of scheduling opportunities it is willing to pass up.

(9) yarn.scheduler.increment-allocation-mb: The memory normalization unit. The default is 1024 (MB), which means that a container request for 1.5 GB is rounded up by the scheduler to ceiling(1.5 GB / 1 GB) * 1 GB = 2 GB.

(10) yarn.scheduler.increment-allocation-vcores: The virtual CPU normalization unit. The default is 1; its meaning is analogous to that of the memory normalization unit.
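
To make the list above concrete, here is a hedged sketch of how several of these scheduler-level parameters might be set together in yarn-site.xml. The file path and all values are illustrative assumptions chosen for the example, not recommendations from the source.

```xml
<configuration>
  <!-- Location of the per-queue allocation file (hypothetical path) -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
  <!-- Send applications without an explicit queue to the shared default queue -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>
  <!-- Enable preemption (off by default) -->
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
  <!-- Enable batch allocation, capped at 3 containers at a time (illustrative) -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.max.assign</name>
    <value>3</value>
  </property>
  <!-- Pass up scheduling opportunities equal to at most half the cluster size -->
  <property>
    <name>yarn.scheduler.fair.locality.threshold.node</name>
    <value>0.5</value>
  </property>
  <!-- Round memory requests up to a multiple of 512 MB (illustrative) -->
  <property>
    <name>yarn.scheduler.increment-allocation-mb</name>
    <value>512</value>
  </property>
</configuration>
```

Under these assumed values, on a 100-node cluster an application waiting for a specific node would pass up at most about 50 scheduling opportunities before accepting a container elsewhere, and a 1200 MB container request would be rounded up to 1536 MB.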

2. Custom configuration file

The Fair Scheduler lets the administrator keep queue information in a dedicated configuration file (fair-scheduler.xml by default). For each queue, the administrator can configure the following options:

(1) minResources: The minimum resource guarantee, in the format "X mb, Y vcores". When a queue's minimum resource guarantee is not met, it receives resources ahead of its sibling queues. What counts as meeting the guarantee depends on the scheduling policy (described later). Under the fair policy only memory is considered: a queue is considered satisfied once it uses more memory than its minimum amount. Under the DRF policy the dominant resource is considered: a queue is considered satisfied once its usage of its dominant resource exceeds its minimum amount.

(2) maxResources: The maximum amount of resources the queue can use. The Fair Scheduler ensures that no queue's resource usage exceeds its maximum.

(3) maxRunningApps: The maximum number of applications running concurrently. Limiting this number prevents the intermediate output of too many simultaneously running map tasks from filling up the disks.

(4) minSharePreemptionTimeout: The minimum share preemption timeout. If a queue's resource usage stays below its minimum resource amount for this long, the queue starts to preempt resources.

(5) schedulingMode/schedulingPolicy: The scheduling policy used by the queue; it can be fifo, fair, or drf.

(6) aclSubmitApps: The list of Linux users and user groups that can submit applications to the queue. The default is "*", which means any user can submit applications to the queue. Note that this property is inherited: a child queue's list includes its parent queue's list. In the value, users are separated from one another by commas, as are groups, and the user list is separated from the group list by a single space, for example "user1,user2 group1,group2".

(7) aclAdministerApps: The list of administrators for this queue. A queue administrator can manage the resources and applications in the queue, for example by killing any application.
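
As an illustration of the per-queue options listed above, the following is a minimal sketch of a single queue definition in the allocation file. The queue name, user and group names, and all numeric values are hypothetical; the timeout is assumed to be in seconds, as in the Apache Hadoop Fair Scheduler documentation.

```xml
<allocations>
  <!-- "analytics" is a hypothetical queue name -->
  <queue name="analytics">
    <!-- Minimum guarantee and upper limit, in "X mb, Y vcores" format -->
    <minResources>10240 mb, 10 vcores</minResources>
    <maxResources>40960 mb, 40 vcores</maxResources>
    <!-- At most 50 applications running concurrently in this queue -->
    <maxRunningApps>50</maxRunningApps>
    <!-- Start preempting after 300 seconds below the minimum share -->
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <!-- One of fifo, fair, or drf -->
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- Users (comma-separated), one space, then groups (comma-separated) -->
    <aclSubmitApps>user1,user2 group1,group2</aclSubmitApps>
    <aclAdministerApps>admin1 admingroup</aclAdministerApps>
  </queue>
</allocations>
```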

Administrators can also add a maxRunningApps property to an individual user to limit the number of applications that user can run concurrently. In addition, administrators can set default values for the properties above with the following parameters:

(1) userMaxAppsDefault: The default value of a user's maxRunningApps property.

(2) defaultMinSharePreemptionTimeout: The default value of a queue's minSharePreemptionTimeout property.

(3) defaultQueueSchedulingPolicy: The default value of a queue's schedulingPolicy property.

(4) fairSharePreemptionTimeout: The fair share preemption timeout. If a queue's resource usage stays below half of its fair share for this long, the queue starts to preempt resources.

Example: suppose you want to set up three queues, queueA, queueB, and queueC, for a Hadoop cluster, where queueB and queueC are child queues of queueA, an ordinary user can run at most 40 applications at the same time, and the user userA can run at most 400. In the custom configuration file this is expressed by nesting the two child queues inside queueA, adding a per-user limit for userA, and setting the default per-user limit to 40.
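
A minimal sketch of an allocation file matching this example might look like the following. Only the queue hierarchy and the limits of 40 and 400 come from the description; the resource amounts and the timeout are illustrative assumptions, and the element names follow the Apache Hadoop Fair Scheduler documentation.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Parent queue; resource figures are illustrative assumptions -->
  <queue name="queueA">
    <minResources>10240 mb, 10 vcores</minResources>
    <maxResources>81920 mb, 80 vcores</maxResources>

    <!-- Child queues are declared by nesting queue elements -->
    <queue name="queueB">
      <minResources>4096 mb, 4 vcores</minResources>
      <maxResources>40960 mb, 40 vcores</maxResources>
    </queue>
    <queue name="queueC">
      <minResources>4096 mb, 4 vcores</minResources>
      <maxResources>40960 mb, 40 vcores</maxResources>
    </queue>
  </queue>

  <!-- userA may run up to 400 applications concurrently -->
  <user name="userA">
    <maxRunningApps>400</maxRunningApps>
  </user>

  <!-- Every other user is limited to 40 concurrent applications -->
  <userMaxAppsDefault>40</userMaxAppsDefault>

  <!-- Illustrative default timeout for minimum-share preemption -->
  <defaultMinSharePreemptionTimeout>300</defaultMinSharePreemptionTimeout>
</allocations>
```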
