Scheduling and isolation of memory and CPU resources in Hadoop YARN

Source: Internet
Author: User

Hadoop YARN supports scheduling both memory and CPU resources. By default only memory is scheduled; scheduling CPU as well requires some additional configuration. This article describes how Hadoop YARN schedules and isolates these resources.

In YARN, resource management is done jointly by the ResourceManager and the NodeManager: the scheduler in the ResourceManager is responsible for allocating resources, and the NodeManager is responsible for providing and isolating them. After the ResourceManager assigns resources on a NodeManager to a task (this is called resource scheduling), the NodeManager must provide the task with those resources and even guarantee that the task has exclusive use of them, so that the task has a basis on which to run. This is called resource isolation.

For more information on the Hadoop YARN resource scheduler, please refer to my article: YARN / MRv2 Resource Manager in-depth analysis - Resource Scheduler.

Before formally introducing resource scheduling and isolation, first consider the characteristics of memory and CPU, which are two resources of different natures. The amount of memory determines whether a task lives or dies: if memory is insufficient, the task may fail to run. CPU is different: it only determines how fast a task runs and does not affect whether the task survives.

[Hadoop YARN memory resource scheduling and isolation]

Based on the above considerations, YARN allows users to configure the physical memory available on each node. Note the word "available": the memory on a node is shared by several services, for example YARN, HDFS, and HBase, and YARN can use only the share configured for it. The configuration parameters are as follows:

(1) yarn.nodemanager.resource.memory-mb

Indicates the total amount of physical memory that YARN can use on this node. The default is 8192 (MB). Note that if your node has less than 8 GB of memory, you need to reduce this value, because YARN does not automatically detect the total physical memory of a node.

(2) yarn.nodemanager.vmem-pmem-ratio

The maximum amount of virtual memory a task may use for each 1 MB of physical memory it uses. The default is 2.1.

(3) yarn.nodemanager.pmem-check-enabled

Whether to start a thread that checks the amount of physical memory each task is using and kills the task outright if it exceeds its allocation. The default is true.

(4) yarn.nodemanager.vmem-check-enabled

Whether to start a thread that checks the amount of virtual memory each task is using and kills the task outright if it exceeds its allocation. The default is true.

(5) yarn.scheduler.minimum-allocation-mb

The minimum amount of physical memory that a single task can request. The default is 1024 (MB). If a task requests less than this, the request is raised to this value.

(6) yarn.scheduler.maximum-allocation-mb

The maximum amount of physical memory that a single task can request. The default is 8192 (MB).
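The way these parameters interact can be illustrated with a small Python sketch. It is not YARN code: the class and function names are hypothetical, the defaults are the ones quoted in the list above, and rejecting a request above the maximum is an assumption of this sketch, since the text only states that the maximum bounds what a task can request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical holder for the defaults quoted in the list above."""
    node_memory_mb: int = 8192      # yarn.nodemanager.resource.memory-mb
    vmem_pmem_ratio: float = 2.1    # yarn.nodemanager.vmem-pmem-ratio
    min_allocation_mb: int = 1024   # yarn.scheduler.minimum-allocation-mb
    max_allocation_mb: int = 8192   # yarn.scheduler.maximum-allocation-mb


def normalize_memory_request(requested_mb: int, cfg: MemoryConfig) -> int:
    """Raise a too-small request to the configured minimum.

    Raising an error for a request above the maximum is an assumption
    made for illustration only.
    """
    if requested_mb > cfg.max_allocation_mb:
        raise ValueError(
            f"requested {requested_mb} MB exceeds maximum {cfg.max_allocation_mb} MB"
        )
    return max(requested_mb, cfg.min_allocation_mb)


def virtual_memory_limit_mb(physical_mb: int, cfg: MemoryConfig) -> float:
    """Virtual memory allowed for a task holding physical_mb of physical memory."""
    return physical_mb * cfg.vmem_pmem_ratio


if __name__ == "__main__":
    cfg = MemoryConfig()
    granted = normalize_memory_request(512, cfg)
    print("granted physical memory (MB):", granted)
    print("virtual memory limit (MB):", virtual_memory_limit_mb(granted, cfg))
```

With the default values, a task that asks for 512 MB is granted the 1024 MB minimum, and its virtual memory limit follows from multiplying that grant by the ratio.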

By default, YARN uses a monitoring thread to determine whether a task is overusing memory and kills the task outright if it is. Cgroups is inflexible in controlling memory (a task may never exceed its memory limit, and if it does it is killed or reports an OOM error), whereas the memory of a Java process doubles at the moment the process is created and then drops back to normal. In this situation thread monitoring is more flexible (a process tree whose memory momentarily doubles beyond the configured value can be treated as normal, and the task is not killed), so YARN does not provide a Cgroups-based memory isolation mechanism.
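The tolerance that thread monitoring allows can be sketched in Python as follows. This is a simplified illustration, not the NodeManager's actual logic: the function name is hypothetical, and the rule used here (kill when usage is above the limit on two consecutive samples, or above twice the limit on any sample) is an assumption chosen to show how a momentary doubling can be forgiven.

```python
from typing import Iterable


def should_kill(samples_mb: Iterable[float], limit_mb: float) -> bool:
    """Decide whether a task should be killed, given periodic memory samples.

    Assumed policy for this sketch:
      * a single sample above the limit is treated as a transient spike;
      * two consecutive samples above the limit mean sustained overuse;
      * any sample above twice the limit is killed immediately.
    """
    previous_over = False
    for usage in samples_mb:
        if usage > 2 * limit_mb:
            return True
        over = usage > limit_mb
        if over and previous_over:
            return True
        previous_over = over
    return False


if __name__ == "__main__":
    limit = 1024
    print(should_kill([900, 1900, 950], limit))   # momentary spike
    print(should_kill([900, 1100, 1200], limit))  # sustained overuse
```

A hard limit enforced at every instant, as described above for Cgroups, would have no equivalent of the first case: the spike to 1900 MB would already be fatal.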

[Hadoop YARN CPU resource scheduling and isolation]

In YARN, how to organize CPU resources is still being explored. Version 2.2.0 contains only a preliminary, very coarse-grained implementation. Finer-grained CPU partitioning has been proposed and is being refined and implemented.

Currently, CPU is divided into virtual CPUs (virtual cores). The virtual CPU is a concept introduced by YARN itself. The motivation is that CPU performance may differ between nodes, so each CPU offers a different amount of computing power. For example, a physical CPU on one node may have twice the computing power of a physical CPU on another node; you can make up for this difference by configuring more virtual CPUs for the first physical CPU. When users submit jobs, they can specify the number of virtual CPUs each task needs. In YARN, the CPU-related configuration parameters are as follows:

(1) yarn.nodemanager.resource.cpu-vcores

Indicates the number of virtual CPUs that YARN can use on this node. The default is 8. Note that it is currently recommended to set this value to the number of physical CPU cores. If your node has fewer than 8 CPU cores, you need to reduce this value, because YARN does not automatically detect the total number of physical CPU cores on a node.

(2) yarn.scheduler.minimum-allocation-vcores

The minimum number of virtual CPUs that a single task can request. The default is 1. If a task requests fewer virtual CPUs than this, the request is raised to this value.

(3) yarn.scheduler.maximum-allocation-vcores

The maximum number of virtual CPUs that a single task can request. The default is 32.
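The idea of compensating for faster hardware with more virtual CPUs can be shown with a short Python sketch. The function names and the relative_power factor are hypothetical and exist only for illustration; YARN itself simply reads the configured value of yarn.nodemanager.resource.cpu-vcores.

```python
def vcores_for_node(physical_cores: int, relative_power: float = 1.0) -> int:
    """Suggest a yarn.nodemanager.resource.cpu-vcores value for a node.

    relative_power is a hypothetical factor: 2.0 means each physical core
    has twice the computing power of a core on the reference node.
    """
    return max(1, round(physical_cores * relative_power))


def tasks_that_fit(node_vcores: int, vcores_per_task: int) -> int:
    """How many tasks of a given size fit on a node, counting CPU only."""
    if vcores_per_task < 1:
        raise ValueError("a task needs at least one virtual CPU")
    return node_vcores // vcores_per_task


if __name__ == "__main__":
    slow = vcores_for_node(8)        # reference node
    fast = vcores_for_node(8, 2.0)   # cores twice as powerful
    print("slow node vcores:", slow, "tasks:", tasks_that_fit(slow, 2))
    print("fast node vcores:", fast, "tasks:", tasks_that_fit(fast, 2))
```

Both nodes have the same number of physical cores, but the faster one advertises twice as many virtual CPUs and therefore receives twice as many tasks of the same size.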

By default, YARN does not schedule CPU resources; you need to configure the resource scheduler to support it. For details, refer to my two articles:

(1) Analysis of Hadoop YARN configuration parameters (4) -Fair Scheduler related parameters

(2) Analysis of Hadoop YARN configuration parameters (5) -Capacity Scheduler related parameters

By default, the NodeManager does not isolate CPU resources either. You can turn on CPU isolation by enabling Cgroups.

Because of the particular nature of CPU resources, the current CPU allocation is still coarse-grained. For example, many tasks are IO-intensive and consume very little CPU. Assigning a whole CPU to such a task is a serious waste; it could instead share one CPU with several other tasks. In other words, a finer-grained way of expressing CPU resources is needed.

In Amazon EC2, the minimum unit of CPU is the EC2 Compute Unit (ECU), and one ECU represents the processing power of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Drawing on this division of CPU resources, YARN proposes the YARN Compute Unit (YCU). A YCU count is currently an integer; the number of YCUs that represents the computing power of one CPU core defaults to 720 and is set by the parameter yarn.nodemanager.resource.cpu-ycus-per-core. (This feature does not exist in version 2.2.0 and may be added in version 2.3.0.) With this scheme, a user submitting a job specifies the required YCUs directly. For example, a value of 360 means 1/2 of a CPU core; in practice, the task gets only half of one core's time for computation. Note that at the operating system level, CPU resources are allocated as time slices, so it makes sense to say that a process uses 1/3 or 1/5 of the CPU time slices.
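The YCU arithmetic above can be checked with a few lines of Python. This is only a worked example of the proposal as described here: the function name is hypothetical, and the value 720 is the default quoted for yarn.nodemanager.resource.cpu-ycus-per-core.

```python
from fractions import Fraction

YCUS_PER_CORE = 720  # default quoted above for yarn.nodemanager.resource.cpu-ycus-per-core


def core_share(requested_ycus: int, ycus_per_core: int = YCUS_PER_CORE) -> Fraction:
    """Fraction of one CPU core's time that a YCU request corresponds to."""
    if requested_ycus < 0 or ycus_per_core <= 0:
        raise ValueError("YCU values must be positive")
    return Fraction(requested_ycus, ycus_per_core)


if __name__ == "__main__":
    for ycus in (360, 240, 144):
        print(ycus, "YCU =", core_share(ycus), "of a core")
```

Because 720 has many divisors, shares such as 1/2, 1/3, and 1/5 of a core all correspond to whole numbers of YCUs (360, 240, and 144).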
