Discussion on the resource isolation mechanism of Hadoop YARN

Last Update:2015-03-17 Source: Internet

Author: User


Hadoop YARN supports scheduling of two resource types, memory and CPU. In YARN, resource management is performed by the ResourceManager and the NodeManagers: the scheduler in the ResourceManager is responsible for allocating resources, and each NodeManager is responsible for supplying and isolating them. In this article, Dong Xi introduces some of YARN's progress in resource isolation.

Author's original:

Resource scheduling and resource isolation are the two most important and basic functions of YARN as a resource management system. Resource scheduling is done by the ResourceManager, and resource isolation is implemented by each NodeManager. In the article "Scheduling and isolating memory and CPU resources in Hadoop YARN" I covered how YARN isolates memory and CPU resources. This article introduces some of YARN's progress in resource isolation.

When we talk about resources, we usually mean three kinds: memory, CPU, and IO. By default, YARN does not isolate any of them, although programs written in Java can of course rely on the limits built into the JVM to bound their memory use. As YARN has gradually matured, it has made significant advances in isolating all three kinds of resources.

First, memory isolation. Memory is a resource that YARN has managed and scheduled from the beginning. Because of the particular nature of memory, YARN does not explicitly enforce hard memory isolation, so that a task is not killed ungracefully when its memory usage briefly fluctuates. Of course, if you write tasks in Java, you can use the memory limits provided by the JVM, which is a good choice. What YARN currently does is monitor the process tree of each task: if a task's process tree uses more total physical memory or total virtual memory than the preset limits, the entire process tree is killed by sending it the TERM and KILL signals in turn. If you run special tasks or services on YARN and want to use cgroups to strictly isolate memory, you can follow: https://issues.apache.org/jira/browse/YARN-1856
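The monitoring approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not YARN's actual implementation: the function names, the grace period, and the injectable `kill` parameter are all assumptions made for the example.

```python
import os
import signal
import time


def over_limit(pmem_used, vmem_used, pmem_limit, vmem_limit):
    """Return True if the process tree exceeds either preset limit (bytes)."""
    return pmem_used > pmem_limit or vmem_used > vmem_limit


def kill_process_tree(pids, grace_seconds=0.25, kill=os.kill):
    """Send SIGTERM to every pid, wait, then send SIGKILL to every pid.

    Pids that have already exited are silently skipped. The grace period
    is a hypothetical value chosen for this sketch.
    """
    for sig in (signal.SIGTERM, signal.SIGKILL):
        for pid in pids:
            try:
                kill(pid, sig)
            except ProcessLookupError:
                pass  # the process is already gone
        if sig == signal.SIGTERM:
            time.sleep(grace_seconds)  # give tasks a chance to exit cleanly
```

A monitor loop would sum the memory of all processes in a task's tree, call `over_limit`, and invoke `kill_process_tree` when it returns True. Passing `kill` as a parameter makes the signalling order easy to check without touching real processes.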

Next, CPU isolation. CPU scheduling has been well supported since Hadoop 2.2.0, but support for CPU isolation has been very poor. The following work has been completed or is in progress:

(1) Use and isolate CPU resources proportionally. This is implemented through the cgroup cpu.shares parameter. It ensures that the CPU resources on each node are fully shared and used, which results in higher CPU utilization. It has been supported since Hadoop 2.2.0, but enabling it requires fairly complex parameter configuration and tuning. Related JIRA: https://issues.apache.org/jira/browse/YARN-3

(2) Limit the upper bound of CPU usage per container. Method (1) guarantees a lower bound on each container's CPU usage, and in most cases a container may get more CPU than it asked for. This method instead strictly caps CPU usage: if you request 2 CPUs, you can use at most 2, and you will not be allowed to use more even if the same machine still has a large amount of idle CPU. It is implemented through the cgroup cpu.cfs_quota_us and cpu.cfs_period_us parameters. A patch is available but has not yet been merged into trunk. See: https://issues.apache.org/jira/browse/YARN-810

(3) Limit the upper bound of CPU used by YARN. The implementation mechanism is the same as in (2), and a patch is available. See: https://issues.apache.org/jira/browse/YARN-2440
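The difference between the proportional approach in (1) and the hard cap in (2) and (3) comes down to simple arithmetic on the cgroup parameters. The sketch below illustrates it under stated assumptions: the weight of 1024 shares per virtual core and the 100 ms period are values chosen for the example (100000 µs is the Linux default for cpu.cfs_period_us), and the function names are hypothetical.

```python
def cpu_shares(vcores, shares_per_vcore=1024):
    """Relative cpu.shares weight for a container (assumed 1024 per vcore)."""
    return vcores * shares_per_vcore


def share_under_contention(shares):
    """Fraction of CPU each container gets when all of them are busy.

    cpu.shares is only a relative weight: weight / sum of weights.
    An idle neighbour's portion can be used by the others.
    """
    total = sum(shares)
    return [s / total for s in shares]


def cfs_quota_us(cpus, period_us=100_000):
    """Hard cap: the cgroup may use quota_us of CPU time per period_us."""
    return int(cpus * period_us)


# Three containers with 1, 1 and 2 vcores competing on one node.
weights = [cpu_shares(v) for v in (1, 1, 2)]
print(share_under_contention(weights))  # [0.25, 0.25, 0.5]

# A container capped at 2 CPUs gets 200 ms of CPU time per 100 ms period.
print(cfs_quota_us(2))  # 200000
```

With shares alone, the 2-vcore container is guaranteed half the CPU under contention but may use more when the others are idle. With a quota, it can never exceed 2 CPUs' worth of time per period.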

Note that YARN lets you configure the number of physical CPUs that can be used on each node, as well as the ratio of physical CPUs to virtual CPUs, while users request only virtual CPUs when they ask for resources. By default, the ratio of physical CPUs to virtual CPUs is 1:1. If your cluster is heterogeneous and the CPUs on some nodes have more computing power, you can adjust the ratio on those nodes. The concept of a virtual CPU is borrowed from "physical memory and virtual memory", and its main purpose is to smooth out differences in CPU computing power across the cluster.
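As a small worked example of the ratio, the sketch below computes how many virtual CPUs a node would advertise. The function and parameter names are hypothetical and do not correspond to YARN configuration keys.

```python
def node_vcores(physical_cores, vcores_per_core=1.0):
    """Virtual CPUs a node offers, given its physical-to-virtual ratio."""
    return int(physical_cores * vcores_per_core)


# Two 8-core nodes; the second has CPUs assumed to be twice as fast,
# so it is configured with a 1:2 physical-to-virtual ratio.
print(node_vcores(8))       # 8
print(node_vcores(8, 2.0))  # 16
```

A task that requests 1 virtual CPU then receives roughly the same computing power on either node, which is the point of the abstraction.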


Finally, IO resources. IO resources are divided into two kinds, disk IO and network IO. YARN is currently working on both, and preliminary design documents have been released. IO isolation is more complex than CPU and memory isolation. To help users quantify IO resources, YARN follows the "virtual CPU" concept and introduces a "virtual disk" (vdisk). The first phase will attempt to use the cgroup blkio module to achieve disk IO isolation. Of course, before this functionality can be implemented, IO resources must be brought under scheduler management so that the resource schedulers in Hadoop, such as the Fair Scheduler or the Capacity Scheduler, can schedule disk IO and network IO.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
