Hadoop YARN supports scheduling of two resources, memory and CPU. In YARN, resource management is handled by the ResourceManager and the NodeManagers: the scheduler in the ResourceManager allocates resources, and each NodeManager is responsible for supplying and isolating them. In this article, Dong Xi introduces some of YARN's progress in resource isolation.
Author's original:
Resource scheduling and resource isolation are the two most important and basic functions of YARN as a resource management system. Resource scheduling is done by the ResourceManager, and resource isolation is implemented by each NodeManager. In the article "Scheduling and isolating memory and CPU resources in Hadoop YARN," I covered YARN's memory and CPU resource isolation. This article introduces some of YARN's progress in resource isolation.
When we talk about resources, we usually mean three kinds: memory, CPU, and IO. By default, YARN does not isolate any of them, although programs written in Java can of course rely on the JVM's built-in mechanism to bound their memory use. As YARN has gradually matured, it has made significant progress in isolating all three kinds of resources.
First, memory resource isolation. Memory is a resource that YARN has managed and scheduled from the beginning. Given the particular nature of memory, YARN does not enforce strict memory isolation, so that a task is not killed abruptly when its memory usage briefly fluctuates. Of course, if you write a task in Java, you can use the memory limits provided by the JVM, which is a good choice. What YARN currently does is monitor the process tree of each task: if the total physical memory or total virtual memory used by a task's process tree exceeds its preset limit, the entire process tree is killed by sending the TERM and KILL signals in turn. If you run special tasks or services on YARN and want to use cgroups to strictly isolate memory, follow: https://issues.apache.org/jira/browse/YARN-1856
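The monitoring check described above can be sketched roughly as follows. This is a minimal illustration in Python, not YARN's actual code (the NodeManager is written in Java). The names ProcessUsage, tree_exceeds_limits, and terminate_tree, as well as the grace period between the two signals, are hypothetical and chosen only for this example. The signal handling assumes a POSIX system.

```python
import os
import signal
import time
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProcessUsage:
    """Memory usage of one process in a task's process tree, in bytes."""
    pid: int
    physical: int
    virtual: int


def tree_exceeds_limits(tree: Iterable[ProcessUsage],
                        physical_limit: int,
                        virtual_limit: int) -> bool:
    """Return True if the tree's total physical or total virtual memory is over its limit."""
    procs = list(tree)
    total_physical = sum(p.physical for p in procs)
    total_virtual = sum(p.virtual for p in procs)
    return total_physical > physical_limit or total_virtual > virtual_limit


def terminate_tree(pids: Iterable[int], grace_seconds: float = 0.25) -> None:
    """Send SIGTERM to every process, wait briefly, then send SIGKILL."""
    pids = list(pids)
    for sig in (signal.SIGTERM, signal.SIGKILL):
        for pid in pids:
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # the process has already exited
        if sig == signal.SIGTERM:
            time.sleep(grace_seconds)  # give processes a chance to exit cleanly
```

A monitor loop would call tree_exceeds_limits periodically for each task and call terminate_tree with the tree's process IDs when it returns True. Because the check runs on totals sampled at intervals, a task can exceed its limit between samples, which is why this is monitoring rather than strict isolation.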
Next, CPU resource isolation. CPU resource scheduling has been well supported since Hadoop 2.2.0, but support for CPU resource isolation is still weak. The following work has been completed or is in progress:
(1) Use and isolate CPU resources by proportion. This is implemented through the cgroup cpu.shares parameter. It ensures that the CPU resources on each node are fully shared and used, resulting in higher CPU utilization. It has been supported since Hadoop 2.2.0, but enabling it requires fairly involved parameter configuration and tuning. Related JIRA: https://issues.apache.org/jira/browse/YARN-3
(2) Limit the upper bound of CPU usage per container. The proportional cpu.shares method only guarantees a lower bound on each container's CPU usage, so in most cases a container may get more CPU than it asked for. This method instead strictly caps CPU usage: if you request 2 CPUs, you can use at most 2, and you are not allowed to use more even if the same machine still has many idle CPUs. It is implemented through the cgroup cpu.cfs_quota_us and cpu.cfs_period_us parameters. A patch is available but has not yet been merged into trunk. See: https://issues.apache.org/jira/browse/YARN-810
(3) Limit the upper bound of CPU used by YARN. The implementation mechanism is the same as in (2), and a patch is available. See: https://issues.apache.org/jira/browse/YARN-2440
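The difference between the proportional approach (cpu.shares) and the hard cap (cpu.cfs_quota_us with cpu.cfs_period_us) can be illustrated with a little arithmetic. The sketch below is not YARN code. The function names are hypothetical, the weight of 1024 shares per virtual core is an assumption for illustration, and the period of 100000 microseconds is the Linux kernel default for cpu.cfs_period_us.

```python
from typing import Sequence

CFS_PERIOD_US = 100_000   # kernel default for cpu.cfs_period_us (100 ms)
SHARES_PER_VCORE = 1024   # assumed weight per virtual core, for illustration


def cpu_shares(vcores: int) -> int:
    """Relative weight to write to cpu.shares for a container."""
    return SHARES_PER_VCORE * vcores


def share_fraction(container_shares: int, all_shares: Sequence[int]) -> float:
    """Fraction of CPU a container is guaranteed when every container is busy."""
    return container_shares / sum(all_shares)


def cfs_quota_us(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """Value for cpu.cfs_quota_us that caps a container at `cpus` CPUs."""
    return int(cpus * period_us)


# Three containers asking for 2, 1, and 1 virtual cores.
shares = [cpu_shares(v) for v in (2, 1, 1)]
guaranteed = share_fraction(shares[0], shares)   # lower bound under contention
hard_cap = cfs_quota_us(2)                       # upper bound, always enforced
```

With cpu.shares, the first container is guaranteed half of the CPU time when all three are busy, but it can use idle CPU beyond that. With a quota of 200000 microseconds per 100000 microsecond period, it can never use more than 2 CPUs' worth of time, regardless of how idle the machine is.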
Note that YARN lets you configure the number of physical CPUs available on each node, as well as the ratio of physical CPUs to virtual CPUs; users request only virtual CPUs when they ask for resources. By default, the ratio of physical CPUs to virtual CPUs is 1:1. If your cluster is heterogeneous and the CPUs on some nodes have more computing power, you can adjust the ratio on those nodes. The virtual CPU concept borrows from the distinction between physical memory and virtual memory; its main purpose is to even out differences in CPU computing power across the cluster.
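As a small worked example of the ratio, the sketch below computes how many virtual CPUs a node would advertise. The function name node_vcores and the parameter vcores_per_core are hypothetical and are not YARN configuration keys. The node sizes are made up for illustration.

```python
def node_vcores(physical_cores: int, vcores_per_core: float = 1.0) -> int:
    """Virtual CPUs a node offers, given its physical cores and its ratio."""
    return int(physical_cores * vcores_per_core)


# Two hypothetical nodes with the same core count but different CPU speed.
standard_node = node_vcores(16)                    # default 1:1 ratio
faster_node = node_vcores(16, vcores_per_core=2)   # each core counts as 2 vcores
```

Under these assumptions the faster node offers twice as many virtual CPUs, so a request for a fixed number of virtual CPUs represents roughly the same computing power on either node.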
Finally, IO resources. IO resources fall into two kinds: disk IO and network IO. YARN is currently working on both, and preliminary design documents have been released. IO resource isolation is more complex than CPU and memory isolation. To help users quantify IO resources, YARN follows the "virtual CPU" concept and introduces a "virtual disk" (vdisk); the first phase will attempt to use the cgroup blkio module to implement disk IO isolation. Before this can be implemented, IO resources must be added to the set of resources managed by the scheduler, so that Hadoop's resource schedulers, such as the Fair Scheduler or the Capacity Scheduler, can schedule disk IO and network IO.