YARN Resource Scheduler
- With the popularization of Hadoop, the number of users in a single Hadoop cluster keeps growing, and applications submitted by different users often have different quality-of-service requirements. Typical application types include:
- Batch processing jobs. These jobs usually run for a long time and have no strict completion-time requirements, such as data mining and machine learning applications.
- Interactive jobs, such as notebooks. These jobs are expected to return results promptly, for example queries executed through Hive.
- Production jobs. These jobs require a guaranteed amount of resources, such as statistics computation and spam data analysis.
1. Basic Architecture
- The resource scheduler is one of the core components of YARN and is pluggable: YARN defines a complete set of interface specifications so that users can implement their own schedulers as needed.
- YARN ships with three common resource schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. You can also write a new resource scheduler against the interface specifications and enable it with a simple configuration change.
- YARN's resource scheduler is essentially an event handler. It needs to process six kinds of SchedulerEventType events from external sources and act on each according to its specific meaning (a dispatch sketch follows the list). The six events have the following meanings:
- NODE_REMOVED: indicates that a compute node has been removed from the cluster (because the node failed or the administrator actively removed it). When the resource scheduler receives this event, it must remove the corresponding resources from the total amount of allocatable resources.
- NODE_ADDED: indicates that a compute node has been added to the cluster. When the resource scheduler receives this event, it must add the new node's resources to the total amount of allocatable resources.
- APP_ADDED: indicates that the ResourceManager has received a new application. The resource scheduler usually maintains an independent data structure for each application to facilitate unified management and resource allocation, so it adds the application to the corresponding data structure.
- APP_REMOVED: indicates that an application has finished running (successfully or not). The resource scheduler removes the application from the corresponding data structure.
- CONTAINER_EXPIRED: after the resource scheduler assigns a container to an ApplicationMaster, if the ApplicationMaster does not use the container within a certain period of time, the resource scheduler reclaims the container and reallocates it.
- NODE_UPDATE: when the ResourceManager receives the information a NodeManager reports through the heartbeat mechanism, a NODE_UPDATE event is triggered. Because the heartbeat may report newly released containers, this event triggers resource allocation. It is the most important of the six events, as it drives the resource scheduler's core allocation logic.
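A minimal sketch of how such an event handler might dispatch on the event type. The enum values mirror YARN's SchedulerEventType; the handler class and its helper methods are hypothetical placeholders, not YARN's actual code.

```java
// Illustrative dispatch over the six scheduler events; the helper
// methods stand in for the bookkeeping described above.
enum SchedulerEventType {
  NODE_ADDED, NODE_REMOVED, NODE_UPDATE,
  APP_ADDED, APP_REMOVED, CONTAINER_EXPIRED
}

class SchedulerSketch {
  void handle(SchedulerEventType type) {
    switch (type) {
      case NODE_ADDED:        addNodeResources();    break; // grow allocatable total
      case NODE_REMOVED:      removeNodeResources(); break; // shrink allocatable total
      case NODE_UPDATE:       allocateOnHeartbeat(); break; // core allocation path
      case APP_ADDED:         trackApplication();    break; // per-app data structure
      case APP_REMOVED:       untrackApplication();  break;
      case CONTAINER_EXPIRED: reclaimContainer();    break; // recycle unused container
    }
  }

  private void addNodeResources() {}
  private void removeNodeResources() {}
  private void allocateOnHeartbeat() {}
  private void trackApplication() {}
  private void untrackApplication() {}
  private void reclaimContainer() {}
}
```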
1.1 Resource Representation Model
- When a NodeManager starts, it registers with the ResourceManager. The registration information includes the CPU and memory that the node can allocate; both values can be set through configuration options, as shown below:
- yarn.nodemanager.resource.memory-mb: the total amount of physical memory that can be allocated. The default is 8 GB.
- yarn.nodemanager.vmem-pmem-ratio: the ratio of the maximum virtual memory a task may use to the physical memory it uses. The default is 2.1, which means that for every 1 MB of physical memory used, a task can use at most 2.1 MB of virtual memory.
- yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs that can be allocated. The default is 8. To divide CPU resources at a finer granularity and account for differences in CPU performance, YARN allows administrators to divide each physical CPU into several virtual CPUs according to actual needs and CPU performance. The administrator can configure the number of available virtual CPUs separately for each node, and when submitting an application, users can specify the number of virtual CPUs required by each task. A configuration sketch follows.
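These per-node settings normally live in yarn-site.xml on each NodeManager host; as a hedged illustration, the same keys can be set through Hadoop's Configuration API (the numeric values below are examples, not defaults):

```java
import org.apache.hadoop.conf.Configuration;

public class NodeResourceConfigSketch {
  public static void main(String[] args) {
    // Example values only; in a real deployment these keys are set in
    // yarn-site.xml rather than in code.
    Configuration conf = new Configuration();
    conf.setInt("yarn.nodemanager.resource.memory-mb", 16384); // allocatable physical memory (MB)
    conf.setFloat("yarn.nodemanager.vmem-pmem-ratio", 2.1f);   // max virtual memory per MB of physical
    conf.setInt("yarn.nodemanager.resource.cpu-vcores", 16);   // allocatable virtual cores
    System.out.println(conf.get("yarn.nodemanager.resource.memory-mb") + " MB");
  }
}
```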
- Scheduling semantics supported by YARN (a ResourceRequest sketch follows the two lists below):
- Requesting a specific amount of resources on a specific node
- Requesting a specific amount of resources on a specific rack
- Adding nodes to, or removing them from, a blacklist so that no resources are allocated on those nodes
- Requesting the return of some of an application's resources
- Scheduling semantics not yet supported:
- Requesting a specific amount of resources on an arbitrary node
- Requesting a specific amount of resources on an arbitrary rack
- Requesting one or several groups of resources that satisfy certain properties
- Ultra-fine-grained resources, such as CPU performance requirements or CPU binding
- Dynamically adjusting container resources, that is, allowing a container's resources to be adjusted on demand while it runs
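In the protocol, the supported semantics surface as ResourceRequest objects that the ApplicationMaster sends to the ResourceManager. A hedged sketch using YARN's public records API (the host and rack names are made up for illustration):

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ResourceRequestSketch {
  public static void main(String[] args) {
    // Two containers of <2048 MB, 1 vcore> each.
    Resource capability = Resource.newInstance(2048, 1);
    Priority priority = Priority.newInstance(1);

    // Pin the request to a specific node, and to a specific rack.
    ResourceRequest onNode = ResourceRequest.newInstance(priority, "node01", capability, 2);
    ResourceRequest onRack = ResourceRequest.newInstance(priority, "/rack1", capability, 2);

    // "*" (ResourceRequest.ANY) relaxes locality to the whole cluster.
    ResourceRequest anywhere =
        ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 2);

    System.out.println(onNode + "\n" + onRack + "\n" + anywhere);
  }
}
```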
1.2 Resource Scheduling Model
- YARN adopts a two-layer resource scheduling model.
- In the first layer, the resource scheduler in the ResourceManager allocates resources to each ApplicationMaster.
- In the second layer, the ApplicationMaster further allocates resources to its internal tasks.
- YARN's resource allocation process is asynchronous: after the resource scheduler assigns resources to an application, it does not immediately push them to the corresponding ApplicationMaster; instead it places them temporarily in a buffer and waits for the ApplicationMaster to fetch them actively through its periodic heartbeat (a pull-based communication model), as in the sketch below.
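A minimal sketch of the pull side using YARN's AMRMClient. Registration details, request submission, and error handling are omitted; this is illustrative, not a complete ApplicationMaster:

```java
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HeartbeatPullSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, ""); // host/port/tracking URL omitted

    while (true) {
      // Each heartbeat reports liveness and pulls whatever containers the
      // scheduler has already set aside in its buffer for this application.
      AllocateResponse response = rmClient.allocate(0.1f); // 0.1f = reported progress
      List<Container> granted = response.getAllocatedContainers();
      // ... assign tasks to the granted containers here ...
      Thread.sleep(1000);
    }
  }
}
```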
- YARN adopts an incremental resource allocation mechanism: when the resources an application requests cannot be satisfied immediately, the scheduler reserves resources on a node for the application until enough idle resources accumulate there to meet the request. This mechanism wastes some resources, but it prevents starvation.
- YARN's resource scheduler uses the Dominant Resource Fairness (DRF) algorithm. The basic idea of DRF is to apply max-min fairness to each user's dominant resource, thereby reducing the multi-dimensional resource scheduling problem to a single-resource one: DRF always maximizes the smallest dominant share among all users, as sketched below.
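A hedged sketch of DRF's core selection rule: compute each user's dominant share (the largest fraction it uses of any single resource) and serve the user whose dominant share is smallest. The resource vectors and numbers below are illustrative:

```java
// DRF selection sketch: resource vectors are <memory, vcores>.
public class DrfSketch {
  // Dominant share = max over resource dimensions of used/total.
  static double dominantShare(double[] used, double[] total) {
    double share = 0.0;
    for (int i = 0; i < used.length; i++) {
      share = Math.max(share, used[i] / total[i]);
    }
    return share;
  }

  public static void main(String[] args) {
    double[] total = {100.0, 50.0};                 // cluster: 100 GB, 50 vcores
    double[][] used = {{20.0, 5.0}, {10.0, 12.0}};  // current usage of two users
    int next = 0;
    for (int u = 1; u < used.length; u++) {
      if (dominantShare(used[u], total) < dominantShare(used[next], total)) {
        next = u; // the lowest dominant share is served first
      }
    }
    // User 0: max(0.20, 0.10) = 0.20; user 1: max(0.10, 0.24) = 0.24.
    System.out.println("allocate next to user " + next); // prints user 0
  }
}
```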
1.3 Resource Preemption Model
- In the resource scheduler, each queue can be configured with a minimum and a maximum amount of resources. The minimum amount is the amount of resources the queue should be guaranteed when cluster resources are scarce; the maximum amount is an upper bound that the queue's usage cannot exceed even in extreme cases.
- To improve resource utilization, the resource scheduler (this includes both the Capacity Scheduler and the Fair Scheduler) temporarily lends the resources of lightly loaded queues to heavily loaded ones. In other words, the minimum amount is not a hard guarantee: when a queue does not currently need its full minimum, the idle portion is temporarily allocated to other queues that need resources. Only when the lightly loaded queue suddenly receives newly submitted applications does the scheduler allocate the resources belonging to that queue back to it. Because those resources may be in use by other queues at that moment, the scheduler must wait for them to be released before returning them to the original owner, which usually takes an unpredictable amount of time. To keep applications from waiting too long, the scheduler preempts the resources if it finds they still have not been released after a waiting period.
- The ResourceManager enables resource preemption only when the active scheduler implements the PreemptableResourceScheduler interface and the parameter yarn.resourcemanager.scheduler.monitor.enable is set to true (the default is false). Preemption is driven by third-party policies, implemented as pluggable component classes (implementing the SchedulingEditPolicy interface) and specified through the parameter yarn.resourcemanager.scheduler.monitor.policies (by default YARN provides ProportionalCapacityPreemptionPolicy).
- The ResourceManager traverses these policy classes in sequence, each wrapped by the monitoring class SchedulingMonitor. SchedulingMonitor periodically calls the policy class's editSchedule() function to determine which containers' resources should be preempted. A configuration sketch follows.
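As a hedged illustration, enabling preemption comes down to two keys that normally live in yarn-site.xml; setting them through the Configuration API here is only for demonstration:

```java
import org.apache.hadoop.conf.Configuration;

public class PreemptionConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Turn on the scheduling monitor (default: false).
    conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
    // Comma-separated list of SchedulingEditPolicy implementations.
    conf.set("yarn.resourcemanager.scheduler.monitor.policies",
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
            + "ProportionalCapacityPreemptionPolicy");
    System.out.println(conf.get("yarn.resourcemanager.scheduler.monitor.policies"));
  }
}
```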
- In YARN, the entire resource preemption process can be summarized as follows (a sketch of the ApplicationMaster's side appears after the list):
- SchedulingEditPolicy detects the resources to be preempted and sends them to the ResourceManager through the events DROP_RESERVATION and PREEMPT_CONTAINER.
- The ResourceManager calls the ResourceScheduler's dropContainerReservation() and preemptContainer() functions to mark the containers to be preempted.
- The ResourceManager receives heartbeat information from the ApplicationMaster and, through the heartbeat response, returns to it the total amount of resources to be released and the list of containers to be preempted. After receiving the list, the ApplicationMaster can take one of the following actions:
- Kill these containers
- Select and kill other containers to make up the required total
- Do nothing, in which case some containers may release resources on their own, or the ResourceManager will eventually kill the containers itself
- If SchedulingEditPolicy detects that the ApplicationMaster has not killed the agreed containers within a certain period of time, it encapsulates these containers into a KILL_CONTAINER event and sends it to the ResourceManager.
- The ResourceManager calls the ResourceScheduler's killContainer() function, and the ResourceScheduler marks the containers to be killed.
- The ResourceManager receives heartbeat information from the NodeManagers and, through the heartbeat responses, returns the list of containers to be killed. After receiving the list, a NodeManager kills these containers and notifies the ResourceManager through its next heartbeat.
- The ResourceManager receives heartbeat information from the ApplicationMaster and, through the heartbeat response, sends it the list of killed containers.
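From the ApplicationMaster's point of view, the preempt-or-negotiate choice in step 3 arrives as a PreemptionMessage inside the heartbeat response. A hedged sketch using YARN's public API (the handler itself is hypothetical):

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

public class PreemptionHandlingSketch {
  // Hypothetical callback invoked with each heartbeat response.
  static void onHeartbeat(AllocateResponse response) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null) {
      return; // nothing is being preempted this round
    }
    if (msg.getStrictContract() != null) {
      for (PreemptionContainer c : msg.getStrictContract().getContainers()) {
        // These containers WILL be reclaimed; checkpoint their state now.
        System.out.println("about to lose container " + c.getId());
      }
    }
    if (msg.getContract() != null) {
      // Negotiable part: the AM may instead release equivalent resources
      // of its own choosing to satisfy the contract.
    }
  }
}
```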
- In YARN, queues are organized in a tree structure, and the amount of resources a queue is currently entitled to, R, depends on three factors: the queue's minimum amount of resources A (configured by the administrator), the queue's resource demand B (the total amount of resources required by its applications in the waiting or running states), and the idle resources C of its sibling queues under the same parent (surplus resources can be shared with other queues). This means R takes different values at different points in time; it can be computed with a recursive algorithm as R = f(A, B, C). If a queue is currently using resources U > R, then (U - R) resources need to be preempted from the queue, as in the toy illustration below.
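A toy illustration of the final step, assuming the entitled amount R has already been derived recursively as R = f(A, B, C):

```java
// Preemption amount sketch: only the excess over the entitlement is taken.
public class PreemptAmountSketch {
  static long toPreempt(long used, long entitled) {
    return Math.max(0, used - entitled); // U > R: preempt U - R; otherwise 0
  }

  public static void main(String[] args) {
    System.out.println(toPreempt(120, 80)); // over entitlement: preempt 40
    System.out.println(toPreempt(60, 80));  // within entitlement: preempt 0
  }
}
```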
- To avoid wasting resources as far as possible, YARN preferentially selects lower-priority containers as preemption targets, and it does not kill a container immediately; instead, it leaves the task of releasing resources to the application itself. The ResourceManager sends the list of containers to be killed to the corresponding ApplicationMaster, expecting it to release the resources those containers occupy through some mechanism of its own, for example by saving their state and then killing them, so that completed computation is not wasted. If the ApplicationMaster does not kill the containers itself within a certain period of time, the ResourceManager forcibly kills them.
1.4 Hierarchical Queue Management Mechanism
- The hierarchical queue organization has the following features:
- Sub-queues
- Queues can be nested: each queue can contain sub-queues.
- Applications can only be submitted to the bottom-level queues, that is, the leaf queues.
- Minimum capacity
- Each sub-queue has a "minimum capacity ratio" attribute, indicating the percentage of its parent queue's capacity it is entitled to use.
- The scheduler always gives priority to the queue with the lowest current resource usage ratio (used capacity / minimum capacity) and allocates resources to it first.
- The minimum capacity is not "a capacity that is always guaranteed". That is to say, if a queue's minimum capacity is 20 and the applications in the queue only use 5, the remaining 15 may be allocated to other queues that need them.
- The minimum capacity is never less than 0, and it cannot be greater than the maximum capacity.
- Maximum capacity
- To prevent a queue from over-consuming resources, you can set a maximum capacity for the queue, an upper limit on its resource usage: the total amount of resources the queue uses at any time cannot exceed this value.
- By default, the maximum capacity of a queue is unlimited. This means that when a queue is assigned only 20% of the resources and no other queue has running applications, the queue may use well over 20% of the resources; once applications are submitted to other queues, the excess is returned gradually.
- User permission management
- The administrator can configure the operating-system users and user groups allowed for each leaf queue (Hadoop allows one operating-system user or user group to correspond to one or more queues), as well as an administrator for each queue. A queue administrator can kill any application in the queue and change any application's priority (by default, users can only manage their own applications).
- System Resource Management
- YARN's resource management and scheduling are carried out by the scheduler. The administrator can set the resource capacity of each queue and the resource quota of each user in the scheduler, and the scheduler schedules applications subject to these resource constraints. A configuration sketch follows.
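As a hedged illustration of such constraints, here is a two-queue hierarchy expressed with the Capacity Scheduler's configuration keys. These keys normally live in capacity-scheduler.xml, and the queue names, percentages, and user names below are made up:

```java
import org.apache.hadoop.conf.Configuration;

public class QueueConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.scheduler.capacity.root.queues", "prod,dev");          // two leaf queues
    conf.setInt("yarn.scheduler.capacity.root.prod.capacity", 70);        // minimum capacity ratio (%)
    conf.setInt("yarn.scheduler.capacity.root.dev.capacity", 30);
    conf.setInt("yarn.scheduler.capacity.root.dev.maximum-capacity", 50); // usage ceiling (%)
    conf.set("yarn.scheduler.capacity.root.prod.acl_submit_applications", "etl_user");
    conf.set("yarn.scheduler.capacity.root.prod.acl_administer_queue", "ops_admin");
    System.out.println(conf.get("yarn.scheduler.capacity.root.queues"));
  }
}
```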