HDP Learning: YARN Resource Management

1. Overview

  YARN (Yet Another Resource Negotiator) is the computing and resource-management framework for Hadoop. If HDFS is thought of as the filesystem of the Hadoop cluster, then YARN is the operating system of the Hadoop cluster; YARN is the architectural center of Hadoop.
Just as an operating system such as Windows or Linux lets installed programs access resources (CPU, memory, and disk), YARN lets various types of processing frameworks (batch, interactive, online, streaming, and so on) operate on data across the entire cluster. YARN manages resource allocation for these different data-processing workloads, prioritizes and schedules jobs, and enables authentication and multitenancy.


Multitenancy: software multitenancy is achieved when a single instance of an application serves multiple groups of users, or "tenants." Each tenant shares common access to the application, hardware, and underlying resources (including data), but with specific and potentially unique privileges granted by the application based on its identity. This is in contrast with multi-instance architectures, where each user gets a unique instance of the application, and each instance then competes for resources on behalf of its tenant. A typical example of a multitenant application architecture is SaaS cloud computing, where multiple users, and even multiple companies, access the same instance of the application at the same time (for example, Salesforce CRM). A typical example of a multi-instance architecture is applications running in virtualized or IaaS environments (for example, applications running in KVM virtual machines).


Note: In previous versions of Hadoop, resource management was part of MapReduce, so a single framework handled both resource scheduling and job processing. Starting with Hadoop 2.0, MapReduce was simplified into a pure data-processing framework that runs on top of YARN.

2. Architecture

At a high level, YARN consists of two kinds of nodes, master and worker (slave):
Master node: the ResourceManager component runs on a master node and manages resources globally for all of YARN.
Worker node: the NodeManager component runs on each worker node in the cluster and executes tasks as directed by the global ResourceManager component.

The following figure is a simplified diagram of the YARN architecture:
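This master/worker split is also visible from the client side. As a minimal sketch, assuming the Hadoop YARN client libraries are on the classpath and yarn-site.xml points at the cluster (the class name is made up for illustration), the following Java snippet asks the ResourceManager for the NodeManagers it currently tracks:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListClusterNodes {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager configured in yarn-site.xml.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // The ResourceManager holds the global view: one report per NodeManager.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        System.out.println("Running NodeManagers: " + nodes.size());
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                    + " capability=" + node.getCapability()   // total CPU/memory on the node
                    + " used=" + node.getUsed()               // what containers currently consume
                    + " health=" + node.getHealthReport());   // health status reported via heartbeat
        }

        yarnClient.stop();
    }
}
```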
2.1 NodeManager

NodeManager is a daemon/service that runs on every worker node. Its functions:

It manages node resources on behalf of requesting services (such as the ResourceManager and ApplicationMasters).
It tracks the health of the node and reports its status to the ResourceManager.
2.2 NodeManager: Container

When the ResourceManager sends an ApplicationMaster request (a request to start an application and run the work it needs), the NodeManager allocates the resources (CPU, memory) for it.
2.3 Container Definition

A container is a unit of work within a YARN application that is allocated specific CPU and memory resources by the NodeManager on behalf of the ResourceManager. The container is the component that performs the actual work of the YARN application. A container is launched when a new ApplicationMaster request is granted by the ResourceManager. As a job executes, the ApplicationMaster requests additional resources from the ResourceManager (via the NodeManager on which it is running). If additional resources can be allotted, the ResourceManager grants additional containers to run the job's tasks across the cluster.

The following figure shows the role of a container:
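To make the request/grant flow concrete, here is a minimal, hypothetical sketch of the ApplicationMaster side using Hadoop's AMRMClient API: it registers with the ResourceManager, asks for containers of a given size, and polls until they are granted. The resource sizes, priority, and class name are placeholders, and this code is only meaningful when run inside an ApplicationMaster container that YARN itself launched:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RequestContainers {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // The ApplicationMaster announces itself to the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for 4 containers of 1 GB / 1 vcore each; the ResourceManager decides
        // on which NodeManagers they are actually granted.
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 4; i++) {
            rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
        }

        // Poll allocate() (the AM heartbeat) until all requested containers are granted.
        int granted = 0;
        while (granted < 4) {
            List<Container> allocated = rmClient.allocate(0.1f).getAllocatedContainers();
            for (Container c : allocated) {
                granted++;
                System.out.println("Granted " + c.getId() + " on " + c.getNodeId());
            }
            Thread.sleep(1000);  // pause between heartbeats
        }

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```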
2.4 NodeManager: ApplicationMaster
2.4.1 Features of the ApplicationMaster:

2.4.2 ApplicationMaster and Container

Once the NodeManager launches the application's first container, the ApplicationMaster is started within that container's resources.
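For illustration, this hedged sketch uses the standard YarnClient API to show how a client asks the ResourceManager to start an application: the ResourceManager picks a NodeManager, which launches the ApplicationMaster inside the application's first container. The application name, queue, command, and resource sizes below are placeholders, not a real application:

```java
import java.util.Collections;
import java.util.HashMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApplication {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");   // placeholder name
        appContext.setQueue("default");

        // What the chosen NodeManager should run inside the ApplicationMaster's container
        // (placeholder command; a real application would launch its AM main class here).
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                new HashMap<>(),                           // local resources (jars, files)
                new HashMap<>(),                           // environment variables
                Collections.singletonList("sleep 60"),     // command to run
                null, null, null);
        appContext.setAMContainerSpec(amContainer);

        // Resources that the ApplicationMaster's own container will occupy.
        appContext.setResource(Resource.newInstance(512, 1));

        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);

        yarnClient.stop();
    }
}
```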
2.4.3 ApplicationMaster's Job Scheduling

The ApplicationMaster submits a request to the ResourceManager to query the cluster's capabilities, and then receives an authorization message describing which resources are granted or allowed. The ApplicationMaster then communicates with the NodeManagers, assigns these resources to containers, and configures the containers to perform the tasks (a code sketch follows the note below).
Note:
Although the NodeManager manages and monitors resource usage for containers, it cannot see the application's work (job tasks are not visible to it).
The ApplicationMaster tracks and monitors the application's resource usage and progress. If a job task fails, the ApplicationMaster handles the correction; failures at the container level are handled by the NodeManager. This is why NodeManagers and ApplicationMasters communicate frequently across the cluster.
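Continuing the hypothetical sketches above: once the ResourceManager has granted a container (via AMRMClient.allocate(), as in section 2.3), the ApplicationMaster turns to the NodeManager that owns the container and asks it to launch the task. The NodeManager only sees the container and its command; what the task means to the job is known only to the ApplicationMaster. The command here is a placeholder, and in a real ApplicationMaster the NMClient also needs the NM tokens returned with the allocation:

```java
import java.util.Collections;
import java.util.HashMap;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LaunchTask {
    // Called by the ApplicationMaster for each container granted by the ResourceManager.
    static void launch(NMClient nmClient, Container container) throws Exception {
        // Describe what the NodeManager should execute inside this container.
        ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                new HashMap<>(),                                       // local resources (task jars, files)
                new HashMap<>(),                                       // environment variables
                Collections.singletonList("echo running a job task"),  // placeholder task command
                null, null, null);

        // The NodeManager starts the process and enforces the container's CPU/memory limits.
        nmClient.startContainer(container, ctx);
    }

    public static void main(String[] args) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();
        // In a real ApplicationMaster, the Container passed to launch() comes from
        // AMRMClient.allocate(); see the earlier sketch.
    }
}
```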

The following figure shows the ApplicationMaster's job scheduling process:

The ApplicationMaster keeps creating containers until its resource allotment runs out or all of its tasks have been assigned. The legend below
shows a three-node cluster with containers, ApplicationMasters, and job tasks:

In this example, Job1's ApplicationMaster runs on NodeManager2, Job1's first task (Job Task1) runs on NM1, and Job Task2 runs on NM2, together completing all the tasks required by Job1. Likewise, Job2's tasks are distributed across different NodeManagers.
Most important: the ApplicationMaster can create containers on any available NodeManager in the cluster. By default, tasks are placed on the nodes holding their data blocks, but they may instead be placed on a node with more available compute capacity, even if that node does not hold the data. Note: move the computation, rather than the data, to a node with processing capacity.
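This locality preference is expressed per container request. As a hedged illustration (the host and rack names are made up), an ApplicationMaster can ask for a container near the data while allowing the scheduler to relax that constraint and fall back to any node with spare capacity:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class LocalityHint {
    public static void main(String[] args) {
        // Prefer the node (and rack) holding the task's data block...
        String[] preferredNodes = { "worker-node-1.example.com" };  // hypothetical host
        String[] preferredRacks = { "/default-rack" };              // hypothetical rack

        // ...but relaxLocality=true lets the ResourceManager place the container
        // elsewhere if those nodes are busy, trading data locality for compute capacity.
        ContainerRequest request = new ContainerRequest(
                Resource.newInstance(1024, 1),
                preferredNodes,
                preferredRacks,
                Priority.newInstance(0),
                true /* relaxLocality */);

        System.out.println("Built locality-aware request: " + request);
    }
}
```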
2.5 YARN ResourceManager (Master Node)

ResourceManager:

YARN has two kinds of schedulers: the FairScheduler and the CapacityScheduler.

The ResourceManager component is composed of a number of services that perform three main duties: scheduling, node management, and security.

The YARN Scheduler is a single component that controls resource usage according to parameters set by the Hadoop administrator. This allows for greater efficiency by letting different organizations use a centrally pooled set of cluster resources (multitenancy) while at the same time controlling each tenant's access to those resources. It ensures that each organization can be guaranteed the minimum resources needed to meet its SLAs. At the same time, it also allows organizations to use excess capacity not being consumed by others, providing elasticity and a lower overall cost of deployment.

The scheduling mechanism used and its specific settings are under the control of the Hadoop administrator. Side note: there are two YARN Scheduler options, the FairScheduler and the CapacityScheduler; these are discussed in more detail elsewhere.

Node and ApplicationMaster management in the ResourceManager is accomplished via a number of services that perform a variety of tasks:
Monitor NodeManagers for heartbeats (sent by each NodeManager every second by default) and expect them within a configured timeout.
Submit ApplicationMaster launch requests to the appropriate NodeManagers.
Verify that resource container components were actually launched on the appropriate NodeManagers (within a configured time limit) and attempt a restart if required.
Monitor ApplicationMasters running in containers for heartbeats (expected at a configured interval) and attempt a restart if required. Note: only the ApplicationMaster is monitored; job task monitoring is the responsibility of the ApplicationMaster itself.
Maintain a list of submitted ApplicationMasters across the cluster and their current state.

The ResourceManager also serves as a web application proxy and controls access to resources via ACLs. It manages resource and application security via token-based systems that verify that all container requests are valid. The ApplicationMaster must pass a verified ContainerToken to the NodeManager; the token contains information about the resources that should be allocated to that container. This checking mechanism prevents a rogue ApplicationMaster from allocating more resources than it has been allotted by the ResourceManager.
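How the pooled cluster is divided among tenants is visible through the scheduler's queues. As a rough sketch (the queue hierarchy and capacities depend entirely on the administrator's CapacityScheduler or FairScheduler configuration; "default" is only the out-of-the-box queue), a client can ask the ResourceManager for the top-level queues and the share guaranteed to each:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListQueues {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Top-level queues as configured by the Hadoop administrator.
        List<QueueInfo> rootQueues = yarnClient.getRootQueueInfos();
        for (QueueInfo queue : rootQueues) {
            System.out.println(queue.getQueueName()
                    + " guaranteed=" + queue.getCapacity()        // fraction of the cluster guaranteed to the queue
                    + " current=" + queue.getCurrentCapacity()    // fraction actually in use right now
                    + " max=" + queue.getMaximumCapacity());      // elastic ceiling when other queues are idle
        }

        yarnClient.stop();
    }
}
```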
