Apache Hadoop YARN – Concepts and Applications


As previously described, YARN is essentially a system for managing distributed applications. It consists of a central ResourceManager, which arbitrates all available cluster resources, and a per-node NodeManager, which takes direction from the ResourceManager and is responsible for managing resources available on a single node.

ResourceManager

In YARN, the ResourceManager is, primarily, a pure scheduler. In essence, it is strictly limited to arbitrating available resources in the system among the competing applications – a market maker if you will. It optimizes for cluster utilization (keep all resources in use all the time) against various constraints such as capacity guarantees, fairness, and SLAs. To allow for different policy constraints, the ResourceManager has a pluggable scheduler that allows different algorithms, such as capacity and fair scheduling, to be used as necessary.
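As an aside (not in the original post), the scheduling policy is selected through the yarn.resourcemanager.scheduler.class property, normally set in yarn-site.xml to the stock CapacityScheduler or FairScheduler implementation. A minimal Java sketch using the public YarnConfiguration class simply reads which scheduler a cluster is configured to use:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SchedulerConfigCheck {
        public static void main(String[] args) {
            // Loads yarn-default.xml and yarn-site.xml from the classpath.
            Configuration conf = new YarnConfiguration();

            // yarn.resourcemanager.scheduler.class selects the pluggable scheduler, e.g.
            //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
            //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
            System.out.println("Configured scheduler: " + conf.get(YarnConfiguration.RM_SCHEDULER));
        }
    }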

ApplicationMaster

Many will draw parallels between YARN and the existing Hadoop MapReduce system (MR1 in Apache Hadoop 1.x). However, the key difference is the new concept of an ApplicationMaster.

The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating appropriate resource containers from the ResourceManager, tracking their status and monitoring progress.

The ApplicationMaster allows YARN to exhibit the following key characteristics:

Scale: The ApplicationMaster provides much of the functionality of the traditional ResourceManager so that the entire system can scale more dramatically. In tests, we have already successfully simulated 10,000-node clusters composed of modern hardware without significant issue. This is one of the key reasons we have chosen to design the ResourceManager as a pure scheduler, i.e. it doesn't attempt to provide fault-tolerance for resources. We shifted that to become a primary responsibility of the ApplicationMaster instance. Furthermore, since there is an instance of an ApplicationMaster per application, the ApplicationMaster itself isn't a common bottleneck in the cluster.

Open: Moving all application-framework-specific code into the ApplicationMaster generalizes the system so that we can now support multiple frameworks such as MapReduce, MPI and graph processing.

This is a good point to interject some of the key YARN design decisions: move all complexity (to the extent possible) to the ApplicationMaster, while providing sufficient functionality to allow application-framework authors sufficient flexibility and power. Since it is essentially user code, do not trust the ApplicationMaster(s), i.e. an ApplicationMaster is not a privileged service. The YARN system (ResourceManager and NodeManager) has to protect itself from faulty or malicious ApplicationMaster(s) and the resources granted to them, at all costs.

It is useful to remember that, in reality, every application has its own instance of an ApplicationMaster. However, it is completely feasible to implement an ApplicationMaster to manage a set of applications (e.g. an ApplicationMaster for Pig or Hive to manage a set of MapReduce jobs). Furthermore, this concept has been stretched to manage long-running services which manage their own applications (e.g. launch HBase in YARN via a hypothetical HBaseAppMaster).

Resource Model

YARN supports a very general resource model for applications. An application (via the ApplicationMaster) can request resources with highly specific requirements such as:

Resource-name (hostname, rackname – we are in the process of generalizing this further to support more complex network topologies with YARN-18).
Memory (in MB).
CPU (cores, for now).

In the future, we expect to add more resource-types such as disk/network I/O, GPUs etc.
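As a concrete illustration (not from the original post), this is how a capability is expressed with the org.apache.hadoop.yarn.api.records.Resource record in the Hadoop 2.x Java API; the numbers are arbitrary assumptions:

    import org.apache.hadoop.yarn.api.records.Resource;

    public class ResourceModelExample {
        public static void main(String[] args) {
            // A capability in YARN's resource model: memory in MB and CPU in virtual cores.
            // Disk/network I/O, GPUs etc. are not yet expressible through this record.
            Resource capability = Resource.newInstance(2048, 2);
            System.out.println("memory=" + capability.getMemory() + " MB, vcores=" + capability.getVirtualCores());
        }
    }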

ResourceRequest and Container

YARN is designed to allow individual applications (via the ApplicationMaster) to utilize cluster resources in a shared, secure and multi-tenant manner. Also, it remains aware of cluster topology in order to efficiently schedule and optimize data access, i.e. reduce data motion for applications to the extent possible.

In order to meet those goals, the central Scheduler (in the ResourceManager) has extensive information about an application's resource needs, which allows it to make better scheduling decisions across all applications in the cluster. This leads us to the ResourceRequest and the resulting Container.

Essentially an application can ask for specific resource requests via the ApplicationMaster to satisfy its resource needs. The Scheduler responds to a resource request by granting a container, which satisfies the requirements laid out by the ApplicationMaster in the initial ResourceRequest.

Let's look at the ResourceRequest – it has the following form:

<resource-name, priority, resource-requirement, number-of-containers>

Let's walk through each component of the ResourceRequest to understand this better.

resource-name is either hostname, rackname or * to indicate no preference. In the future, we expect to support even more complex topologies for virtual machines on a host, more complex networks etc.
priority is the intra-application priority for this request (to stress, this isn't across multiple applications).
resource-requirement is the required capabilities such as memory, CPU etc. (at the time of writing YARN only supports memory and CPU).
number-of-containers is just a multiple of such containers.
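To make the tuple concrete, the sketch below shows how an ApplicationMaster built on the AMRMClient helper library (available in Hadoop 2.x, so it postdates this post) would express such a ResourceRequest. The host name, rack name, priority value and capability are illustrative assumptions:

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class ResourceRequestExample {

        // Queues a ResourceRequest with the ResourceManager's Scheduler; assumes amRMClient has
        // already been initialized, started and registered inside a running ApplicationMaster.
        static void requestContainers(AMRMClient<ContainerRequest> amRMClient) {
            // resource-requirement: 1024 MB of memory and 1 virtual core per container.
            Resource capability = Resource.newInstance(1024, 1);
            // priority: intra-application priority for this request (not across applications).
            Priority priority = Priority.newInstance(1);
            // resource-name: preferred hosts/racks; passing null for both means * (no preference).
            String[] nodes = { "node1.example.com" };
            String[] racks = { "/rack1" };

            // number-of-containers: ask for three containers of this shape.
            for (int i = 0; i < 3; i++) {
                amRMClient.addContainerRequest(new ContainerRequest(capability, nodes, racks, priority));
            }
        }
    }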

Now, on to the Container.

Essentially, the Container is the resource allocation, which is the successful result of the ResourceManager granting a specific ResourceRequest. A Container grants rights to an application to use a specific amount of resources (memory, CPU etc.) on a specific host.

The ApplicationMaster has to take the Container and present it to the NodeManager managing the host on which the Container was allocated, in order to use the resources for launching its tasks. Of course, the Container allocation is verified, in secure mode, to ensure that ApplicationMaster(s) cannot fake allocations in the cluster.
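Continuing the hypothetical AMRMClient sketch (again, not part of the original post), granted Containers come back on the ApplicationMaster's periodic heartbeat to the ResourceManager; each one names the node it was allocated on and carries a token that the NodeManager verifies in secure mode:

    import java.util.List;

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class ContainerAllocationExample {

        // Heartbeats the ResourceManager once and returns whatever Containers the Scheduler
        // has granted so far in response to earlier ResourceRequests.
        static List<Container> pollForContainers(AMRMClient<ContainerRequest> amRMClient) throws Exception {
            AllocateResponse response = amRMClient.allocate(0.1f); // 0.1f = reported progress
            for (Container container : response.getAllocatedContainers()) {
                // Each grant is tied to a specific node; the ApplicationMaster must present it
                // to that node's NodeManager to actually use it.
                System.out.println("Granted " + container.getResource()
                        + " on " + container.getNodeId() + " (id " + container.getId() + ")");
            }
            return response.getAllocatedContainers();
        }
    }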

Container Specification during Container Launch

While a Container, as described above, is merely a right to use a specified amount of resources on a specific machine (NodeManager) in the cluster, the ApplicationMaster has to provide considerably more information to the NodeManager to actually launch the container.

YARN allows applications to launch any process and, unlike the existing Hadoop MapReduce in hadoop-1.x (aka MR1), it isn't limited to Java applications alone.

The YARN container launch specification API is platform agnostic and contains:

Command line to launch the process within the container.
Environment variables.
Local resources necessary on the machine prior to launch, such as jars, shared objects, auxiliary data files etc.
Security-related tokens.

This allows the ApplicationMaster to work with the NodeManager to launch containers ranging from simple shell scripts to C/Java/Python processes on Unix/Windows to full-fledged virtual machines (e.g. KVMs).
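For illustration, here is a minimal sketch of such a launch specification using the ContainerLaunchContext record and the NMClient helper library from Hadoop 2.x (not from the original post). The shell command and the empty maps are assumptions; a real ApplicationMaster would populate local resources, environment variables and security tokens as listed above:

    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.client.api.NMClient;

    public class ContainerLaunchExample {

        // Presents an allocated Container to its NodeManager together with a launch specification;
        // assumes nmClient has been initialized and started inside the ApplicationMaster.
        static void launch(NMClient nmClient, Container container) throws Exception {
            // Command line to run inside the container; <LOG_DIR> is expanded by the NodeManager.
            List<String> commands = Collections.singletonList(
                    "/bin/date 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr");

            ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                    Collections.emptyMap(),   // local resources (jars, shared objects, data files)
                    Collections.emptyMap(),   // environment variables
                    commands,                 // command line(s) to launch the process
                    null,                     // service data for auxiliary services
                    null,                     // security-related tokens
                    null);                    // application ACLs

            nmClient.startContainer(container, ctx);
        }
    }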

YARN – Walkthrough

Armed with the knowledge of the above concepts, it is useful to sketch how applications conceptually work in YARN.

Application execution consists of the following steps:

Application submission.
Bootstrapping the ApplicationMaster instance for the application.
Application execution managed by the ApplicationMaster instance.

Let's walk through an application execution sequence:

1. A client program submits the application, including the necessary specifications to launch the application-specific ApplicationMaster itself (a client-side sketch follows this list).
2. The ResourceManager assumes the responsibility to negotiate a specified container in which to start the ApplicationMaster, and then launches the ApplicationMaster.
3. The ApplicationMaster, on boot-up, registers with the ResourceManager – the registration allows the client program to query the ResourceManager for details, which allow it to directly communicate with its own ApplicationMaster.
4. During normal operation, the ApplicationMaster negotiates appropriate resource containers via the resource-request protocol.
5. On successful container allocations, the ApplicationMaster launches the container by providing the container launch specification to the NodeManager. The launch specification, typically, includes the necessary information to allow the container to communicate with the ApplicationMaster itself.
6. The application code executing within the container then provides necessary information (progress, status etc.) to its ApplicationMaster via an application-specific protocol.
7. During the application execution, the client that submitted the program communicates directly with the ApplicationMaster to get status, progress updates etc. via an application-specific protocol.
8. Once the application is complete, and all necessary work has been finished, the ApplicationMaster deregisters with the ResourceManager and shuts down, allowing its own container to be repurposed.
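Step 1, the client-side submission, can be sketched with the YarnClient library from Hadoop 2.x (which postdates this post). This is a minimal, hypothetical example: the application name, queue, resource sizes and the com.example.MyApplicationMaster command are assumptions, and a real client would also ship the ApplicationMaster's jar and classpath as local resources:

    import java.util.Collections;

    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SubmitApplicationExample {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            // Ask the ResourceManager for a new application and fill in the submission context,
            // including the launch specification for the ApplicationMaster's own container.
            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
            appContext.setApplicationName("my-yarn-app");
            appContext.setQueue("default");
            appContext.setPriority(Priority.newInstance(0));
            appContext.setResource(Resource.newInstance(1024, 1)); // container for the AM itself

            ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                    Collections.emptyMap(),   // the AM jar and other local resources would go here
                    Collections.emptyMap(),   // environment variables, e.g. the AM classpath
                    Collections.singletonList("java -Xmx768m com.example.MyApplicationMaster"
                            + " 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"),
                    null, null, null);
            appContext.setAMContainerSpec(amContainer);

            System.out.println("Submitted " + yarnClient.submitApplication(appContext));
        }
    }

From there, the ApplicationMaster side would typically use AMRMClient for steps 3, 4 and 8 (registerApplicationMaster, allocate, unregisterApplicationMaster) and NMClient for step 5, as sketched in the earlier sections.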

In our next post in this series, we will dive into the guts of the YARN system, particularly the ResourceManager – stay tuned!

Ref: http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
