Yet Another Resource Negotiator: An Introduction

Source: Internet
Author: User
Keywords: Hadoop

With YARN, a more general Hadoop framework is available that supports not only MapReduce but also other distributed processing models. This article describes the new Hadoop architecture and identifies what you need to know before you switch to it.

Apache Hadoop with MapReduce is the backbone of distributed data processing. With its unique scale-out physical cluster architecture and the elegant processing framework originally developed by Google, Hadoop has seen explosive growth in the new field of big data processing. Hadoop has also developed a rich ecosystem of applications, including Apache Pig (a powerful scripting language) and Apache Hive (a data warehouse solution with a SQL-like interface).

Unfortunately, this ecosystem is built on a programming model that does not solve every problem in big data. MapReduce provides a specific programming model that, although simplified through tools such as Pig and Hive, is not a panacea for big data. This is where MapReduce 2.0 (MRv2), also known as Yet Another Resource Negotiator (YARN), comes in. Let's first quickly review the Hadoop architecture before YARN.
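Before that review, it may help to see the shape of the programming model in question. The following is a minimal sketch in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration. It counts words by emitting key/value pairs, grouping them by key, and reducing each group.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["big data big cluster", "data node"])
print(counts)
```

Any problem that fits this map-then-reduce shape runs well on the framework; problems that do not fit it are the ones the text refers to.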

A brief introduction to Hadoop and MRv1

A Hadoop cluster can scale from a single node (where all Hadoop entities run on the same node) to thousands of nodes (where functionality is distributed across nodes to increase parallel processing). Figure 1 illustrates the high-level components of a Hadoop cluster.

Figure 1. A simple demo of the Hadoop cluster architecture

A Hadoop cluster can be decomposed into two abstract entities: the MapReduce engine and the distributed file system. The MapReduce engine executes Map and Reduce tasks across the cluster and reports the results, while the distributed file system provides a storage model that replicates data across nodes for processing. The Hadoop Distributed File System (HDFS) is designed to support large files (where each file is typically a multiple of the HDFS block size, 64 MB by default in Hadoop 1.x).
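The storage side of this description can be sketched as follows. This is an illustrative model only, not the HDFS implementation: the `place_blocks` function is hypothetical, the block size and replication factor are assumed values (the common Hadoop 1.x defaults), and round-robin placement stands in for the rack-aware policy that real HDFS uses.

```python
import math

BLOCK_SIZE_MB = 64  # assumed block size (Hadoop 1.x default)
REPLICATION = 3     # assumed replication factor (HDFS default)

def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into blocks and assign each block to distinct nodes.

    Round-robin placement is a simplification for illustration;
    real HDFS placement takes rack topology into account.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement

layout = place_blocks(200, ["node1", "node2", "node3", "node4"])
for block, replicas in layout.items():
    print(block, replicas)
```

Because every block lives on several nodes, the processing engine usually has more than one place where it can run a task next to its data.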

When a client makes a request to a Hadoop cluster, the request is managed by the JobTracker. The JobTracker works with the NameNode to distribute the work as close as possible to the data it operates on. The NameNode is the master of the file system and provides metadata services for data distribution and replication. The JobTracker schedules Map and Reduce tasks into available slots on one or more TaskTrackers. Each TaskTracker works with the DataNode (the distributed file system) to execute Map and Reduce tasks on data from the DataNode. When the Map and Reduce tasks are complete, the TaskTracker notifies the JobTracker, which determines when all tasks are complete and eventually notifies the client that the job is complete.
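The scheduling step in this flow, placing tasks into free slots as close to their data as possible, can be sketched as a small greedy routine. This is a simplified model under stated assumptions, not the JobTracker's actual algorithm: the `schedule` function and its arguments are hypothetical, and the sketch ignores rack locality, queuing, and failures.

```python
def schedule(tasks, block_locations, free_slots):
    """Assign each task to a slot, preferring nodes that hold its data.

    tasks: list of (task_id, block_id) pairs
    block_locations: block_id -> nodes holding a replica
        (stands in for NameNode metadata)
    free_slots: node -> number of free task slots on that TaskTracker
    Returns task_id -> (node, is_data_local); tasks with no free slot
    anywhere are left out of the result.
    """
    slots = dict(free_slots)  # work on a copy so the input is unchanged
    assignment = {}
    for task_id, block_id in tasks:
        local = [n for n in block_locations.get(block_id, [])
                 if slots.get(n, 0) > 0]
        candidates = local or [n for n, free in slots.items() if free > 0]
        if not candidates:
            continue  # a real scheduler would queue the task instead
        node = candidates[0]
        slots[node] -= 1
        assignment[task_id] = (node, bool(local))
    return assignment

plan = schedule(
    tasks=[("map-0", "blk0"), ("map-1", "blk0"), ("map-2", "blk1")],
    block_locations={"blk0": ["node1"], "blk1": ["node2"]},
    free_slots={"node1": 1, "node2": 2},
)
print(plan)
```

In this run the second task cannot be data-local because the only node holding its block has no free slot left, so it falls back to another node and must read its input over the network.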

As you can see in Figure 1, MRv1 implements a relatively simple cluster manager for MapReduce processing. MRv1 provides a hierarchical cluster management scheme in which big data jobs filter into the cluster as individual Map and Reduce tasks and are eventually aggregated back up to the job for reporting to the user. But within this simplicity lie some problems, some hidden and some not so hidden.

Shortcomings of MRv1

The first version of MapReduce has both strengths and weaknesses. MRv1 is the standard big data processing system in use today. However, the architecture has shortcomings that show up mainly in large clusters. When a cluster grows beyond 4,000 nodes (where each node may be multi-core), its behavior becomes unpredictable. One of the biggest problems is cascading failure: as the cluster attempts to re-replicate data, it overloads the surviving nodes, so a single failure can seriously degrade the entire cluster through network flooding.

But the biggest problem with MRv1 is multitenancy. As clusters grow in size, it becomes desirable to use them for a variety of processing models. MRv1 dedicates its nodes to Hadoop, whereas it would be preferable to be able to repurpose them for other applications and workloads. This capability becomes even more valuable as big data and Hadoop turn into an important usage model in cloud deployments, because it allows Hadoop to run directly on physical servers rather than requiring virtualization, with its added management, computation, and input/output overhead.

Let's look at the new architecture of YARN and see how it supports MRv2 and other applications that use different processing models.
