Addressing Scalability Bottlenecks: Yahoo Plans to Restructure Hadoop MapReduce

Source: Internet
Author: User
Tags: failover, zookeeper, hadoop, mapreduce

http://cloud.csdn.net/a/20110224/292508.html

The Yahoo! Developer Blog recently published an article about a plan to refactor Hadoop. Yahoo found that once a cluster reaches 4,000 machines, Hadoop hits a scalability bottleneck, and it is now preparing to start refactoring Hadoop.

The Bottleneck Faced by MapReduce

The trend observed in cluster sizes and workloads is that the MapReduce JobTracker needs an overhaul to address flaws in its scalability, memory consumption, threading model, reliability, and performance. Over the past five years, the cost of patching the framework piecemeal has kept growing, and the tight coupling between Hadoop's modules now makes it difficult to keep improving on the existing design. There is already consensus on this within the community, which is why the refactoring of Hadoop is about to begin. Moreover, from an operational point of view, any change, however minor, even a bug fix, forces a system-wide upgrade of Hadoop MapReduce.

The Next-Generation MapReduce Concept

According to the blog post, the main idea of the new architecture is to split the original JobTracker's functions in two: a ResourceManager that manages resource allocation, and an ApplicationMaster that manages task monitoring and scheduling. The ResourceManager, like the original JobTracker, is the control center of the whole cluster, while the ApplicationMaster is a separate instance for each application; an application is a set of tasks submitted by the user and can consist of one or more jobs. Each slave runs a NodeManager instance, which functions like the original TaskTracker.
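
To make the division of labor concrete, here is a minimal, purely illustrative sketch of the three roles; the interface and method names below are hypothetical and are not the actual Hadoop NextGen API.

```java
// Illustrative sketch only: these interfaces are hypothetical and are not the
// actual Hadoop NextGen API. They show how the old JobTracker's duties split up.

/** Cluster-wide service: tracks resources and grants them to applications. */
interface ResourceManager {
    /** An application asks for containers of a given size; returns container ids. */
    java.util.List<String> requestContainers(String applicationId, int memoryMb, int count);
}

/** One instance per application: schedules and monitors that application's tasks. */
interface ApplicationMaster {
    void assignTasksTo(java.util.List<String> containerIds); // map tasks onto granted containers
    void onTaskFinished(String taskId, boolean succeeded);   // monitor progress, retry failures
}

/** One instance per slave node: launches containers, much like the old TaskTracker. */
interface NodeManager {
    void launchContainer(String containerId, int memoryMb, Runnable work);
}
```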

1. Hierarchical Management

At present, Hadoop's resource management and task scheduling are both done in the JobTracker, which has to track the resource allocation and scheduling of every single task. A task is a very fine-grained scheduling unit: each job typically produces hundreds of tasks, and many jobs run in the system at the same time, so the JobTracker's management burden becomes very heavy. The new architecture delegates this per-task management to the individual ApplicationMasters; the ResourceManager only manages resource allocation for each application. Even when there are many applications in the system, the ResourceManager's load can be kept at a reasonable level, which is the biggest advantage of the new architecture.

2. Should the ApplicationMaster run on the master or on slaves?

The new architecture effectively shifts management and scheduling work onto the ApplicationMaster; if the node hosting an ApplicationMaster goes down, the entire application has to be rerun. The original JobTracker ran on a relatively stable master node, so the probability of failure was low; now ApplicationMasters run on many slave nodes, so the probability of failure is much higher. Moreover, the new architecture breaks the original simple master-slave model, and the communication and dependencies between nodes become more complex, which makes network optimization harder. If all ApplicationMasters were placed on the master, the master's burden would become very heavy (it would have to handle persistent heartbeats and bursts of RPC requests such as getTaskCompletionEvents), but that problem could be solved with a distributed master (something Google has already implemented).

3. Resource management methods

Originally, a simple static slot was used as the unit of resource, which cannot describe the resource state of the cluster well. The new architecture will control CPU, memory, disk, and network resources at a finer granularity. Each task executes in a container and can only use the system resources allocated to it. Resource allocation can move from static estimation to dynamic adjustment.
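
As a rough illustration of the difference from a fixed map/reduce slot, a finer-grained request might carry several resource dimensions at once. The class and field names below are illustrative assumptions, not the real NextGen API.

```java
// Hypothetical sketch: a fine-grained resource request, in contrast to a fixed
// map/reduce "slot". The field names are illustrative, not the real NextGen API.
public class ResourceRequest {
    private final int memoryMb;      // memory the container may use
    private final int virtualCores;  // CPU share
    private final int diskMb;        // local disk budget
    private final int networkMbps;   // network bandwidth budget

    public ResourceRequest(int memoryMb, int virtualCores, int diskMb, int networkMbps) {
        this.memoryMb = memoryMb;
        this.virtualCores = virtualCores;
        this.diskMb = diskMb;
        this.networkMbps = networkMbps;
    }
    // A "container" is simply the grant of such a request on one node; the task
    // running inside it may not exceed these limits. An ApplicationMaster could
    // start from a static estimate and later adjust the numbers per task.
}
```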

4. Support for other programming models

Because tasks are managed and scheduled by the ApplicationMaster, and the ApplicationMaster is relatively independent of the system's other modules, users can even deploy their own ApplicationMaster to support other programming models. This lets applications that are poorly suited to MapReduce run in the same Hadoop cluster.

Scalability Implementation

Scalability is very important given current hardware trends; today's largest MapReduce clusters have about 4,000 hosts. However, a 2009-era cluster of 4,000 hosts (8 cores, 16 GB of memory, 4 TB of storage per host) has only half the processing power of a 2011 cluster of 4,000 hosts (16 cores, 48 GB of memory, 24 TB of storage per host). In addition, taking operational costs into account, clusters may well be pushed to 6,000 hosts in the future.

Availability Implementation

ResourceManager: the ResourceManager uses Apache ZooKeeper for failover. When the ResourceManager fails, cluster state can be quickly restored from the state saved in ZooKeeper. After a failover, all applications running in the queues are restarted.
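
The article does not show how ZooKeeper is used, but the standard failover pattern is an ephemeral znode acting as a leader lock. The following is a generic sketch of that pattern, not the actual ResourceManager code; the znode path is an assumption.

```java
import org.apache.zookeeper.*;

// Generic sketch of ZooKeeper-based failover (not the actual ResourceManager code):
// each candidate tries to create the same ephemeral znode; whoever succeeds is the
// active instance. If it dies, the ephemeral node vanishes and a standby takes over.
public class ActiveStandbyElector implements Watcher {
    private static final String LOCK_PATH = "/resourcemanager/active"; // illustrative path
    private final ZooKeeper zk;

    public ActiveStandbyElector(String zkQuorum) throws Exception {
        // The session timeout bounds how long a crashed leader keeps holding the lock.
        this.zk = new ZooKeeper(zkQuorum, 5000, this);
    }

    /** Returns true if this instance became the active ResourceManager. */
    public boolean tryBecomeActive(byte[] stateSnapshot) throws Exception {
        try {
            zk.create(LOCK_PATH, stateSnapshot,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                       // we hold the lock: act as the active instance
        } catch (KeeperException.NodeExistsException e) {
            zk.exists(LOCK_PATH, this);        // watch the current active instance
            return false;                      // stay in standby
        }
    }

    @Override
    public void process(WatchedEvent event) {
        // When the active instance's ephemeral node disappears, retry the election
        // and, on success, restore cluster state and restart the running applications.
    }
}
```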

ApplicationMaster: MapReduce NextGen supports application-specific checkpointing for the ApplicationMaster. MapReduce's own ApplicationMaster can recover from failures by restoring itself from state saved in HDFS.
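
As a rough sketch of what "restoring itself from state saved in HDFS" could look like, the code below uses the standard Hadoop FileSystem API; the checkpoint path and serialized format are assumptions, since the article does not describe them.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Generic sketch of checkpointing ApplicationMaster state to HDFS and restoring
// it after a restart. The path and serialized format are illustrative only.
public class AmCheckpoint {
    private static final Path CHECKPOINT = new Path("/am-checkpoints/app_0001"); // hypothetical

    /** Overwrite the checkpoint with the latest serialized application state. */
    public static void save(Configuration conf, byte[] state) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(CHECKPOINT, true)) { // true = overwrite
            out.write(state);
        }
    }

    /** After a failure, a restarted ApplicationMaster reloads its last saved state. */
    public static byte[] restore(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        int len = (int) fs.getFileStatus(CHECKPOINT).getLen();
        byte[] state = new byte[len];
        try (FSDataInputStream in = fs.open(CHECKPOINT)) {
            in.readFully(0, state);   // positioned read of the whole checkpoint
        }
        return state;
    }
}
```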

Compatibility Implementation

MapReduce NextGen uses wire-compatible protocols to allow servers and clients of different versions to exchange information. This will remain the case in future releases, ensuring that the cluster stays compatible after an upgrade.

Cluster Implementation

The MapReduce NextGen resource scheduler uses a general concept of a resource when scheduling and allocating resources to individual applications. Each machine in the cluster is conceptually composed of resources such as memory, CPU, and I/O bandwidth.

Support for Other Programming Models

MapReduce NextGen provides a completely generic computation framework that supports MapReduce as well as other paradigms.

The architecture ultimately allows users to implement custom ApplicationMasters, which can request resources from the ResourceManager and make use of them. It therefore supports multiple programming paradigms on Hadoop, such as MapReduce, MPI, master-worker, and iterative models, and allows the framework best suited to each application to be used. This lets applications such as K-Means and PageRank run in custom frameworks outside of MapReduce, as sketched below.
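
The skeleton below is a hypothetical illustration of that idea: an iterative, master-worker style framework (of the kind one might use for K-Means) supplies its own ApplicationMaster that only asks the cluster for containers and decides for itself what runs inside them. All names here are assumptions, not a real API.

```java
// Hypothetical skeleton (names are illustrative, not a real API): a custom
// ApplicationMaster for an iterative, master-worker style computation. It only
// asks for containers; the work that runs inside them is not MapReduce at all.
public class IterativeAppMaster {

    interface Cluster {                       // stand-in for a ResourceManager client
        java.util.List<String> allocate(int containers, int memoryMb);
        void runInContainer(String containerId, Runnable work);
        void release(java.util.List<String> containerIds);
    }

    public void run(Cluster cluster, int workers, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            // Ask for one container per worker for this iteration.
            java.util.List<String> containers = cluster.allocate(workers, 2048);
            for (String c : containers) {
                cluster.runInContainer(c, () -> { /* compute partial results */ });
            }
            // ... gather partial results, update the model, test for convergence ...
            cluster.release(containers);      // hand resources back between iterations
        }
    }
}
```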

Conclusion

Apache Hadoop, and in particular Hadoop MapReduce, is a very successful open-source project for processing large data sets. The proposed refactoring of Hadoop MapReduce improves availability, improves cluster utilization, and provides a programming-paradigm framework that speeds up development. Yahoo! will work with the Apache Foundation to take Hadoop's ability to handle big data to a new level.

Related link: http://developer.yahoo.com/blogs/hadoop/posts/2011/02/maprmapre-nextgen/
