Recently, the Yahoo Developer blog published an article introducing a plan to re-architect Hadoop (the page has since been deleted; original URL: http://developer.yahoo.com/blogs/hadoop/posts/2011/02/maprmapre-nextgen/). Having found that Hadoop hits a scalability bottleneck once a cluster reaches about 4,000 machines, Yahoo is now preparing to refactor Hadoop.
The bottleneck facing MapReduce
The trend in cluster sizes and workloads is that MapReduce's JobTracker needs an overhaul to address flaws in its scalability, memory consumption, threading model, reliability, and performance. Over the past five years of continuous patching, the cost of fixing these problems has kept rising. The tight coupling between Hadoop's modules now makes it difficult to improve on the existing design, and the community has agreed on this point, which is why the refactoring is starting. From an operational point of view, moreover, even a minor change or bug fix to Hadoop MapReduce currently forces a system-wide upgrade.
The concept of Next generation MapReduce
According to the article, the main idea of the new architecture is to split the original JobTracker's functions in two: a ResourceManager that manages resource allocation, and an ApplicationMaster that manages task monitoring and scheduling. The ResourceManager, like the original JobTracker, is the control center of the whole cluster, while a separate ApplicationMaster instance is created for each application; an application is a set of tasks submitted by a user and can consist of one or more jobs. Each slave runs a NodeManager instance, which plays a role similar to the original TaskTracker.
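The post describes these components only at a high level. As a concrete illustration of how a client interacts with them, here is a minimal sketch of submitting an application to the ResourceManager; the class names come from the YARN client API that later shipped with Hadoop 2.x, not from the 2011 post, and the application name and 1 GB / 1 vcore figures are made up for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app"); // hypothetical name

        // Describe how to launch the per-application ApplicationMaster
        // and how big a container it needs.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore

        // The ResourceManager places the ApplicationMaster on some NodeManager;
        // everything after that (task scheduling, monitoring) happens in the AM.
        yarnClient.submitApplication(appContext);
    }
}
```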
1. Hierarchical Management
Currently, both resource management and task scheduling in Hadoop are done by the JobTracker, which has to handle resource allocation and scheduling for every task. A task is a very fine-grained scheduling unit: each job typically spawns hundreds of tasks, and many jobs run in the system at the same time, so the JobTracker's management burden becomes very heavy. The new architecture delegates this per-task management to each ApplicationMaster, and the ResourceManager only manages the resource allocation for each application. So even when there are many applications in the system, the ResourceManager's load can be kept at a reasonable level; this is the biggest advantage of the new architecture.
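A rough sketch of what this split looks like from inside an ApplicationMaster, again written against the AMRMClient API from later Hadoop 2.x releases (the container size and request count are invented for illustration): the ResourceManager only hands out containers, and deciding which task runs in which container stays entirely inside the per-application master.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AmResourceSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();

        // Register this application's master with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask only for resources: ten containers of 2 GB / 1 vcore each.
        Resource capability = Resource.newInstance(2048, 1);
        for (int i = 0; i < 10; i++) {
            rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));
        }

        // allocate() returns granted containers; mapping tasks onto them
        // (the old JobTracker's job) now happens here, per application.
        rmClient.allocate(0.1f);
    }
}
```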
2. Should ApplicationMaster run on the master or a slave?
The new architecture moves task management and scheduling into the ApplicationMaster, so if the node hosting an ApplicationMaster goes down, the entire application has to be rerun. Originally the JobTracker ran on a relatively stable master, where the probability of failure is low; now ApplicationMasters run on slaves, where the probability of failure is much higher. Moreover, the new architecture breaks the original simple master-slave model: communication and dependencies between nodes become more complicated, which makes network optimization harder. If all ApplicationMasters were placed on the master instead, the master would carry a very heavy burden (it would have to handle all kinds of persistent heartbeats and bursty RPC requests such as getTaskCompletionEvents), though that problem could be solved with a distributed master (which Google has already implemented).
3. Resource management approach
The original static slot is simply not a good unit for describing the cluster's resources. The new architecture provides finer-grained control over CPU, memory, disk, and network resources. Each task runs inside a container and can only use the system resources allocated to it. Resource allocation can then move from static estimation to dynamic adjustment.
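To make the container idea concrete: once the ResourceManager has granted a container with a specific capability, the ApplicationMaster asks the NodeManager on that slave to launch the task inside it, and the task is confined to the container's resources rather than to a fixed map or reduce slot. The sketch below uses the NMClient API from later Hadoop 2.x releases; the task command is hypothetical.

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.util.Records;

public class LaunchTaskSketch {
    // Launch one task inside a container already granted by the ResourceManager.
    static void launchTask(Configuration conf, Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        // Hypothetical task command; a real AM would also ship jars and env vars.
        ctx.setCommands(Collections.singletonList("java -Xmx1g MyTask"));

        // The NodeManager starts the process and enforces the container's limits.
        nmClient.startContainer(container, ctx);
    }
}
```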
4. Support other programming models
Because task management and scheduling are handled by the ApplicationMaster, which is relatively independent of the other modules of the system, users can even deploy their own ApplicationMaster to support other programming models. This allows applications that are not a good fit for MapReduce to run on the same Hadoop cluster.
Scalability Implementation
Scalability is important given current hardware trends; MapReduce clusters currently have 4,000 hosts. However, a 2009 cluster of 4,000 hosts (8 cores, 16 GB of RAM, 4 TB of storage each) has only half the processing power of a 2011 cluster of 4,000 hosts (16 cores, 48 GB of RAM, 24 TB of storage each). In addition, considering operating costs, clusters are being pushed toward 6,000 hosts, and possibly more.
Availability implementation
ResourceManager -- the ResourceManager uses Apache ZooKeeper for failover. When the ResourceManager fails, cluster state can be quickly recovered from the state saved in ZooKeeper. After a failover, all queued and running applications are restarted.
ApplicationMaster -- MapReduce NextGen supports application-specific checkpointing for the ApplicationMaster. The MapReduce ApplicationMaster can recover from failure and restore itself from the state it saved to HDFS.
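The post does not describe the checkpoint format. As a minimal sketch of the idea, an ApplicationMaster could record completed tasks in a file on HDFS so that a restarted AM can skip work that already finished; the path and line format below are hypothetical and are not the actual MapReduce ApplicationMaster implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AmCheckpointSketch {
    // Append the id of a finished task to a checkpoint file in HDFS.
    static void saveCompletedTask(Configuration conf, String taskId) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path checkpoint = new Path("/tmp/app-state/completed-tasks.log"); // hypothetical path
        try (FSDataOutputStream out = fs.exists(checkpoint)
                ? fs.append(checkpoint) : fs.create(checkpoint)) {
            out.writeBytes(taskId + "\n");
        }
        // A restarted AM would read this file back and re-run only missing tasks.
    }
}
```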
Compatibility implementation
MapReduce NextGen uses wire-compatible protocols to allow different versions of servers and clients to exchange information. In future releases this feature will be retained, so that the cluster remains compatible after an upgrade.
Cluster utilization
MapReduce NextGen uses a general notion of a resource for scheduling and allocates these resources to individual applications. Each machine in the cluster is conceptually composed of resources such as memory and I/O bandwidth.
Support for other programming models
MapReduce NextGen provides a fully generic computing framework that supports MapReduce as well as other paradigms.
The architecture ultimately allows users to implement a custom ApplicationMaster, which can request resources from the ResourceManager and make use of them. It therefore supports multiple programming models on Hadoop, such as MapReduce, MPI, master-worker, and iterative models, and allows the appropriate framework to be chosen for each application. This lets applications such as K-Means and Page-Rank run outside MapReduce in a custom framework.
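From the ResourceManager's point of view, any framework that registers, negotiates containers, and unregisters can share the cluster. A skeleton of such a custom (non-MapReduce) ApplicationMaster, sketched against the AMRMClient API from later Hadoop 2.x releases:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class CustomAppMasterSkeleton {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(conf);
        rm.start();
        rm.registerApplicationMaster("", 0, "");

        // ... negotiate containers here and run MPI ranks, master-worker
        //     processes, or iterative workers instead of map/reduce tasks ...

        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
    }
}
```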
Conclusion
Apache Hadoop, and in particular Hadoop MapReduce, is a very successful open-source project for handling large datasets. The proposed changes to Hadoop MapReduce improve availability and cluster utilization and provide a more general programming framework, while enabling faster development. Yahoo will work with the Apache Foundation to take Hadoop's ability to handle big data to a new level.