Basic Terminology in Hadoop 2.0 Explained

Source: Internet
Author: User
Keywords: Hadoop, HDFS

(1) Hadoop 1.0

The first generation of Hadoop, composed of the distributed storage system HDFS and the distributed computing framework MapReduce. HDFS consists of one NameNode and multiple DataNodes; MapReduce consists of one JobTracker and multiple TaskTrackers. The corresponding Hadoop versions are 1.x, 0.21.x, and 0.22.x.

(2) Hadoop 2.0

The second generation of Hadoop, proposed to overcome various problems with HDFS and MapReduce in Hadoop 1.0. To address the scalability limit that a single NameNode imposes on HDFS in Hadoop 1.0, HDFS Federation was introduced; it lets multiple NameNodes each manage separate directories, which provides access isolation and horizontal scaling. To address the shortcomings of MapReduce in Hadoop 1.0 in scalability and multi-framework support, a new resource management framework, YARN (Yet Another Resource Negotiator), was introduced. YARN separates the resource management and job control functions of the JobTracker and implements them in two components, ResourceManager and ApplicationMaster: ResourceManager is responsible for resource allocation for all applications, while each ApplicationMaster manages only a single application. The corresponding Hadoop versions are 0.23.x and 2.x.

(3) MapReduce 1.0 or MRv1 (MapReduce version 1)

The first-generation MapReduce computing framework, which consists of two parts: the programming model and the run-time environment. The programming model abstracts a problem into two phases, map and reduce. The map phase parses the input data into key/value pairs, calls the map() function on each pair in turn, and writes the output to local disk in key/value form. The reduce phase then processes the values that share the same key and writes the final result to HDFS. The run-time environment consists of two types of services, JobTracker and TaskTracker: the JobTracker is responsible for resource management and for controlling all jobs, and each TaskTracker receives commands from the JobTracker and executes them.
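
The two phases described above can be illustrated with a minimal single-process sketch in Python. This is not Hadoop's API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names chosen for illustration, the data is held in memory rather than on local disk or HDFS, and word counting is assumed as the example problem.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: sum all counts that share the same key."""
    return key, sum(values)

def run_job(lines):
    # Map phase: call map_fn once per input record (offset, line).
    intermediate = []
    for offset, line in enumerate(lines):
        intermediate.extend(map_fn(offset, line))

    # Shuffle: group intermediate values by key, the step the framework
    # performs between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: call reduce_fn once per distinct key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

if __name__ == "__main__":
    print(run_job(["hadoop hdfs", "hadoop yarn"]))
```

In a real cluster the map calls run in parallel on many nodes and the grouping step moves data across the network; the sketch only shows the shape of the programming model.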

(4) MapReduce 2.0 or MRv2 (MapReduce version 2) or NextGen MapReduce

MapReduce 2.0, or MRv2, has the same programming model as MRv1; the only difference is the run-time environment. MRv2 is MRv1 reworked to run on the resource management framework YARN. It no longer consists of a JobTracker and TaskTrackers. Instead it becomes a job control process, the ApplicationMaster, which is responsible for managing only a single job. Resource management is handled by YARN.

In short, MRv1 is a standalone offline computing framework, while MRv2 is MRv1 running on top of YARN.

(5) YARN

The resource management framework in Hadoop 2.0. It is a framework manager that allocates resources and provides a run-time environment for various computing frameworks. MRv2 is the first computing framework to run on YARN, and other frameworks, such as Spark and Storm, are being ported to it. YARN is similar to the resource management system Mesos and to the earlier Torque.
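
The split between cluster-wide resource allocation and per-application control can be sketched as a toy model in Python. The class names mirror the component names for readability, but everything here is an assumption made for illustration: the methods `allocate`, `release`, and `run`, the notion of counting "containers" as plain integers, and the one-task-per-container loop are not YARN's actual protocol or API.

```python
class ResourceManager:
    """Toy stand-in for the cluster-wide resource allocator."""

    def __init__(self, total_containers):
        self.free = total_containers
        self.allocated = {}  # app_id -> containers currently held

    def allocate(self, app_id, requested):
        # Grant as many containers as are free, up to the request.
        granted = min(requested, self.free)
        self.free -= granted
        self.allocated[app_id] = self.allocated.get(app_id, 0) + granted
        return granted

    def release(self, app_id):
        # Return every container held by this application to the pool.
        self.free += self.allocated.pop(app_id, 0)


class ApplicationMaster:
    """Toy per-application controller: it manages exactly one application."""

    def __init__(self, app_id, num_tasks, rm):
        self.app_id = app_id
        self.pending = num_tasks
        self.rm = rm

    def run(self):
        # Request containers, run one task per container, and repeat
        # until no tasks remain. Returns the number of rounds needed.
        rounds = 0
        while self.pending > 0:
            granted = self.rm.allocate(self.app_id, self.pending)
            if granted == 0:
                raise RuntimeError("no containers available")
            self.pending -= granted
            rounds += 1
            self.rm.release(self.app_id)
        return rounds


if __name__ == "__main__":
    rm = ResourceManager(total_containers=4)
    print(ApplicationMaster("app-1", num_tasks=10, rm=rm).run())
```

The point of the design is that the allocator knows nothing about tasks or job logic, so a framework other than MapReduce could supply its own per-application controller against the same allocator.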

(6) HDFS Federation

An improvement to HDFS in Hadoop 2.0 that lets NameNodes scale horizontally, with each NameNode managing part of the directory tree. This improves the scalability of HDFS and also gives it better isolation.
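
One way to picture "each NameNode manages part of the directory tree" is a client-side table that maps directory prefixes to NameNodes. The Python sketch below is only an illustration of that idea: the `MOUNT_TABLE` contents, the host names, and the `resolve_namenode` function are hypothetical and do not reproduce Hadoop's configuration format or client code.

```python
# Hypothetical mount table: directory prefix -> NameNode that owns it.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/tmp": "namenode-3",
}

def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode owning `path`, using the longest matching prefix."""
    best = None
    for prefix in mount_table:
        # Match the prefix itself or anything below it, but not a sibling
        # directory that merely starts with the same characters.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode is mounted for {path}")
    return mount_table[best]

if __name__ == "__main__":
    print(resolve_namenode("/user/alice/file.txt"))
    print(resolve_namenode("/data"))
```

Because each prefix is served by a different NameNode, heavy metadata traffic under one directory does not load the NameNode serving another, which is the isolation benefit described above.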
