Compatibility Analysis of the MRv1 Old and New APIs with MRv2

Last Update:2018-07-20 Source: Internet

Author: User

1. Basic Concepts

MRv1 is the MapReduce implementation in Hadoop 1.x. It consists of three parts: the programming model (the old and new programming interfaces), the runtime environment (JobTracker and TaskTracker), and the data processing engine (MapTask and ReduceTask). The framework falls short in scalability, fault tolerance (the JobTracker is a single point of failure), and multi-framework support (it supports only the MapReduce computing model). For the internal implementation of MRv1, see my latest book, "Hadoop Technology Insider: In-Depth Analysis of MapReduce Architecture Design and Implementation Principles."

MRv2 is the MapReduce implementation in Hadoop 2.x. It reuses MRv1's programming model and data processing engine at the source level, but its runtime environment consists of YARN and an ApplicationMaster. YARN is a new, general-purpose resource management system. Because of that generality, the core of MRv2 has shifted from a single MapReduce computing framework to YARN, which acts as a unified management system for multiple frameworks. The ApplicationMaster is responsible for a MapReduce job's data splitting, task division, resource requests, task scheduling, and fault tolerance (see Chapter 12, "Understanding the ApplicationMaster Implementation," of my latest book, "Hadoop Technology Insider: In-Depth Analysis of MapReduce Architecture Design and Implementation Principles").

The MapReduce application programming interface (API) is kept consistent between MRv1 and MRv2. To let user applications migrate smoothly to Hadoop 2.0, MRv2 preserves backward compatibility of the programming interfaces as far as possible. However, because MRv2 itself was improved and optimized, a few backward-compatibility problems remain, and they are the subject of this article.

2. Application Programming Interface Compatibility

The Hadoop JIRA issue that discusses MRv2 programming interface compatibility is https://issues.apache.org/jira/browse/MAPREDUCE-5108.

MapReduce has two sets of application programming interfaces: the old API (mapred) and the new API (mapreduce). They do not differ in performance, because the kernel implementation is the same. The only difference is in how the programming interface is defined: the new API is better encapsulated and does better on compatibility and extensibility. For an introduction to and comparison of the old and new MapReduce APIs, see Chapter 3, "MapReduce Programming Model," of my latest book, "Hadoop Technology Insider: In-Depth Analysis of MapReduce Architecture Design and Implementation Principles."
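The difference in interface definition can be sketched roughly as follows. The types here (OldStyleMapper, OldStyleCollector, NewStyleMapper and its Context) are simplified stand-ins invented for this illustration, not the real Hadoop classes. They only mirror the general shape of the two APIs: in the old API the mapper is an interface that emits records through a collector, while in the new API the mapper is a base class that does everything through a single context object.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the real Hadoop types.

// Old-API shape: the mapper is an interface and output goes through a collector.
interface OldStyleCollector<K, V> {
    void collect(K key, V value);
}

interface OldStyleMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, OldStyleCollector<KO, VO> output);
}

// New-API shape: the mapper is a base class and output goes through one context.
abstract class NewStyleMapper<KI, VI, KO, VO> {
    // Hypothetical context that simply records what was written.
    static class Context<K, V> {
        final List<String> written = new ArrayList<>();

        void write(K key, V value) {
            written.add(key + "=" + value);
        }
    }

    // No-op default in this sketch, so subclasses override only what they need.
    protected void map(KI key, VI value, Context<KO, VO> context) { }
}

// The same tokenizing logic written against each shape.
class OldStyleTokenMapper implements OldStyleMapper<Long, String, String, Integer> {
    public void map(Long offset, String line, OldStyleCollector<String, Integer> output) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);
            }
        }
    }
}

class NewStyleTokenMapper extends NewStyleMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line, Context<String, Integer> context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
    }
}

class ApiShapeDemo {
    static List<String> runOld(String line) {
        List<String> out = new ArrayList<>();
        new OldStyleTokenMapper().map(0L, line, (k, v) -> out.add(k + "=" + v));
        return out;
    }

    static List<String> runNew(String line) {
        NewStyleMapper.Context<String, Integer> ctx = new NewStyleMapper.Context<>();
        new NewStyleTokenMapper().map(0L, line, ctx);
        return ctx.written;
    }

    public static void main(String[] args) {
        System.out.println(runOld("a b a"));
        System.out.println(runNew("a b a"));
    }
}
```

Both mappers produce the same records. Only the way the user code plugs into the framework differs, which is consistent with the point above that the two APIs share one kernel implementation.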

YARN provides binary compatibility for MapReduce applications written against the old API. For the new API in Hadoop 1.x, however, only source-level compatibility is provided. Rohit explained:

In Hadoop 2.0, clients submit MapReduce applications to the MapReduce v2 framework running on YARN. In Hadoop 1.0, clients submit MapReduce applications to the MapReduce v1 framework. The terms "old API" and "new API" both refer to interfaces of the MapReduce framework that developers use to create MapReduce applications.

The org.apache.hadoop.mapred API is the older API and the one most widely used to create MapReduce applications. Any MapReduce v1 application developed with the mapred API (that is, the old API) can be submitted to the MapReduce v2 framework running on YARN and run there. In this case, the MapReduce application does not need to be modified.

The org.apache.hadoop.mapreduce API is a newer set of APIs for the MapReduce framework. These APIs are not binary compatible between MapReduce v1 and the MapReduce v2 running on YARN. Existing MapReduce v1 applications that use these new APIs must be recompiled against the Hadoop 2.x jars. Once recompiled, the application can be submitted to the MapReduce v2 framework running on YARN and run there.
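One well-known example of a change that is source compatible but not binary compatible: some types in org.apache.hadoop.mapreduce, such as JobContext and TaskAttemptContext, are classes in Hadoop 1.x and interfaces in Hadoop 2.x. A jar compiled against one form typically fails against the other with an IncompatibleClassChangeError, even though the same source recompiles cleanly. The following is a minimal sketch, using only standard Java reflection, of how one might check which form is on the classpath. The class name ApiKindProbe and the method kindOf are hypothetical names chosen for this example.

```java
// Hypothetical helper: reports whether a type on the classpath is a class or an interface.
class ApiKindProbe {
    // Returns "interface", "class", or "missing" for a fully qualified type name.
    static String kindOf(String typeName) {
        try {
            // initialize=false: the type is only inspected, its static initializers do not run.
            Class<?> type = Class.forName(typeName, false, ApiKindProbe.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // Hadoop is not on the classpath, or the type could not be linked.
            return "missing";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.mapreduce.JobContext",
            "org.apache.hadoop.mapreduce.TaskAttemptContext"
        };
        for (String name : names) {
            System.out.println(name + " -> " + kindOf(name));
        }
    }
}
```

Run with the Hadoop jars of the target cluster on the classpath, this gives a quick hint about which generation of the new API an existing jar would be linked against. Without Hadoop on the classpath it simply reports "missing".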

In summary, applications written with the MRv1 old API can run on MRv2 directly using the existing jar. Applications written with the MRv1 new API cannot: they must be recompiled against the MRv2 libraries, and any incompatible parameters and return values must be fixed according to the compilation errors. MRv2's changes to the API are concentrated in function parameters and return values.

Question by Wade: If we develop a MapReduce program directly on the Hadoop 2.x platform, should we use the old API or the new API?

Personal understanding by Wade: When developing a MapReduce program directly on the Hadoop 2.x platform, it is best to use the new API. The discussion above applies only to projects previously developed on the Hadoop 1.x platform: a project based on the old API is relatively easy to move to YARN, while a project based on the new API needs to be recompiled.

Demo of developing a MapReduce program on the Hadoop YARN platform with the new MapReduce API: http://tech.it168.com/a2013/0912/1533/000001533118_3.shtml

Analysis: Apache Version Derivation

Apache Hadoop versions fall into two generations. We call the first generation Hadoop 1.0 and the second generation Hadoop 2.0. The first generation includes three major version lines: 0.20.x, 0.21.x, and 0.22.x. Of these, 0.20.x eventually evolved into 1.0.x and became the stable release, while 0.21.x and 0.22.x added major new features such as NameNode HA (the MapReduce in 0.21/0.22 is the next-generation implementation, except that resource management still uses the JobTracker rather than YARN, and many details were changed). The second generation includes two version lines: 0.23.x (released in 2011) and 2.x (that is, Hadoop 2.x, which evolved directly from the Hadoop 0.23.x branch). It is completely different from Hadoop 1.0 and is an entirely new architecture that includes both HDFS Federation and YARN. Compared with 0.23.x, 2.x adds two major features: NameNode HA and wire compatibility. HDFS Federation and YARN have been included since Hadoop 0.23.x.
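The lineage described above can be written down as a small lookup. This is only a sketch of the classification given in this article. The class name HadoopGeneration and its method are hypothetical, and versions outside the lines mentioned here are reported as "unknown" rather than guessed.

```java
// Hypothetical helper that encodes the version lineage described in this article.
class HadoopGeneration {
    // 0.20.x, 0.21.x, 0.22.x and 1.x -> "first"; 0.23.x and 2.x -> "second"; anything else -> "unknown".
    static String of(String version) {
        String[] parts = version.trim().split("\\.");
        int major;
        try {
            major = Integer.parseInt(parts[0]);
        } catch (NumberFormatException e) {
            return "unknown";
        }
        if (major == 1) {
            return "first";
        }
        if (major == 2) {
            return "second";
        }
        if (major == 0 && parts.length > 1) {
            int minor;
            try {
                minor = Integer.parseInt(parts[1]);
            } catch (NumberFormatException e) {
                return "unknown";
            }
            if (minor >= 20 && minor <= 22) {
                return "first";
            }
            if (minor == 23) {
                return "second";
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.20.2", "1.0.4", "0.23.9", "2.x"}) {
            System.out.println(v + " -> " + of(v) + " generation");
        }
    }
}
```

For a migration, the practical reading is that jars built on a "first" generation release and using the new API are the ones that need recompiling before they run on a "second" generation cluster.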

A study of Hadoop 0.23.x found that it does not contain the new API from Hadoop 1.x. Both the old and new API sets have been present since 0.20.x. MRv2 is therefore more compatible with the MRv1 old API, whose directory structure is exactly the same.

[Table: evolution of Hadoop versions]

This article is reproduced from Dong's blog.

Original article: http://dongxicheng.org/mapreduce-nextgen/mrv1-mrv2-api-compatibility/

See also: http://www.infoq.com/cn/news/2013/11/hadoop-yarn-ga
