YARN gives Hadoop new abilities

Source: Internet
Author: User

Recently Hadoop 2 reached general availability. With the help of YARN, Hadoop 2 can host data processing applications that run natively in Hadoop. By separating cluster resource management from data processing, YARN enables Hadoop to take on workloads beyond MapReduce, and a number of new projects become practical as a result. Stinger and Tez, for example, focus on achieving interactive, human-scale response times in certain scenarios; Storm is dedicated to stream processing; and Spring has announced the Spring YARN framework, which Java developers who want to write their own YARN applications can use to do so. By striking this balance between Hadoop's storage and cluster management platform, data processing applications now let users interact with data in a variety of ways.

We talked to Hortonworks product manager Rohit Bakhshi about YARN and what it means for Hadoop users. Rohit shared his view of YARN: the simple way to see it is that Hadoop keeps moving forward, and more and more companies (not just web-scale companies) want to keep all of their incoming data in Hadoop, so that their users can interact with that data in a variety of ways: batch, interactive, real-time stream analysis, and so on. More importantly, they want to run these interactions at the same time, without any single application or query taking up all of the cluster's resources.
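To give a sense of what "writing your own YARN application" involves, here is a minimal client sketch against the plain org.apache.hadoop.yarn.client.api (the raw Hadoop API, not Spring YARN). The application name, queue, and shell command are placeholders of ours; a real application would also ship an ApplicationMaster that registers with the ResourceManager and requests containers.

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MinimalYarnSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();

        // Connect to the ResourceManager and ask for a new application id.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        YarnClientApplication app = yarnClient.createApplication();

        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("hello-yarn"); // placeholder name

        // The container that will run the ApplicationMaster. A real AM
        // would register with the ResourceManager; this one just echoes.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
                "echo 'ApplicationMaster would start here'"));
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(256, 1)); // 256 MB, 1 vcore
        appContext.setQueue("default");

        ApplicationId appId = appContext.getApplicationId();
        yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
        yarnClient.stop();
    }
}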

By using YARN to turn Apache Hadoop 2 into a multi-application data system, the Hadoop community can handle this new generation of requirements. YARN meets those requirements in the platform itself, rather than through commercial add-ons that would make users' environments more complex.

Looking ahead, companies will be able to deploy multi-tenant clusters that serve multiple goals and meet the SLA requirements of different organizations and application frameworks. YARN provides binary compatibility for applications that use the mapred API, but only source-level compatibility for Hadoop 1.x applications that use the mapreduce API. Rohit explains that in Hadoop 2.0, each client submits its MapReduce applications to the MapReduce v2 framework running on YARN, whereas in Hadoop 1.0 each client submitted MapReduce applications to the MapReduce v1 framework.

Both sets of APIs expose the MapReduce framework that developers use to create MapReduce applications. The org.apache.hadoop.mapred API is the older API and the one most widely used to build MapReduce applications. Any MapReduce v1 application developed against the mapred API can be submitted to the MapReduce v2 framework running on YARN and run there without any modification to the application.
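As a concrete illustration of that binary compatibility, here is the classic word count written against the mapred API (our own minimal version of the well-known example, not code from the article). A MapReduce v1 jar containing a job like this can be submitted to MapReduce v2 on YARN exactly as it is.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class OldApiWordCount {

    // Mapper: emit (word, 1) for every token in the line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OldApiWordCount.class);
        conf.setJobName("old-api-wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}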

The org.apache.hadoop.mapreduce API is the newer set of APIs for the MapReduce framework. These APIs do not provide binary compatibility between MapReduce v1 and MapReduce v2 running on YARN: existing MapReduce v1 applications that use them need to be recompiled against the Hadoop 2.x packages. Once recompiled, the application can be submitted to the MapReduce v2 framework running on YARN and run there (a driver sketch for this newer API appears at the end of this section).

The process of upgrading an existing Hadoop cluster is also straightforward and convenient. Hadoop and HDP (including all associated Apache Hadoop components) support "in-place" upgrades from HDP 1.3 (Hadoop 1.x) to HDP 2.0 (Hadoop 2.x): all existing data is preserved, and metadata is upgraded in place without migration. The configuration format, however, changed between HDP 1.3 and HDP 2.0, discarding some properties from the previous configuration and adding some new ones, so an existing HDP 1.3 configuration does need to be migrated to HDP 2.0 (see the second sketch below).

When we asked whether he worried about companies adopting Hadoop prematurely on smaller datasets, Rohit replied that he takes a different view. Hadoop is used in a variety of ways, and because it is open source, we get to see all of those uses. "I don't think these usages are 'premature'; in fact, many organizations start using Hadoop with a small cluster of only a few nodes and a few terabytes of data, but those environments keep growing until a data lake forms and a modern data architecture is in place. Small clusters are not 'premature'; they are seeds."
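Here is the driver sketch for the newer API referenced above: the same word count rewritten against org.apache.hadoop.mapreduce (again our own minimal version, not code from the article). Compiling this class against the Hadoop 2.x jars is exactly the recompilation step that source-level compatibility requires.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiWordCount {

    // Mapper: emit (word, 1) for every token in the line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "new-api-wordcount");
        job.setJarByClass(NewApiWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}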
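And the second sketch, on the configuration migration: the renamed properties below are illustrative examples of ours, not a list from HDP's upgrade guide, and the host names are placeholders. They show the flavor of the change: the old JobTracker setting has no Hadoop 2.x equivalent, because jobs are pointed at YARN instead.

import org.apache.hadoop.conf.Configuration;

// Illustrative sketch of migrating a Hadoop 1.x client configuration
// to Hadoop 2.x / YARN. Host names are placeholders.
public class MigratedClientConf {
    public static Configuration build() {
        Configuration conf = new Configuration();

        // Hadoop 1.x: conf.set("fs.default.name", "hdfs://nn-host:8020");
        // Hadoop 2.x prefers the renamed key:
        conf.set("fs.defaultFS", "hdfs://nn-host:8020");

        // Hadoop 1.x: conf.set("mapred.job.tracker", "jt-host:8021");
        // There is no JobTracker in Hadoop 2.x; jobs go to YARN instead:
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.hostname", "rm-host");

        return conf;
    }
}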
