Recently, Hadoop 2 reached its GA release. With the help of YARN, Hadoop 2 can host data processing applications that run natively in Hadoop. By separating cluster resource management from data processing, YARN enables Hadoop to be applied to data processing beyond MapReduce, so many new projects become possible. Projects such as Stinger and Tez, for example, focus on achieving interactive response times in certain scenarios, while Storm is dedicated to stream data processing. Spring has announced the Spring YARN framework, which Java developers can use to write their own YARN applications. Building on Hadoop's shared storage and cluster management platform, data processing applications now let users interact with data in a variety of ways.

We talked to Rohit Bakhshi, a product manager at Hortonworks, about YARN and what it means for Hadoop users. Rohit shared his view of YARN. Put simply, Hadoop has kept moving forward, and more and more companies (not just web-scale companies) want to keep all of their incoming data in Hadoop so that their users can interact with it in a variety of ways: batch, interactive, real-time stream analysis, and so on. More importantly, they want to run these workloads at the same time, without any single application or query taking up all of the cluster's resources.
By using YARN to turn Apache Hadoop 2 into a multi-application data system, the Hadoop community can handle the new generation of requirements that Hadoop faces. YARN addresses these requirements at the platform level, rather than through commercial add-ons that can make a user's environment more complex.
Looking ahead, companies will be able to deploy multi-tenant clusters that serve multiple purposes and meet the SLA requirements of different organizations and application frameworks. YARN provides binary compatibility for applications that use the mapred API, but only source-level compatibility for Hadoop 1.x applications that use the newer mapreduce API. Rohit explained that in Hadoop 2.0, clients submit MapReduce applications to the MapReduce v2 framework running on YARN, whereas in Hadoop 1.0 clients submitted MapReduce applications to the MapReduce v1 framework.
Both of these APIs are part of the MapReduce framework that developers use to create MapReduce applications. The org.apache.hadoop.mapred API is the older of the two and the one most widely used to build MapReduce applications. Any MapReduce v1 application developed against the mapred API can be submitted to the MapReduce v2 framework running on YARN and run there without modification.
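To make this concrete, here is a minimal word-count sketch written against the older org.apache.hadoop.mapred API. The class names (LegacyWordCount, TokenMapper, SumReducer) and the job name are hypothetical, but a jar built from code like this under Hadoop 1.x can be submitted to the MapReduce v2 framework on YARN as-is:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class LegacyWordCount {

    // Old-style mapper: extends MapReduceBase and implements the
    // org.apache.hadoop.mapred.Mapper interface.
    public static class TokenMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Old-style reducer: sums the per-word counts emitted by the mapper.
    public static class SumReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(LegacyWordCount.class);
        conf.setJobName("legacy-wordcount");
        conf.setMapperClass(TokenMapper.class);
        conf.setReducerClass(SumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // JobClient.runJob submits the job; on a Hadoop 2 cluster the same
        // compiled jar is executed by the MapReduce v2 framework on YARN.
        JobClient.runJob(conf);
    }
}
```

Because the mapred interfaces were kept stable, the same compiled artifact runs on both frameworks with no code changes.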
The org.apache.hadoop.mapreduce API is the newer set of APIs for the MapReduce framework. These APIs do not provide binary compatibility between MapReduce v1 and the MapReduce v2 framework running on YARN: existing MapReduce v1 applications that use them must be recompiled against the Hadoop 2.x JARs. Once recompiled, such an application can be submitted to the MapReduce v2 framework on YARN and run there.
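For comparison, here is the same word count written against the newer org.apache.hadoop.mapreduce API, again as a minimal sketch with hypothetical class names. A jar built from code like this under Hadoop 1.x must be recompiled against the Hadoop 2.x artifacts before it will run on the MapReduce v2 framework on YARN:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiWordCount {

    // New-style mapper: extends the org.apache.hadoop.mapreduce.Mapper class
    // and emits results through a Context instead of an OutputCollector.
    public static class TokenMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // New-style reducer: receives an Iterable rather than an Iterator.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "new-api-wordcount");
        job.setJarByClass(NewApiWordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion submits the job and blocks until it finishes;
        // recompiled against Hadoop 2.x, it runs on MapReduce v2 on YARN.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```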
The process of upgrading an existing Hadoop cluster is also straightforward. Hadoop and HDP (including all associated Apache Hadoop components) support an "in-place" upgrade from HDP 1.3 (Hadoop 1.x) to HDP 2.0 (Hadoop 2.x). All existing data is preserved, and metadata is upgraded in place without migration. The configuration format changed between HDP 1.3 and HDP 2.0, dropping some configuration properties and adding new ones, so an existing HDP 1.3 configuration must be migrated to HDP 2.0.

When we asked whether he worried about companies adopting Hadoop prematurely on smaller datasets, Rohit said he took a different view: Hadoop is used in a variety of ways, and because it is open source, all kinds of uses are visible. He does not consider these uses "premature". In fact, many organizations start with a small Hadoop cluster of only a few nodes and a few terabytes of data, but those environments eventually grow until a data lake forms and supports a modern data architecture. Small clusters are not "premature"; they are seeds.