Hortonworks Releases New HDP to Boost Hadoop Performance

HDP 2.0 (the Hortonworks Data Platform) is a new-generation release built on the Apache Hadoop YARN architecture. With it, Hadoop evolves from a single-purpose, web-scale batch data processing platform into a multi-purpose operating system. It can now handle a variety of workload types, including batch, interactive, online, and streaming.


Consider the case of running SQL on Hadoop. For years, business analysts have used SQL as a query language to run ad hoc queries against the data warehouse. If you use Hadoop to build a data lake, you can use SQL to query that data.


"But because SQL access is tied to Hadoop, and Hadoop is just a single-application system, a challenge arises," wrote Arun Murthy, Hortonworks founder and architect and former architect of the Yahoo Hadoop MapReduce development team. "When I run a SQL query over my data, it consumes all of the cluster's resources and causes performance problems for other applications and jobs in the cluster, which is definitely not good news."


The answer to this conundrum is YARN ("Yet Another Resource Negotiator"), the foundation of the recently released Hadoop 2. Apache Hadoop YARN acts as the Hadoop operating system, turning the original single-purpose batch data platform into a multi-purpose platform that handles batch, interactive, online, and streaming workloads.


YARN serves as the primary resource manager and mediates access to data stored in HDFS (the Hadoop Distributed File System). It lets enterprise users keep data in a single location, interact with it in multiple ways, and still maintain consistent service levels.
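
The resource-sharing idea described above can be illustrated with a small sketch. This is a toy model only, not YARN's actual scheduler: the function `allocate`, the queue names, and the percentage guarantees are all hypothetical. It shows how giving each workload a guaranteed share of the cluster keeps one heavy SQL query from starving other jobs, while idle capacity can still be lent out.

```python
def allocate(total_slots, guarantee_pct, demand):
    """Toy capacity-style allocation (hypothetical, not YARN's real algorithm).

    total_slots:   total number of resource slots in the cluster
    guarantee_pct: queue name -> guaranteed share, as an integer percentage
    demand:        queue name -> number of slots the queue currently wants
    """
    # Step 1: each queue receives its demand, capped at its guaranteed share.
    alloc = {
        queue: min(demand.get(queue, 0), total_slots * pct // 100)
        for queue, pct in guarantee_pct.items()
    }

    # Step 2: lend idle slots, one at a time, to queues that still want more.
    spare = total_slots - sum(alloc.values())
    while spare > 0:
        hungry = [q for q in guarantee_pct if alloc[q] < demand.get(q, 0)]
        if not hungry:
            break
        for queue in hungry:
            if spare == 0:
                break
            alloc[queue] += 1
            spare -= 1
    return alloc


# Hypothetical queues: interactive SQL, batch MapReduce, and online serving.
guarantees = {"sql": 30, "batch": 50, "online": 20}

# A busy cluster: the SQL query asks for everything but is held to its share.
busy = allocate(100, guarantees, {"sql": 100, "batch": 50, "online": 20})

# A quiet cluster: the SQL query may borrow capacity the others are not using.
quiet = allocate(100, guarantees, {"sql": 100, "batch": 10, "online": 0})
```

In the busy case the SQL queue is limited to 30 slots, so batch and online jobs keep their guaranteed capacity. In the quiet case the same query grows to 90 slots because nothing else needs them.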


As the provider of one of the most popular Hadoop distributions, HDP (the Hortonworks Data Platform), Hortonworks was first to respond to YARN, releasing the general availability (GA) version of HDP 2.0.


HDP 2.0 is the first commercially available release based on Hadoop 2, providing users with the YARN architecture and a host of new features from phase two of the "Stinger" initiative. The Stinger initiative is a community-based effort to improve the speed, scale, and breadth of SQL semantics supported by Apache Hive.


"HDP 2.0 is based on the YARN architecture, which enables us to achieve our goal of delivering enterprise-class Hadoop as part of a modern data architecture that works with existing and future data center technologies," said Shaun Connolly, Hortonworks vice president of corporate strategy.


"In benchmark tests with our existing customer base, we only moved classic MapReduce jobs from the 1.0 product line to the 2.0 product line," Connolly added. "Everyone will get double the performance, and with it the ability to run twice as many jobs. The cluster will also have more spare capacity."


At the same time, Hive 0.12 (the centerpiece of phase two of the Stinger initiative) significantly improves query performance, bringing queries to "human interaction response time rather than batch response time."


Connolly points out that a query that used to take 1,400 seconds can now return within 10 seconds. In phase three (expected in the first quarter of 2014), Hortonworks expects to improve response times further by enabling in-memory processing.


HDP 2.0 is now available for download. Connolly says the Windows version of HDP 2.0 will be available to customers next month.