On average 24 times faster than Hive: Impala takes aim at Stinger

Source: Internet
Author: User

Before YARN, Hadoop was suited only to offline processing scenarios. To meet real-time demand, organizations developed their own streaming frameworks. This time we look at two SQL-on-Hadoop projects from two well-known Hadoop solution providers: Impala vs. Stinger.

Stinger: Stinger first appeared in Hive 0.11 (HDP 1.3) with three phase goals in total, of which phases I and II have been delivered. According to Hortonworks, the first phase delivered a 35-45x speedup across all types of analysis, and the second phase delivered a further 5-10x improvement in performance.

Impala: Released at the end of 2012, Impala is an open source implementation of Google Dremel developed by Cloudera, a renowned Hadoop solution provider, and is one of the most popular streaming frameworks of the moment. Cloudera's intention in developing Impala is clear: to improve the speed of Hive SQL queries. The 1.0 beta release was claimed to be 3-90 times faster than Hive, and after the official Impala release, Cloudera said that its processing speed with concurrently executing clients even exceeded that of Hive on a single machine.

Mesos, YARN, and other cluster resource management tools have put Stinger and Impala in direct competition, which led to Cloudera's benchmark based on TPC-DS.

Impala vs. Stinger

The versions compared are Impala 1.1.1 and Hive 0.12 (with Stinger integrated). Hive runs on an ORCFile dataset, while Impala uses Parquet to store the same data. So that Hive could reach its best performance, Cloudera also converted the TPC-DS queries to SQL-92 style joins, optimized the join order manually, and specified the partitioning field. The same optimizations were applied for Impala.
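To make the query rewriting concrete, here is a minimal sketch of what converting an implicit comma join into a SQL-92 style join with a hand-picked join order can look like. It is not one of Cloudera's benchmark queries. The table and column names follow the TPC-DS schema, but the rows are invented, SQLite stands in for Hive and Impala purely so the example runs anywhere, and treating ss_sold_date_sk as the partitioning field is an assumption.

```python
import sqlite3

# Toy stand-ins for three TPC-DS tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (d_date_sk INTEGER, d_year INTEGER);
CREATE TABLE item (i_item_sk INTEGER, i_category TEXT);
CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_item_sk INTEGER,
                          ss_ext_sales_price REAL);
INSERT INTO date_dim VALUES (1, 2000), (2, 2001);
INSERT INTO item VALUES (10, 'Books'), (11, 'Music');
INSERT INTO store_sales VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 10, 3.0);
""")

# Implicit (comma) join: the join conditions are mixed into the WHERE clause.
IMPLICIT_JOIN = """
SELECT i_category, SUM(ss_ext_sales_price) AS total
FROM store_sales, date_dim, item
WHERE ss_sold_date_sk = d_date_sk
  AND ss_item_sk = i_item_sk
  AND d_year = 2000
GROUP BY i_category
ORDER BY i_category
"""

# SQL-92 style: explicit JOIN ... ON clauses in a hand-chosen order, plus an
# explicit predicate on ss_sold_date_sk (assumed here to be the partition
# column) so that an engine with partitioning could skip unrelated partitions.
SQL92_JOIN = """
SELECT i.i_category, SUM(ss.ss_ext_sales_price) AS total
FROM store_sales ss
JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
JOIN item i ON ss.ss_item_sk = i.i_item_sk
WHERE d.d_year = 2000
  AND ss.ss_sold_date_sk BETWEEN 1 AND 1
GROUP BY i.i_category
ORDER BY i.i_category
"""

implicit_rows = conn.execute(IMPLICIT_JOIN).fetchall()
sql92_rows = conn.execute(SQL92_JOIN).fetchall()

# Both forms describe the same result; only the way the join is written differs.
print(implicit_rows == sql92_rows)
```

The two queries return the same rows. The rewrite only changes how much of the join strategy is spelled out by hand rather than left to the query planner.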

The data size is 3 TB, on a typical configuration of 5 Hadoop data nodes. The queries are of varied types: they include a range of standard joins and aggregations as well as complex multi-level aggregations and subqueries.

The result of the test is that Impala is 6-69 times faster than Hive, depending on the type of query.

Final thoughts

At this point you might have a question: benchmarks showing a tool to be 10 times faster than Hive, or even more, can be seen everywhere, and they even appear between these tools themselves, such as the following two:

HAWQ compared with Hive and Impala (see article for more details)

Shark compared with Hive and Impala (see blog for more details)

So what do these comparisons mean? In fact, they stem from the opportunities and challenges that followed the launch of YARN. The opportunity: the new resource manager lets different types of processing frameworks run on the same Hadoop cluster, and in this booming ecosystem the benefit of capturing an extra share is self-evident. The challenge: YARN's new features let more natively integrated tools, such as Stinger, improve their performance, so the tools at an integration disadvantage have to speak up loudly. That is why we see comparisons that are nominally against Hive but are in fact comparisons against Stinger's performance. This shows that although the Hadoop 2.0 ecosystem has become more prosperous, the pressure on each player is just as self-evident.
