Comparative analysis of Impala and Hive

Source: Internet
Author: User

1. Impala Architecture

Impala is a real-time, interactive SQL query tool for big data, developed by Cloudera and inspired by Google's Dremel. Instead of the slow Hive + MapReduce batch processing, Impala uses a distributed query engine similar to those found in commercial parallel relational databases (composed of a Query Planner, a Query Coordinator, and a Query Exec Engine). It can query data directly from HDFS or HBase using SELECT, JOIN, and aggregate functions, which greatly reduces latency. The architecture is shown in Figure 1. Impala consists primarily of impalad, the State Store, and the CLI.

Figure 1

Impalad: Runs on the same node as the DataNode and is represented by the impalad process. It receives query requests from clients, reads and writes data, and executes queries in parallel. The impalad that receives a query request becomes the coordinator for that query. The coordinator calls the Java front end through JNI to parse the SQL statement and generate a query plan tree, then the scheduler distributes the execution plan to the other impalads that hold the corresponding data. Results are streamed back to the coordinator over the network, and the coordinator returns them to the client. Impalad also maintains a connection with the State Store to determine which impalads are healthy and can accept new work. Each impalad starts three Thrift servers: beeswax_server (client connections), hs2_server (a HiveServer2-compatible interface borrowed from Hive), and be_server (internal communication between impalads), plus an ImpalaServer service.
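The coordinator flow described above can be illustrated with a toy, in-process model. This is only a sketch under stated assumptions, not Impala's actual code: the node names, the `BLOCKS` layout, and the functions `plan_fragments`, `execute_fragment`, and `coordinate` are all hypothetical, and a thread pool stands in for the network between impalads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data layout: which node holds which rows (a stand-in for HDFS blocks).
BLOCKS = {
    "node1": [3, 5, 8],
    "node2": [13, 21],
    "node3": [34],
}


def plan_fragments(healthy_nodes):
    """Coordinator step: one plan fragment per healthy node that holds data."""
    return [node for node in sorted(BLOCKS) if node in healthy_nodes]


def execute_fragment(node):
    """Worker step: scan local rows and return a partial aggregate (count, sum)."""
    rows = BLOCKS[node]
    return len(rows), sum(rows)


def coordinate(healthy_nodes):
    """Fan the fragments out in parallel, then merge the partial results."""
    fragments = plan_fragments(healthy_nodes)
    with ThreadPoolExecutor(max_workers=max(1, len(fragments))) as pool:
        partials = list(pool.map(execute_fragment, fragments))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum


result = coordinate({"node1", "node2", "node3"})
print(result)  # (6, 84)
```

The point of the sketch is the shape of the work: each node aggregates only the data it holds locally, and the coordinator merges small partial results instead of moving raw rows.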

Impala State Store: Tracks the health status and location of the impalads in the cluster and is represented by the statestored process. It creates multiple threads to handle impalad registrations and subscriptions and maintains a heartbeat connection with each impalad. Each impalad caches a copy of the State Store's information. When the State Store goes offline, the impalads can keep working from this cached copy. (An impalad that detects the State Store is offline enters recovery mode and repeatedly tries to re-register; when the State Store rejoins the cluster, the impalad automatically returns to normal and updates its cached data.) However, if some impalads fail while the State Store is offline, the cached data cannot be updated, so an execution plan may be assigned to a failed impalad and the query fails.
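The stale-cache failure mode described above can be reproduced in a small simulation. This is a minimal sketch, not how statestored is implemented: the `StateStore` and `Impalad` classes and all of their methods are hypothetical names invented for illustration.

```python
class StateStore:
    """Toy stand-in for statestored: tracks which impalads are alive."""

    def __init__(self):
        self.online = True
        self.members = set()

    def register(self, name):
        self.members.add(name)

    def mark_failed(self, name):
        # Only an online state store can notice a missed heartbeat.
        if self.online:
            self.members.discard(name)


class Impalad:
    """Toy stand-in for an impalad that caches the membership list."""

    def __init__(self, name, state_store):
        self.name = name
        self.state_store = state_store
        self.cached_members = set()
        state_store.register(name)

    def refresh(self):
        """Update the cache; keep the stale copy and return False if offline."""
        if not self.state_store.online:
            return False
        self.cached_members = set(self.state_store.members)
        return True

    def schedule(self, alive):
        """Assign work to cached members; fail if any of them is really down."""
        dead = sorted(self.cached_members - alive)
        if dead:
            raise RuntimeError(f"query failed, plan sent to dead node(s): {dead}")
        return sorted(self.cached_members)


store = StateStore()
a, b = Impalad("impalad-a", store), Impalad("impalad-b", store)
a.refresh()                      # cache now lists both impalads

store.online = False             # the state store goes offline
alive = {"impalad-a"}            # impalad-b dies in the meantime
store.mark_failed("impalad-b")   # goes unnoticed while offline
stale = not a.refresh()          # cache cannot be updated

try:
    a.schedule(alive)
    failed = False
except RuntimeError:
    failed = True                # plan was assigned to the dead impalad

store.online = True              # the state store rejoins the cluster
store.mark_failed("impalad-b")
a.refresh()
recovered = a.schedule(alive)
print(stale, failed, recovered)
```

While the state store is offline, queries keep working as long as every cached member is still alive; the failure only appears once the cache and reality diverge.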

CLI: A command-line tool for running user queries (the Impala shell is implemented in Python). Impala also provides Hue, JDBC, and ODBC interfaces.
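As a rough illustration of driving the CLI from a script, the sketch below builds an `impala-shell` command line and runs it only if the tool is installed. The helper names `build_impala_shell_command` and `run_query`, the host name, the table name `web_logs`, and the port 21000 are assumptions made for this example; check the port and options against your own deployment.

```python
import shutil
import subprocess


def build_impala_shell_command(host, query, port=21000):
    """Build an argv list for a one-shot impala-shell query.

    -i selects the impalad to connect to, -q passes the query text,
    and -B asks for plain delimited output instead of a formatted table.
    """
    return ["impala-shell", "-i", f"{host}:{port}", "-B", "-q", query]


def run_query(host, query, port=21000):
    """Run the query if impala-shell is on PATH; otherwise return None."""
    if shutil.which("impala-shell") is None:
        return None
    cmd = build_impala_shell_command(host, query, port)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Hypothetical host and table, used only to show the resulting command.
cmd = build_impala_shell_command(
    "impalad-host.example.com", "SELECT COUNT(*) FROM web_logs"
)
print(cmd)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the query text contains spaces or quotes.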

2. Relationship with Hive

Impala and Hive are both data query tools built on Hadoop, each with a different focus. From the client's perspective, however, they have a lot in common: table metadata, ODBC/JDBC drivers, SQL syntax, flexible file formats, storage resource pools, and so on. The relationship between Impala and Hive in Hadoop is shown in Figure 2. Hive is suited to long-running batch queries and analysis, while Impala is suited to real-time interactive SQL queries. Impala gives data analysts a big data analysis tool for quickly experimenting with and validating ideas. You can use Hive for data transformation and then use Impala to run fast analysis on the dataset that Hive produces.
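The "transform with Hive, then analyze with Impala" pattern might look like the following sketch, which only assembles the statements and labels the engine each one would run on. The function name `hive_then_impala`, the table names, and the columns `user_id` and `visits` are hypothetical. The `INVALIDATE METADATA` step reflects the fact that Impala needs to be told about tables created through Hive before it can query them.

```python
def hive_then_impala(source_table, result_table):
    """Return (engine, statement) pairs for a Hive-transform, Impala-query flow."""
    return [
        # 1. Batch transformation in Hive (slow, but suited to heavy processing).
        (
            "hive",
            f"CREATE TABLE {result_table} AS "
            f"SELECT user_id, COUNT(*) AS visits "
            f"FROM {source_table} GROUP BY user_id",
        ),
        # 2. Make the table created by Hive visible to Impala.
        ("impala", f"INVALIDATE METADATA {result_table}"),
        # 3. Fast interactive analysis in Impala on the prepared result.
        (
            "impala",
            f"SELECT user_id, visits FROM {result_table} "
            f"ORDER BY visits DESC LIMIT 10",
        ),
    ]


steps = hive_then_impala("raw_logs", "visits_by_user")
for engine, statement in steps:
    print(f"[{engine}] {statement}")
```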

Figure 2
