Cloudera Impala provides fast, interactive SQL queries directly on data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same
metastore, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Hive. Together these provide a single, familiar platform for both batch processing and real-time queries.
Cloudera Impala is an effective tool for querying big data. Impala does not replace batch processing frameworks built on MapReduce, such as Hive. Hive and other MapReduce-based frameworks remain best suited to long-running batch jobs, for example extract, transform, and load (ETL) jobs.
Architecture:
The following figure shows the position of Impala in the Cloudera ecosystem:
The entire Impala solution consists of the following components:
Impala State Store: coordinates the impalad instances running across the cluster (loosely comparable to the HDFS NameNode).
Impalad: this process runs on the DataNodes and serves queries sent by the Impala shell. Impalad accepts requests from the database connection layer, then schedules and optimizes the resulting tasks. Each impalad regularly reports its name and address to the Impala State Store (loosely comparable to an HDFS DataNode).
Impala shell: this tool is used to manage tasks and execute queries. It connects to an impalad; standardized query interfaces based on ODBC are also provided.
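The registration relationship between impalad and the State Store can be pictured with a small toy model. This is only a sketch under stated assumptions, not Impala's actual protocol: the StateStore class, the heartbeat and live_members methods, the 10-second timeout, and the node names and addresses are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy stand-in for the Impala State Store (hypothetical, not Impala code)."""
    timeout_s: float = 10.0  # assumed value, chosen only for this example
    _last_seen: dict = field(default_factory=dict)

    def heartbeat(self, name, address, now):
        # An impalad reports its name and address; record when it was last heard from.
        self._last_seen[name] = (address, now)

    def live_members(self, now):
        # Daemons that have not reported within timeout_s are treated as gone.
        return {
            name: address
            for name, (address, seen) in self._last_seen.items()
            if now - seen <= self.timeout_s
        }


store = StateStore(timeout_s=10.0)
store.heartbeat("impalad-1", "10.0.0.1:22000", now=0.0)
store.heartbeat("impalad-2", "10.0.0.2:22000", now=0.0)
store.heartbeat("impalad-1", "10.0.0.1:22000", now=8.0)  # impalad-1 reports again

# At t=15 only impalad-1 has reported within the last 10 seconds.
members = store.live_members(now=15.0)
print(members)
```

Passing the clock in as the now argument keeps the example deterministic; a real service would read the system clock instead.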
Impala performs the following query steps:
Submit a Hive SQL query through Hue Beeswax, the Impala shell, or ODBC.
Impala's distributed query engine builds a query plan and distributes it across the cluster.
To achieve optimal performance, each node reads its local HDFS or HBase data directly.
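The three steps above follow a scatter-gather pattern, which the sketch below imitates in plain Python. It is an illustration only and does not use any Impala API: the LOCAL_BLOCKS data, the node names, and the run_fragment and run_query functions are hypothetical, and the loop runs the fragments one after another where a real cluster would run them in parallel.

```python
from collections import Counter

# Hypothetical cluster: each node holds only its own local block of rows
# in the form (channel, clicks).
LOCAL_BLOCKS = {
    "node-1": [("web", 3), ("mobile", 5)],
    "node-2": [("web", 7)],
    "node-3": [("mobile", 1), ("web", 2)],
}


def run_fragment(node):
    """Runs on one node: aggregate that node's local rows only."""
    partial = Counter()
    for channel, clicks in LOCAL_BLOCKS[node]:
        partial[channel] += clicks
    return partial


def run_query(nodes):
    """Coordinator: send the fragment to every node, then merge the partial sums."""
    total = Counter()
    for node in nodes:
        total.update(run_fragment(node))  # Counter.update adds counts together
    return dict(total)


# Equivalent in spirit to: SELECT channel, SUM(clicks) ... GROUP BY channel
result = run_query(sorted(LOCAL_BLOCKS))
print(result)
```

Because each fragment touches only the rows stored on its own node, only the small partial sums need to travel to the coordinator, which is the point of reading local data directly.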
Impala features:
Impala provides the following support:
Supports most of the SQL-92-based query features of Hive SQL, including SELECT, joins, and some aggregate functions
Supported file formats are text files and SequenceFiles (which can be compressed with Snappy, GZIP, or BZIP; Snappy gives the best performance). According to the official blog, other formats such as Avro, RCFile, LZO text, and Doug Cutting's Trevni will be supported in the production release
Supports common Hive interfaces, such as the ODBC driver and Hue Beeswax (user interface)
Impala command-line interface
Support for Kerberos authentication
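To make the first item concrete, here is the general shape of a query that combines SELECT, a join, and aggregate functions. As a loud caveat, this sketch runs against Python's built-in sqlite3 module as a stand-in, not against Impala, and the users and orders tables are made up for the example; it shows only the style of SQL involved.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, country TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'DE'), (2, 'US'), (3, 'US');
    INSERT INTO orders VALUES (1, 10.0), (2, 5.0), (3, 7.5), (3, 2.5);
""")

# SELECT with a join and two aggregate functions, grouped by country.
rows = conn.execute("""
    SELECT u.country, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders o
    JOIN users u ON o.user_id = u.id
    GROUP BY u.country
    ORDER BY u.country
""").fetchall()
conn.close()
print(rows)
```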
What Impala can bring us:
Impala provides:
SQL APIs familiar to data analysts
Interactive processing of big data on Hadoop
Analysis directly on the stored data, avoiding the cost of modeling and ETL done solely for data analysis
From: https://ccp.cloudera.com/display/IMPALA10BETADOC/Introducing+Cloudera+Impala