Introduction to Cloudera Impala

Source: Internet
Author: User

Cloudera Impala provides fast, interactive SQL queries directly on data stored in HDFS or HBase. In addition to using the same unified storage platform as Hive, Impala also uses the same metastore, SQL syntax (HiveQL), ODBC driver, and user interface (Hue Beeswax) as Hive. This gives batch processing and real-time queries a unified, common platform.

Cloudera Impala is an effective tool for querying big data. It does not replace batch processing frameworks built on MapReduce, such as Hive. Hive and other MapReduce-based frameworks remain well suited to long-running batch jobs, for example batch extract, transform, and load (ETL) jobs.

Architecture:

The following figure shows the position of Impala in the Cloudera ecosystem:

The entire Impala solution consists of the following components:
Impala State Store: coordinates the impalad instances across the cluster; its role is similar to that of the NameNode.
Impalad: this daemon runs on the DataNodes and serves queries sent by the Impala shell. Impalad accepts requests from the database connectivity layer, and schedules and optimizes the query work. Each impalad periodically reports its name and address to the Impala State Store; its role is similar to that of a DataNode.
Impala shell: a tool for managing tasks and executing queries. It connects to an impalad and provides a standard, ODBC-based query interface.
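
To make the State Store and impalad relationship above more concrete, here is a minimal toy model, assuming a simple heartbeat-with-timeout scheme: each impalad periodically reports its name and address, and the State Store treats any daemon that has not reported within a timeout as gone. The class names `StateStore` and `Impalad`, the timeout value, and the addresses are hypothetical illustrations, not Impala's actual implementation or wire protocol.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Hypothetical toy State Store: tracks which impalad instances are alive."""
    # Assumed value: seconds without a report before a daemon is treated as gone.
    timeout: float = 5.0
    # name -> (address, time of last report)
    last_seen: dict = field(default_factory=dict)

    def register(self, name: str, address: str, now: float) -> None:
        # An impalad reports (or re-reports) its name and address.
        self.last_seen[name] = (address, now)

    def live_members(self, now: float) -> dict:
        # Return name -> address for daemons that reported within the timeout.
        return {
            name: address
            for name, (address, seen) in self.last_seen.items()
            if now - seen <= self.timeout
        }


@dataclass
class Impalad:
    """Hypothetical toy impalad that periodically reports to the State Store."""
    name: str
    address: str

    def heartbeat(self, store: StateStore, now: float) -> None:
        store.register(self.name, self.address, now)


store = StateStore(timeout=5.0)
nodes = [Impalad("impalad-1", "10.0.0.1"), Impalad("impalad-2", "10.0.0.2")]
for node in nodes:
    node.heartbeat(store, now=0.0)
nodes[0].heartbeat(store, now=4.0)  # only impalad-1 keeps reporting

print(sorted(store.live_members(now=4.0)))  # ['impalad-1', 'impalad-2']
print(sorted(store.live_members(now=8.0)))  # ['impalad-1']
```

The clock is passed in explicitly as `now` rather than read from the system, which keeps the sketch deterministic; a real daemon would report on a timer.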

Impala executes a query in the following steps:
A client submits a HiveQL query through Hue Beeswax, the Impala shell, or ODBC.
Impala's distributed query engine builds the query plan and distributes the work across the cluster.
For best performance, each node reads its local HDFS and HBase data directly.
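
The three steps above can be sketched as a scatter-gather over locally stored data. This is a minimal toy model, assuming each node holds some rows of a table in its local blocks and the query is a filtered count and sum; the block layout, the function names `run_fragment` and `execute`, and the plain-Python "fragments" are hypothetical and do not reflect Impala's real planner or execution engine.

```python
from collections.abc import Callable

# Hypothetical block placement: node name -> rows stored in that node's local blocks.
LOCAL_BLOCKS: dict[str, list[dict]] = {
    "node-a": [{"city": "Paris", "sales": 120}, {"city": "Oslo", "sales": 40}],
    "node-b": [{"city": "Paris", "sales": 75}],
    "node-c": [{"city": "Lima", "sales": 10}, {"city": "Paris", "sales": 5}],
}


def run_fragment(rows: list[dict], predicate: Callable[[dict], bool]) -> tuple[int, int]:
    """Runs on one node: scan only the local rows and return a partial (count, sum)."""
    matched = [r for r in rows if predicate(r)]
    return len(matched), sum(r["sales"] for r in matched)


def execute(predicate: Callable[[dict], bool]) -> tuple[int, int]:
    """Coordinator: send the same fragment to every node, then merge the partial results.

    The fragments run one after another here; on a cluster they would run in parallel.
    """
    partials = [run_fragment(rows, predicate) for rows in LOCAL_BLOCKS.values()]
    total_count = sum(count for count, _ in partials)
    total_sales = sum(sales for _, sales in partials)
    return total_count, total_sales


# Roughly: SELECT COUNT(*), SUM(sales) FROM t WHERE city = 'Paris'
print(execute(lambda r: r["city"] == "Paris"))  # (3, 200)
```

Each node only ever touches its own rows, and only the small partial results travel back to the coordinator, which is the point of reading local data directly.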

Impala features:
Impala provides the following support:
Most of the SQL-92-based query features that Hive provides, including SELECT, JOIN, and some aggregate functions
Text files and SequenceFiles as storage formats (SequenceFiles can be compressed with Snappy, Gzip, or Bzip, and Snappy offers the best performance). According to the official blog, other formats such as Avro, RCFile, LZO-compressed text, and Doug Cutting's Trevni will be supported in the general availability (GA) release.
Common Hive interfaces, such as the ODBC driver and the Hue Beeswax user interface
The Impala command-line interface
Kerberos authentication
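
As an illustration of the kind of SELECT, JOIN, and aggregate query described in the feature list, the sketch below runs one against an in-memory SQLite database, chosen only so the example is self-contained and runnable without a cluster. SQLite is a stand-in here, not Impala; the tables `customers` and `orders` and their columns are hypothetical. On a real cluster, a statement of this shape would be submitted through the Impala shell, Hue Beeswax, or ODBC.

```python
import sqlite3

# SQLite stands in for Impala only to make the example self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (1, 100), (1, 50), (2, 70), (3, 30);
    """
)

# A SELECT with a JOIN and aggregate functions (COUNT, SUM) grouped by region.
query = """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.region
"""
# Sort in Python so the query itself stays a plain SELECT/JOIN/GROUP BY.
rows = sorted(conn.execute(query).fetchall())
conn.close()
print(rows)  # [('east', 3, 180), ('west', 1, 70)]
```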

What Impala brings:
Impala provides:
A SQL interface familiar to data analysts
Interactive processing of big data on Hadoop

Because data can be analyzed where it is stored, Impala avoids the cost of modeling data and running ETL jobs solely for the purpose of analysis.

From: https://ccp.cloudera.com/display/IMPALA10BETADOC/Introducing+Cloudera+Impala
