Introduction to Hive, a Big Data Offline Analysis Tool
Source: Internet
Author: User
Keywords: big data, hive, hive introduction
Hive was developed by Facebook to analyze massive volumes of log data and was later open-sourced and donated to the Apache Software Foundation. Many of the open source tools we have covered before are also Apache Software Foundation projects.
Official website definition:
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.
The version used in this article is Hive 1.0.0.
Several features of Hive
The biggest feature of Hive is that it analyzes big data through a SQL-like language, so there is no need to write MapReduce programs by hand, which makes data analysis much easier.
Data is stored on HDFS; Hive itself does not provide data storage.
Hive maps data to databases and tables, and the metadata for these databases and tables is generally stored in a relational database (such as MySQL).
Data storage: Hive can hold large data sets and does not strictly enforce data integrity or format.
Data processing: Because a Hive statement is ultimately turned into MapReduce tasks for computation, Hive is not suited to real-time computing scenarios; it is suited to offline analysis.
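To make the first feature concrete, here is a minimal sketch that puts a one-line HiveQL aggregation next to the map, shuffle, and reduce steps one would otherwise write by hand. It is plain Python running in memory, not Hadoop code; the table name access_log, the column status, and the assumption that the status code is the last field of each log line are all hypothetical.

```python
from collections import defaultdict

# HiveQL one might submit to Hive (kept as a string only; it is not executed here).
# The table access_log and the column status are hypothetical names.
HIVEQL = "SELECT status, COUNT(*) FROM access_log GROUP BY status"


def map_phase(lines):
    """Map: emit (status, 1) per log line; status is assumed to be the last field."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[-1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(lines):
    """Hand-written equivalent of the GROUP BY query above."""
    return reduce_phase(shuffle(map_phase(lines)))
```

For example, count_by_status(["GET /a 200", "GET /b 404", "GET /c 200"]) groups the three lines into two status codes. With Hive, the single query string replaces all three functions.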
The core of Hive
The core of Hive is the driver engine, which consists of four parts:
Interpreter: converts HiveSQL statements into an abstract syntax tree (AST).
Compiler: compiles the syntax tree into a logical execution plan.
Optimizer: optimizes the logical execution plan.
Executor: calls the underlying execution framework to run the logical execution plan.
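The four stages above can be illustrated with a toy pipeline. This is only a sketch of the idea, not Hive's actual implementation: it handles a single query shape (SELECT ... FROM ... with an optional WHERE col = value), the plan is a plain list of tuples, and the executor runs against in-memory rows instead of submitting jobs to a cluster. All function names are made up for illustration.

```python
import re


def parse(sql):
    """Interpreter: turn a tiny SELECT statement into a dict-based syntax tree."""
    pattern = r"SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(\w+)\s*=\s*(\w+))?\s*$"
    m = re.match(pattern, sql.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported query: " + sql)
    ast = {"select": [c.strip() for c in m.group(1).split(",")], "from": m.group(2)}
    if m.group(3):
        ast["where"] = (m.group(3), m.group(4))
    return ast


def compile_plan(ast):
    """Compiler: turn the syntax tree into a logical plan (an ordered operator list)."""
    plan = [("scan", ast["from"])]
    if "where" in ast:
        plan.append(("filter", ast["where"]))
    plan.append(("project", ast["select"]))
    return plan


def optimize(plan):
    """Optimizer: drop the no-op projection produced by SELECT *."""
    return [op for op in plan if op != ("project", ["*"])]


def execute(plan, tables):
    """Executor: run the plan; here against in-memory rows rather than a cluster."""
    rows = []
    for op, arg in plan:
        if op == "scan":
            rows = list(tables[arg])
        elif op == "filter":
            column, value = arg
            rows = [r for r in rows if str(r[column]) == value]
        elif op == "project":
            rows = [{c: r[c] for c in arg} for r in rows]
    return rows


def run(sql, tables):
    """Chain the four stages in the order the driver applies them."""
    return execute(optimize(compile_plan(parse(sql))), tables)
```

Calling run("SELECT name FROM users WHERE age = 30", tables) with a small dictionary of tables walks one query through all four stages. In real Hive the final stage hands the plan to the underlying execution framework instead of looping over a Python list.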
Hive's underlying storage
Hive data is stored on HDFS, and the databases and tables in Hive can be seen as a mapping onto the data on HDFS. Hive must therefore run on a Hadoop cluster.
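As a rough illustration of this mapping, the sketch below computes the HDFS directory where a managed table would normally live. It assumes the default warehouse directory /user/hive/warehouse (the default value of hive.metastore.warehouse.dir) and a table created without an explicit LOCATION clause; the helper name table_location and the example database and table names are hypothetical.

```python
import posixpath

# Default value of hive.metastore.warehouse.dir; a cluster may override it.
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def table_location(database, table, warehouse=DEFAULT_WAREHOUSE):
    """Return the expected HDFS directory of a managed table.

    Assumes no explicit LOCATION was given. Tables in the "default"
    database sit directly under the warehouse directory; other
    databases get a "<database>.db" subdirectory.
    """
    database = database.lower()  # Hive stores identifiers in lower case
    table = table.lower()
    if database == "default":
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, database + ".db", table)
```

Under these assumptions, table_location("sales", "orders") gives /user/hive/warehouse/sales.db/orders, and the data files of the table are ordinary HDFS files inside that directory.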
Hive statement execution process
The executor in Hive submits the final MapReduce program to YARN, where it runs as a series of jobs.
Hive's metadata storage
Hive metadata is generally stored in a relational database such as MySQL, and Hive and MySQL interact through the MetaStore service.
Hive client
Hive has several kinds of clients.
CLI client: uses the hive command to open an interactive command-line window for communicating with Hive.
HiveServer2 client: communicates over the Thrift protocol and accesses Hive through JDBC or ODBC. Thrift is a cross-language framework that lets programs written in different languages talk to each other.
HWI client: a web interface that ships with Hive, but it is rudimentary and generally not needed.
HUE client: interacts with Hive through a web page and is used more often.
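For the HiveServer2 client, a JDBC connection is identified by a URL of the form jdbc:hive2://host:port/database. The helper below is a small sketch that only builds that string and does not open a connection. It assumes the usual HiveServer2 default port 10000 and the default database; the function name and the host used in the example are hypothetical.

```python
def hive2_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL.

    10000 is the usual default HiveServer2 port; a cluster may use another.
    """
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)
```

For instance, hive2_jdbc_url("hive.example.com") yields jdbc:hive2://hive.example.com:10000/default, which a JDBC-based client could then use to reach HiveServer2.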