Hive Architecture Exploration

Source: Internet
Author: User
About Hive

Hive is data warehouse software that lets users read, write, and manage large datasets on distributed storage using SQL. Users can connect to Hive through the CLI, a JDBC driver, and other clients.

Manipulating a dataset on Hadoop by writing a MapReduce program directly requires complex code. With Hive, a simple SQL statement such as SELECT * FROM Test is enough to fetch the specified data from Hadoop.

Hive Schema

Hive does not store data itself; it only operates on data held in distributed storage such as HDFS or HBase.
Take HDFS as an example: a file on HDFS is a plain file with no table or column information, yet Hive can read it and present it as table data. Which table a file on HDFS belongs to, and which columns that table has, is metadata, and it is stored in an ordinary relational database such as Derby or MySQL.
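The idea of keeping raw files and table metadata separately can be sketched in a few lines of Python. This is only an illustration of the concept, not how Hive is implemented: the file contents, the METADATA dictionary, and the read_as_table helper are all hypothetical stand-ins.

```python
import csv
import io

# Hypothetical raw file contents as they might sit on HDFS: no header, no types.
RAW_FILE = "1,alice,30\n2,bob,25\n"

# Hypothetical metadata record, standing in for what the metadata database keeps.
METADATA = {
    "table": "Test",
    "columns": [("id", int), ("name", str), ("age", int)],
    "field_delimiter": ",",
}


def read_as_table(raw_text, metadata):
    """Apply table metadata to untyped text and return the rows as dicts."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=metadata["field_delimiter"])
    rows = []
    for fields in reader:
        # Pair each raw field with its column name and convert it to the column type.
        row = {name: cast(value) for (name, cast), value in zip(metadata["columns"], fields)}
        rows.append(row)
    return rows


rows = read_as_table(RAW_FILE, METADATA)
for row in rows:
    print(row)
```

The file never changes; only the metadata decides how its bytes are interpreted, which is why the same file can be described by a different table definition later.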


Hive reads the metadata from the metastore, translates the SQL statement into a MapReduce program, and hands it to Hadoop, which runs it against the data on HDFS.
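To make the translation step more concrete, here is a toy Python sketch of how a query like SELECT city, COUNT(*) FROM Test GROUP BY city could be expressed as a map phase and a reduce phase. The query, the sample rows, and the function names are assumptions made for illustration; the plans Hive actually generates are far more elaborate.

```python
from collections import defaultdict

# Hypothetical rows of table Test, already parsed from storage.
ROWS = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Paris"},
]


def map_phase(row):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    yield row["city"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


pairs = [pair for row in ROWS for pair in map_phase(row)]
result = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(result)
```

Even this small aggregation needs three cooperating functions, which hints at why a one-line SQL statement is easier to work with than hand-written MapReduce code.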

What is inside Hive?
1. Hive provides multiple services, which can be started with the following command:

hive <parameters> --service serviceName <service parameters>

Service List:

beeline
cleardanglingscratchdir 
cli
hbaseimport 
hbaseschematool 
help 
hiveburninclient 
hiveserver2 
hplsql 
jar 
lineage 
llapdump 
llap 
llapstatus 
metastore 
metatool
orcfiledump 
rcfilecat 
schematool 
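As a small illustration of the command form, the Python sketch below assembles the command line for a few services from the list. The helper build_hive_command is hypothetical, and the sketch only prints the commands; it assumes nothing about whether Hive is installed.

```python
import shlex


def build_hive_command(service, service_args=()):
    """Return the argument list for `hive --service <service> <service args>`."""
    return ["hive", "--service", service, *service_args]


# Print the command line for three of the listed services.
for name in ("cli", "hiveserver2", "metastore"):
    print(shlex.join(build_hive_command(name)))
```

On a machine where the hive launcher is on the PATH, the same list could be handed to subprocess.run to start the service.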

The most commonly used are cli, hiveserver2, and metastore.

1.1 cli

Provides command-line access to Hive.

1.2 hiveserver2

The Hive Thrift server, which lets programs access Hive, for example through the JDBC driver.

1.3 metastore

The Hive metadata service.

2. Hive 2.2.0 includes HCatalog (HCatalog has shipped with Hive since release 0.11.0)

HCatalog is a table and storage management layer for Apache Hadoop that lets different data processing tools, such as Pig, MapReduce, Streaming, and Hive, work with the same data.
With HCatalog, Hive metadata can also be used by other Hadoop-based tools. Whichever data processing tool users choose, they can work with the same data through HCatalog.

Most of this is my personal understanding; if anything is wrong, please point it out.
