About Hive
Hive is data warehouse software that lets users manage, read, and write large datasets on distributed storage using SQL. Users can connect to Hive via the CLI, the JDBC driver, and so on.
Manipulating a dataset on Hadoop by writing a MapReduce program directly requires complex code; with Hive, a simple SQL statement such as SELECT * from Test is enough to get the specified data from Hadoop.
Hive Schema
Hive does not store data itself; it only operates on data held in distributed storage, such as HDFS or HBase.
Take HDFS as an example: a file on HDFS is just a plain file, with no table or column information, yet Hive can read the data on HDFS and present it as a table. Which table a file on HDFS belongs to, and which columns that table has, is metadata, and it is stored in an ordinary database such as Derby or MySQL.
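A toy Python sketch (not Hive code) of this "schema on read" idea: the data file is plain delimited text, while the table and column structure lives elsewhere, the way Hive keeps it in the metastore. The file contents, table name, and column names below are made-up examples.

```python
# Toy sketch of "schema on read" (plain Python, not Hive internals).
import csv
import io

# The "data on HDFS": a plain file with no column information in it.
raw_file = "1\tAlice\t30\n2\tBob\t25\n"

# The "metastore": table name and column names stored separately,
# in an ordinary database in Hive's case (hypothetical example values).
metadata = {"table": "users", "columns": ["id", "name", "age"]}

def read_table(raw, meta):
    # Apply the separately stored schema to the raw bytes at read time.
    reader = csv.reader(io.StringIO(raw), delimiter="\t")
    return [dict(zip(meta["columns"], row)) for row in reader]

rows = read_table(raw_file, metadata)
print(rows[0])  # {'id': '1', 'name': 'Alice', 'age': '30'}
```

The point of the separation is that the same raw file can be re-read under a different schema without rewriting the data, which is exactly what Hive relies on.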
Hive gets the metadata from the metastore, translates the SQL statement into a MapReduce program, and hands it to Hadoop to run against the data on HDFS.
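To make the translation step concrete, here is a toy Python sketch (not how Hive actually compiles queries) of the map/shuffle/reduce work that a one-line GROUP BY query spares you:

```python
# Toy illustration: what "SELECT word, COUNT(*) FROM test GROUP BY word"
# roughly costs you when written by hand as map/shuffle/reduce steps.
from collections import defaultdict

def map_phase(lines):
    # Emit (key, 1) pairs, like a MapReduce mapper.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Group values by key, like the MapReduce shuffle/sort.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts per key, like a MapReduce reducer.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["hive hadoop", "hive"]
print(reduce_phase(shuffle(map_phase(lines))))  # {'hive': 2, 'hadoop': 1}
```

Hive generates this kind of plumbing (plus job submission, splits, and fault handling) from the SQL text, which is why one statement replaces a whole program.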
What is inside Hive?
1. Hive provides multiple services, which can be started with the following command:
hive <parameters> --service serviceName <service parameters>
Service List:
beeline
cleardanglingscratchdir
cli
hbaseimport
hbaseschematool
help
hiveburninclient
hiveserver2
hplsql
jar
lineage
llapdump
llap
llapstatus
metastore
metatool
orcfiledump
rcfilecat
schematool
The most commonly used are cli, hiveserver2, and metastore.
1.1 cli
Provides command-line access to Hive.
1.2 hiveserver2
Hive's Thrift server, which allows programs (such as JDBC clients) to access Hive.
1.3 metastore
The Hive metadata service.
2. Since Hive 2.2.0, the distribution includes HCatalog
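The three common services above are typically started like this (a sketch assuming Hive is installed and Hadoop is configured; the hostname is a placeholder):

```shell
# Interactive SQL shell:
hive --service cli          # or simply: hive

# Metadata service (listens on port 9083 by default):
hive --service metastore

# Thrift server for JDBC clients (listens on port 10000 by default):
hive --service hiveserver2

# Connect to a running HiveServer2 with Beeline over JDBC:
beeline -u jdbc:hive2://localhost:10000
```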
HCatalog is a table and storage management service for Apache Hadoop that supports data processing across tools such as Pig, MapReduce, Streaming, and Hive.
With HCatalog, Hive's metadata can also be used by other Hadoop-based tools: no matter which data processing tool users choose, through HCatalog they all operate on the same data.
Most of this is my personal understanding; if anything is wrong, I hope you will point it out.