Hive is a Hadoop-based data warehouse platform that makes ETL work straightforward. Hive defines a SQL-like query language, HQL (HiveQL); a user-written HQL query is compiled into MapReduce programs that execute on Hadoop.
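For example, a simple aggregation written in HiveQL is compiled into a MapReduce job behind the scenes (the `docs` table and `word` column here are hypothetical, for illustration only):

```sql
-- Hypothetical word-count style query: Hive compiles the GROUP BY into
-- a MapReduce job (the map phase emits `word`, the reduce phase sums counts).
SELECT word, COUNT(*) AS cnt
FROM docs
GROUP BY word;
```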
Hive is a data warehouse framework that Facebook open-sourced in August 2008. Its goals are similar to Pig's, but it offers mechanisms that Pig does not currently support, such as a richer type system, a more SQL-like query language, and persistent metadata for tables and partitions.
How Hive translates SQL into MapReduce (Mapper/Reducer) jobs:
[Figure: SQL-to-MapReduce translation]
Hive components and architecture:
[Figure: Hive architecture]
Starting the Hive Web Interface: ./hive --service hwi
Then access it in a browser: http://localhost:9999/hwi/
By default, Hive saves its metadata in an embedded Derby database, which allows only a single session connection and is suitable only for simple testing. To support multiple users and concurrent sessions, we need a standalone metastore database; MySQL is a common choice, and Hive has good built-in support for it.
Hive Installation
Embedded mode: metadata is kept in the built-in Derby database; only one session can connect.
Local standalone mode: MySQL is installed locally and the metadata is stored in MySQL.
Remote mode: metadata is stored in a remote MySQL database.
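For the MySQL-backed modes, the metastore connection is configured in hive-site.xml. A minimal sketch, assuming a local MySQL instance with a `hive_meta` database and a `hive` user (all values here are placeholders):

```xml
<!-- Sketch of hive-site.xml for a MySQL-backed metastore.
     Host, database name, user, and password are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_meta?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```

The MySQL JDBC driver jar must also be placed on Hive's classpath (typically in Hive's lib directory).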
Where is Hive data stored?
Table data lives in the warehouse directory on HDFS; each table corresponds to a subdirectory.
The local /tmp directory stores logs and execution plans.
Hive tables come in two kinds: internal (managed) tables and external tables.
When Hive creates an internal table, it moves the data into the path the data warehouse points to; when it creates an external table, it only records the path where the data resides and does not move the data.
When a table is dropped, an internal table's metadata and data are deleted together, while dropping an external table deletes only the metadata and leaves the data intact. This makes external tables safer, makes data organization more flexible, and facilitates sharing of the source data.
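The difference can be illustrated with DDL (the table names, columns, and HDFS path below are illustrative):

```sql
-- Managed (internal) table: loaded data is moved under the warehouse
-- directory, and DROP TABLE deletes both metadata and data.
CREATE TABLE page_views_managed (url STRING, ts BIGINT);

-- External table: only the location is recorded; DROP TABLE removes the
-- metadata only and leaves the files in /data/page_views untouched.
CREATE EXTERNAL TABLE page_views_ext (url STRING, ts BIGINT)
LOCATION '/data/page_views';
```

After `DROP TABLE page_views_managed`, its directory under the warehouse is removed; after `DROP TABLE page_views_ext`, the files under /data/page_views remain on HDFS.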
Using MySQL as the storage database for the Hive metastore
The main related tables are as follows:
| Table name | Description | Key columns |
|---|---|---|
| TBLS | Basic information for all Hive tables (table name, creation time, owner, etc.) | TBL_ID, SD_ID |
| TABLE_PARAMS | Table-level properties, such as whether the table is external, table comments, last modified time, etc. | TBL_ID |
| COLUMNS | Hive table column information (column comment, name, type, ordinal position) | SD_ID |
| SDS | HDFS data directory and data format for all Hive tables and table partitions | SD_ID, SERDE_ID |
| SERDE_PARAMS | Serialization/deserialization properties, such as row delimiters, column delimiters, the NULL representation, and so on | SERDE_ID |
| PARTITIONS | Hive table partition information (owning table, partition values) | PART_ID, SD_ID, TBL_ID |
| PARTITION_KEYS | Partition keys (i.e., partition columns) of partitioned tables | TBL_ID |
| PARTITION_KEY_VALS | Partition key values (partition names) for each partition | PART_ID |
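As a sketch of how these tables fit together, the following query, run against the MySQL metastore database itself (not through Hive), lists each table's HDFS location by joining TBLS to SDS on SD_ID:

```sql
-- Run in MySQL against the metastore database: for every Hive table,
-- show its name and the HDFS directory recorded in SDS.
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID;
```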
The relationship between Hive and MySQL: Hive stores its metadata in the MySQL tables above, while the actual table data remains on HDFS.