The relationship between Hive and MySQL

Source: Internet
Author: User

Hive is a Hadoop-based data warehouse platform that makes ETL work straightforward. Hive defines a SQL-like query language, HQL, and converts user-written HQL statements into corresponding MapReduce programs that run on Hadoop.

Hive is a data warehouse framework that Facebook open-sourced in August 2008. Its goals are similar to Pig's, but it offers mechanisms that Pig did not support at the time, such as a richer type system, a more SQL-like query language, and persistent table/partition metadata.

Hive can be seen as a mapper from SQL to MapReduce.
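To make the SQL-to-MapReduce idea concrete, here is a minimal sketch of how a query such as `SELECT url, COUNT(*) FROM page_views GROUP BY url` could be broken into map, shuffle, and reduce steps. It is a toy in-memory model, not Hive's actual planner; the `page_views` table, its rows, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user, url).
ROWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/cart"),
    ("carol", "/home"),
]


def map_phase(rows):
    """Emit (group key, 1) per row, as a mapper could for GROUP BY url."""
    for _user, url in rows:
        yield url, 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, which is what COUNT(*) reduces to."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_url(rows):
    """Run the three phases in order and return {url: count}."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_url(ROWS))
```

The GROUP BY column becomes the map output key, and the aggregate function becomes the reduce logic.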

Hive components and architecture:

Hive Web Interface startup: ./hive --service hwi

Browser access: http://localhost:9999/hwi/

By default, Hive metadata is stored in an embedded Derby database, which allows only one session connection at a time and is suitable only for simple testing. To support multiple users and multiple sessions, a separate metadata database is needed. Here we use MySQL as the metadata database; Hive has good built-in support for MySQL.

Hive Installation

Embedded mode: Metadata is kept in the built-in Derby database, which allows only one session to connect at a time.

Local mode: MySQL is installed locally and the metadata is stored in it.

Remote mode: Metadata is stored in a remote MySQL database.
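As an illustration of pointing Hive at MySQL, the sketch below generates the `<property>` entries that usually go into `hive-site.xml`. The four `javax.jdo.option.*` property names are Hive's standard metastore connection settings; the host, database name, user, and password are placeholders, and the driver class shown is the classic MySQL Connector/J name (newer connector versions use `com.mysql.cj.jdbc.Driver`). The helper name `metastore_config` is made up for this example.

```python
import xml.etree.ElementTree as ET


def metastore_config(host, database, user, password, port=3306):
    """Build hive-site.xml content that points the metastore at MySQL."""
    props = {
        "javax.jdo.option.ConnectionURL": (
            f"jdbc:mysql://{host}:{port}/{database}"
            "?createDatabaseIfNotExist=true"
        ),
        # Assumption: classic Connector/J driver class name.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Local mode: MySQL on the same machine (placeholder credentials).
    print(metastore_config("localhost", "hive", "hive", "secret"))
```

Under the article's definitions, local and remote mode differ only in the host part of the JDBC URL.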

Where is Hive data stored?

The data is stored in the warehouse directory on HDFS, and each table corresponds to a subdirectory.

The local /tmp directory stores logs and execution plans.
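The table-to-subdirectory mapping can be sketched as a small path helper. It assumes the default warehouse root `/user/hive/warehouse` (the default of `hive.metastore.warehouse.dir`), the usual `<database>.db` directory for non-default databases, and `key=value` subdirectories for partitions. The function name and the sample table names are hypothetical.

```python
import posixpath

# Assumption: default value of hive.metastore.warehouse.dir.
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_path(table, database="default", partition=None,
               warehouse=WAREHOUSE_DIR):
    """Return the HDFS directory where a managed table's data would live."""
    parts = [warehouse]
    if database != "default":
        # Non-default databases get their own "<name>.db" directory.
        parts.append(f"{database}.db")
    parts.append(table)
    # Each partition column adds one "key=value" directory level.
    for key, value in (partition or {}).items():
        parts.append(f"{key}={value}")
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(table_path("logs"))
    print(table_path("logs", database="web", partition={"dt": "2013-01-01"}))
```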

Hive tables come in two types: internal (managed) tables and external tables.
When data is loaded into an internal table, Hive moves it to the path the data warehouse points to. When you create an external table, Hive only records the path where the data resides and does not change the location of the data.
When you drop a table, both the metadata and the data of an internal table are deleted, whereas dropping an external table deletes only the metadata and leaves the data in place. This makes external tables safer and keeps data organization more flexible, which makes it easier to share source data.
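The difference in drop behavior can be shown with a toy model on the local filesystem. This is only a sketch of the semantics described above, not how Hive is implemented; the `MiniMetastore` class, its methods, and the directory names are all invented for the example.

```python
import os
import shutil
import tempfile


class MiniMetastore:
    """Toy model: table name -> (data directory, is_external)."""

    def __init__(self, warehouse):
        self.warehouse = warehouse
        self.tables = {}

    def load_managed(self, name, source_dir):
        # Internal table: the data is moved under the warehouse directory.
        target = os.path.join(self.warehouse, name)
        shutil.move(source_dir, target)
        self.tables[name] = (target, False)

    def create_external(self, name, location):
        # External table: only the location is recorded.
        self.tables[name] = (location, True)

    def drop(self, name):
        # Metadata is always removed; data only for internal tables.
        location, is_external = self.tables.pop(name)
        if not is_external:
            shutil.rmtree(location)
        return location


def demo():
    root = tempfile.mkdtemp()
    try:
        warehouse = os.path.join(root, "warehouse")
        staged = os.path.join(root, "staged")
        shared = os.path.join(root, "shared")
        for directory in (warehouse, staged, shared):
            os.makedirs(directory)
        store = MiniMetastore(warehouse)
        store.load_managed("managed_t", staged)
        store.create_external("external_t", shared)
        moved = not os.path.exists(staged)
        managed_loc = store.drop("managed_t")
        external_loc = store.drop("external_t")
        return {
            "managed_data_moved": moved,
            "managed_data_left": os.path.exists(managed_loc),
            "external_data_left": os.path.exists(external_loc),
        }
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo())
```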

Using MySQL as the storage database for the Hive metastore

The main related tables are as follows:

| Table name | Description | Association keys |
|---|---|---|
| TBLS | Basic information for all Hive tables (table name, creation time, owner, etc.) | TBL_ID, SD_ID |
| TABLE_PARAMS | Table-level properties, such as whether the table is external, the table comment, and the last modified time | TBL_ID |
| COLUMNS | Column information for Hive tables (column comment, column name, column type, column ordinal) | SD_ID |
| SDS | HDFS data directory and data format for all Hive tables and table partitions | SD_ID, SERDE_ID |
| SERDE_PARAMS | Serialization and deserialization settings, such as row delimiter, column delimiter, and NULL representation | SERDE_ID |
| PARTITIONS | Hive table partition information (owning table, partition value) | PART_ID, SD_ID, TBL_ID |
| PARTITION_KEYS | Partition keys (that is, partition columns) of partitioned Hive tables | TBL_ID |
| PARTITION_KEY_VALS | Partition key values of Hive table partitions | PART_ID |
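The association keys listed above are what you join on when reading the metastore directly. The sketch below uses an in-memory SQLite database as a stand-in for MySQL and a heavily simplified subset of the metastore schema (only a few columns of TBLS, SDS, and PARTITIONS). The sample rows, HDFS URIs, and helper names are assumptions; a real metastore has many more columns and tables.

```python
import sqlite3

# Simplified stand-in for three metastore tables (assumption: column subset).
SCHEMA = """
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT,
                   OWNER TEXT, SD_ID INTEGER);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, PART_NAME TEXT,
                         TBL_ID INTEGER, SD_ID INTEGER);
"""


def build_demo_metastore():
    """Create an in-memory database with one table and one partition."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SDS VALUES (?, ?)", [
        (1, "hdfs://nn/user/hive/warehouse/logs"),
        (2, "hdfs://nn/user/hive/warehouse/logs/dt=2013-01-01"),
    ])
    conn.execute("INSERT INTO TBLS VALUES (1, 'logs', 'hive', 1)")
    conn.execute("INSERT INTO PARTITIONS VALUES (1, 'dt=2013-01-01', 1, 2)")
    return conn


def table_location(conn, table_name):
    """Follow TBLS.SD_ID -> SDS.SD_ID to find a table's data directory."""
    row = conn.execute(
        "SELECT s.LOCATION FROM TBLS t "
        "JOIN SDS s ON t.SD_ID = s.SD_ID WHERE t.TBL_NAME = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


def partition_locations(conn, table_name):
    """Follow PARTITIONS.TBL_ID and PARTITIONS.SD_ID to list partitions."""
    rows = conn.execute(
        "SELECT p.PART_NAME, s.LOCATION FROM PARTITIONS p "
        "JOIN TBLS t ON p.TBL_ID = t.TBL_ID "
        "JOIN SDS s ON p.SD_ID = s.SD_ID "
        "WHERE t.TBL_NAME = ? ORDER BY p.PART_NAME",
        (table_name,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_metastore()
    print(table_location(conn, "logs"))
    print(partition_locations(conn, "logs"))
```

Against a real MySQL metastore the same joins would apply, but direct queries are best treated as read-only inspection.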
