"Go" HiveQL: Learning about data definition

Last Update:2015-07-14 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Databases in 1.Hive:

It is a table directory or namespace that avoids table naming conflicts, and we typically use databases to organize production tables into logical groups.

Basic command:

(1) Create a database (if the database does not exist):

Create database if not exists time;

(2) View all the databases contained in hive:

show databases;

Note: Hive creates a directory for each database, and the table in the database is stored as a subdirectory of the database directory, with an exception to the table in the default database. Tables in the default directory are stored directly under Hive.metastore.warehouse.dir.

We can set the property hive.cli.print.current.db=true to let the CLI of hive display indicate which database is currently working under:

As above, first in the default directory, using the use time, after the command switch to the time database.

2. Storage of data in hive:

Divided into meta-data, "real" data and logs:

(1) Metadata: By default, the hive metadata is saved in the built-in Derby database, allowing only one session connection and only for simple testing. To support multi-user

, I need a separate meta-database, I use MySQL as a meta-database, the Hive inside of the MySQL provides a good support.

We can go into MySQL to view the metadata for the table in hive:

The default metadata tables are:

The main related tables are:

For example: We view the TBLs table for hive metadata information in MySQL:

As a control, we look at whether there are log tables and Hive_test tables in hive:

Note: There are three modes during the installation of hive:

Built-in mode: metadata remains in the inline derby mode, allowing only one session to connect

Local Standalone mode: Install MySQL locally and put the metadata inside MySQL

Remote mode: Metadata is placed in the remote MySQL database

(2) "real" data: exists on HDFs.

(3) Log: There is a local/tmp/${user_name}/hive.log (for example:/tmp/hadoop/hive.log is the author's storage path)

3. Tables in hive: divided into internal tables (also called management tables) and external tables.

(1) If you create an internal table, the data is moved to the path that the data warehouse points to. When the internal table is deleted, the metadata and data of the internal table are deleted together;

(2) If an external table is created, only the path of the data is recorded, and no changes are made to the location of the data (that is, the data already exists on HDFS, and the newly created table points to it). External tables only delete metadata and do not delete data. So the outer table is relative to

More secure, the data organization is more flexible and facilitates the sharing of source data.

4. Partitioning the table:

(1) In a hive select query, the entire table content is generally scanned, which consumes a lot of time to do unnecessary work. Sometimes you just need to scan a subset of the data in the table

This built-in table introduces the partition concept-faster queries for improved performance!

(2) A partitioned table refers to the partition space of the partition specified when the table was created. If you need to create a partitioned table, you need to call the optional parameter when creating the table

Partitioned by.

Eg:create Table Employees (name string, salary string) partitioned by (country string, state string);

At this point we just declare that the Employees table has a partition structure, but we have not created the partition, we can create the partition by loading the data, or we can add, modify, and delete the partition through the ALTER command. The following author uses the first method to create a partition and upload data:

Eg:load data local inpath '/home/hadoop/extend/data_employees ' into table employees partition (country= ' US ', state= ' CA ') ;

Hive creates the corresponding directory for this partition and uploads the file to this directory.

Note: Hive does not verify that the data loaded by the user and the schema of the table match. However, Hive verifies that the file format is consistent with the table structure definition, that is, if the storage format defined when the table is created is Sequencefile, the loaded file must also be in the Sequencefile format.

"Go" HiveQL: Learning about data definition

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

"Go" HiveQL: Learning about data definition

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

"Go" HiveQL: Learning about data definition

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support