Databases and Tables in Hive

Source: Internet
Author: User

This article introduces the basics of databases (database/schema) and tables in Hive. For reasons of space, it covers only the common, fundamental operations.


First, a sketch of the structure:

[Figure: Hive structure — databases (schemas), each containing its own tables]

As the figure shows, Hive, as a "database", is structurally similar to a traditional database: it is likewise divided into databases (schemas), and each database has its own tables.

1. Hive's default storage path on HDFS

Hive's data is stored on HDFS under a root directory specified by the parameter hive.metastore.warehouse.dir in hive-site.xml. The default value is /user/hive/warehouse.
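As a quick check, the current value of that parameter can be inspected directly from the Hive CLI (a sketch; the path printed will match whatever your hive-site.xml configures):

```sql
-- Print the configured warehouse root directory from the Hive CLI
SET hive.metastore.warehouse.dir;
```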

2. Databases in Hive
    • Enter the Hive command line and execute show databases; to list all databases in Hive. A default database always exists, and Hive CLI starts in it after launch.
    • Use use databasename; to switch to a database, as in MySQL:
  hive> show databases;
  OK
  default
  lxw1234
  usergroup_mdmp
  userservice_mdmp
  Time taken: 0.442 seconds, Fetched: 4 row(s)
  hive> use lxw1234;
  OK
  Time taken: 0.023 seconds
  hive>
    • The storage path of a Hive database on HDFS is:

${hive.metastore.warehouse.dir}/databasename.db

For example, the database storage path named lxw1234 is:

/user/hive/warehouse/lxw1234.db

    • Creating a Hive database

As the HDFS superuser, enter Hive CLI; the syntax is:

  CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

For example, create a database named lxw1234:

  CREATE DATABASE IF NOT EXISTS lxw1234
  COMMENT "lxw's big data field - lxw1234.com"
  LOCATION 'hdfs://namenode/user/lxw1234/lxw1234.db/';

When creating a database, you can specify where it is stored on HDFS.

Note: after you create a database as the HDFS superuser, the owner of the database's storage path on HDFS is the superuser. If the database will be used by other users, you need to change the owner of that path, or grant the appropriate permissions in Hive.
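In Hive versions that support it, the ownership fix can also be done from Hive itself using the ALTER DATABASE ... SET OWNER syntax described below (the user name here is a placeholder):

```sql
-- Transfer ownership of the database to the user who will actually use it
-- (user name 'lxw' is hypothetical)
ALTER DATABASE lxw1234 SET OWNER USER lxw;
```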

    • Modifying a database

To modify database properties:

ALTER (DATABASE|SCHEMA) database_name
SET DBPROPERTIES (property_name=property_value, ...);

To modify the database owner:

ALTER (DATABASE|SCHEMA) database_name
SET OWNER [USER|ROLE] user_or_role;

    • Deleting a database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name
[RESTRICT|CASCADE];

By default, Hive does not allow dropping a database that still contains tables. To drop such a database, either drop its tables first, or use the CASCADE keyword, which makes Hive drop all tables in the database automatically. RESTRICT is the default behavior: if any table exists, dropping the database is not allowed.
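A concrete sketch of the two behaviors, using the database name from the earlier examples:

```sql
-- Fails if lxw1234 still contains tables (RESTRICT is the default)
DROP DATABASE IF EXISTS lxw1234;

-- Drops the database together with all of its tables
DROP DATABASE IF EXISTS lxw1234 CASCADE;
```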

3. Tables in Hive

3.1 Viewing all tables

Enter Hive CLI, switch to the database with use databasename;, then execute show tables; to list all tables in that database:

  hive> show tables;
  OK
  lxw1
  lxw1234
  table1
  t_site_log
3.2 Storage paths for tables

By default, the storage path for the table is:

${hive.metastore.warehouse.dir}/databasename.db/tablename/

You can use desc formatted tablename; to view a table's details, including its storage path:

location:hdfs://cdh5/hivedata/warehouse/lxw1234.db/lxw1234

3.3 Internal tables and external tables

Tables in Hive are divided into internal tables (MANAGED_TABLE) and external tables (EXTERNAL_TABLE).

    • The biggest difference between internal and external tables:

Dropping an internal table deletes its data on HDFS;

Dropping an external table does not delete its data on HDFS.
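A minimal sketch of the difference (table names and the HDFS path are illustrative):

```sql
-- External table over pre-existing data: DROP removes only the metadata;
-- the files under the LOCATION directory stay on HDFS
CREATE EXTERNAL TABLE t_ext (id INT)
LOCATION 'hdfs://namenode/tmp/lxw1234/';
DROP TABLE t_ext;

-- Internal (managed) table: DROP also deletes the table's directory on HDFS
CREATE TABLE t_int (id INT);
DROP TABLE t_int;
```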

    • Scenarios for internal tables:

Hive intermediate tables and result tables, where data generally does not need to be loaded from external sources (such as local files or HDFS).

    • Scenarios for external tables:

Source tables, where external data needs to be mapped into the table periodically.

    • Our usage scenario:

Web logs collected each day are regularly pushed into HDFS as text files, one directory per day;

An external table is created in Hive as the source table, and each day's raw logs on HDFS are mapped to that day's partition of the external table by adding a partition;

Heavy statistical analysis is then done against this external (raw log) table; intermediate tables and result tables are stored as internal tables, populated via SELECT + INSERT.
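The daily workflow above might look roughly like this (table names, the partition date, and the log path are hypothetical):

```sql
-- 1. Map today's raw log directory to a new partition of the external source table
ALTER TABLE t_source_log ADD IF NOT EXISTS PARTITION (day='2015-06-08')
LOCATION 'hdfs://namenode/logs/2015-06-08/';

-- 2. Aggregate from the external table into an internal result table
INSERT INTO TABLE t_result
SELECT day, COUNT(*)
FROM t_source_log
WHERE day = '2015-06-08'
GROUP BY day;
```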

3.4 Creating a table

There are many syntax options for creating tables; only the commonly used ones are listed here.

For the rest, see the Hive official documentation:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable

An example:

CREATE EXTERNAL TABLE t_lxw1234 (
  id INT,
  ip STRING COMMENT 'visitor IP',
  avg_view_depth DECIMAL(5,1),
  bounce_rate DECIMAL(6,5)
) COMMENT "lxw's big data field - lxw1234.com"
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://cdh5/tmp/lxw1234/';

    • Keyword EXTERNAL:

Indicates that the table is an external table; if EXTERNAL is omitted, an internal table is created.

    • Keyword COMMENT:

Adds comments to tables and columns.

    • Keyword PARTITIONED BY:

Indicates that the table is partitioned; here the partition field is day, of type STRING.

    • Keyword ROW FORMAT DELIMITED:

Specifies the table's delimiters; it is usually followed by these clauses:

FIELDS TERMINATED BY ','              -- delimiter between fields within each line
LINES TERMINATED BY '\n'              -- line delimiter
COLLECTION ITEMS TERMINATED BY ','    -- delimiter between elements of a collection
MAP KEYS TERMINATED BY ':'            -- delimiter between keys and values of a map

As an example:

CREATE TABLE score (name STRING, score MAP<STRING,INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

The text data to be loaded looks like this (a tab separates the name from the map field):

biansutao	math:80,language:89,English:95
jobs	language:60,math:80,English:99
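Assuming the two lines above are saved in a local file (the path is hypothetical), they could be loaded and queried like this:

```sql
-- Load the delimited file into the score table
LOAD DATA LOCAL INPATH '/tmp/score.txt' INTO TABLE score;

-- Look up a single key of the map column
SELECT name, score['math'] FROM score;
```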

    • Keyword STORED AS:

Specifies the file storage format of the table on HDFS. The optional formats are:

TEXTFILE        -- plain text, the default
SEQUENCEFILE    -- binary sequence file
RCFILE          -- columnar storage format, supported since Hive 0.6
ORC             -- columnar storage format with a higher compression ratio and better read/write efficiency than RCFile, supported since Hive 0.11
PARQUET         -- columnar storage format, supported since Hive 0.13
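For example, a sketch of creating an ORC-stored table and filling it from the text table defined earlier (the new table name is hypothetical):

```sql
-- Create a table stored as ORC instead of the default TEXTFILE
CREATE TABLE t_lxw1234_orc (
  id INT,
  ip STRING
)
STORED AS ORC;

-- Populate it from the existing text table
INSERT INTO TABLE t_lxw1234_orc
SELECT id, ip FROM t_lxw1234;
```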

    • Keyword LOCATION:

Specifies the location where the table is stored on HDFS.
