This article introduces the basics of databases (schemas) and tables in Hive. For reasons of space, it covers only the common fundamentals.
Database and table for hive
First, look at a sketch:
[Figure: Hive structure]
As the figure shows, Hive, viewed as a "database", is structurally similar to a traditional database: it is divided into databases (schemas), and each database has its own tables.
1. Hive's default storage path on HDFS
Hive's data is stored on HDFS under a root directory, specified by the parameter hive.metastore.warehouse.dir in hive-site.xml. The default value is /user/hive/warehouse.
2. Databases in Hive
- Enter the Hive command line and execute show databases; to list all databases in Hive. There is a built-in database named default, and it is the one you are in right after entering the Hive CLI.
- Use use databasename; to switch to a database, the same as in MySQL.
hive> show databases;
OK
default
lxw1234
usergroup_mdmp
userservice_mdmp
Time taken: 0.442 seconds, Fetched: 4 row(s)
hive> use lxw1234;
OK
Time taken: 0.023 seconds
hive>
- The storage path of a Hive database on HDFS is:
${hive.metastore.warehouse.dir}/databasename.db
For example, the storage path of the database named lxw1234 is:
/user/hive/warehouse/lxw1234.db
To create a database, enter the Hive CLI as the HDFS superuser; the syntax is:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
For example, create a database named lxw1234:
CREATE DATABASE IF NOT EXISTS lxw1234
COMMENT 'LxW\'s big data field - lxw1234.com'
LOCATION 'hdfs://namenode/user/lxw1234/lxw1234.db/';
When creating a database, you can specify where it is stored on HDFS.
Note: after you create a database as the HDFS superuser, the owner of the database's storage path on HDFS is the superuser. If the database is to be used by other users, you need to change the owner of that path or grant authorization in Hive.
To modify database properties:
ALTER (DATABASE|SCHEMA) database_name
SET DBPROPERTIES (property_name=property_value, ...);
To modify the database owner:
ALTER (DATABASE|SCHEMA) database_name
SET OWNER [USER|ROLE] user_or_role;
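As a sketch of these two statements in use (the property name and the target user hadoop are hypothetical):

```sql
-- Tag the database with a custom property.
ALTER DATABASE lxw1234 SET DBPROPERTIES ('edited-by'='lxw');

-- Hand ownership of the database over to another user.
ALTER DATABASE lxw1234 SET OWNER USER hadoop;
```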
To drop a database:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name
[RESTRICT|CASCADE];
By default, Hive does not allow dropping a database that still contains tables. To drop such a database, either delete its tables first, or use the CASCADE keyword, which tells Hive to drop all tables in the database automatically. RESTRICT is the default behavior: if any table exists, dropping the database is not allowed.
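As a sketch, the two ways of dropping a database that still contains tables (using the lxw1234 database from above; the table name is hypothetical):

```sql
-- Either drop its tables first, then the database (RESTRICT, the default):
DROP TABLE IF EXISTS lxw1234.t_site_log;
DROP DATABASE IF EXISTS lxw1234;

-- Or let Hive drop all of its tables automatically:
DROP DATABASE IF EXISTS lxw1234 CASCADE;
```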
3. Tables in Hive
3.1 Viewing all tables
Enter the Hive CLI, execute use databasename; to switch to the database, then execute show tables; to see all the tables in that database:
hive> show tables;
OK
lxw1
lxw1234
table1
t_site_log
3.2 Storage paths for tables
By default, the storage path for the table is:
${hive.metastore.warehouse.dir}/databasename.db/tablename/
You can use the desc formatted tablename; command to view the details of a table, including its storage path, for example:
location:hdfs://cdh5/hivedata/warehouse/lxw1234.db/lxw1234
3.3 Internal tables and external tables
The tables in hive are divided into internal tables (managed_table) and external tables (external_table).
- The biggest difference between internal and external tables:
dropping an internal table deletes its data on HDFS;
dropping an external table does not delete the data on HDFS.
- Scenarios for internal tables:
Hive intermediate tables and result tables; tables whose data generally does not need to be loaded from outside (such as from local files or files already on HDFS).
- Scenarios for external tables:
source tables, where external data needs to be mapped into the table periodically. For example:
web logs collected every day are regularly streamed into HDFS as text files, one directory per day;
an external table is created in Hive as the source table, and the raw logs on HDFS are mapped to the corresponding day partition of the external table by adding partitions;
a large amount of statistical analysis is then performed on this external (raw log) table; the intermediate and result tables are stored as internal tables, populated via SELECT + INSERT.
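The daily workflow described above might be sketched like this (the table names, path, and date are hypothetical):

```sql
-- Map one day's raw log directory to a partition of the external source table.
ALTER TABLE t_site_log ADD IF NOT EXISTS PARTITION (day = '2015-06-01')
LOCATION 'hdfs://cdh5/logs/2015-06-01/';

-- Analyze the external source table and store the result in an internal table
-- via SELECT + INSERT.
INSERT OVERWRITE TABLE t_site_log_stat PARTITION (day = '2015-06-01')
SELECT ip, COUNT(*) AS pv
FROM t_site_log
WHERE day = '2015-06-01'
GROUP BY ip;
```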
3.4 Creating a Table
There are many syntax options for creating tables; only the commonly used ones are listed here.
For the rest, see the Hive official documentation:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
An example:
CREATE EXTERNAL TABLE t_lxw1234 (
  id INT,
  ip STRING COMMENT 'visitor IP',
  avg_view_depth DECIMAL(5,1),
  bounce_rate DECIMAL(6,5)
) COMMENT 'LxW\'s big data field - lxw1234.com'
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://cdh5/tmp/lxw1234/';
- Keyword EXTERNAL
Indicates that the table is an external table; if the EXTERNAL keyword is not specified, an internal table is created.
- Keyword COMMENT
Adds comments to the table and its columns.
- Keyword PARTITIONED BY
Indicates that the table is a partitioned table; here the partition field is day, of type STRING.
- Keyword ROW FORMAT DELIMITED
Specifies the delimiters used by the table, usually followed by these keywords:
FIELDS TERMINATED BY ','           -- field delimiter within each row
LINES TERMINATED BY '\n'           -- line delimiter
COLLECTION ITEMS TERMINATED BY ',' -- delimiter between elements in a collection
MAP KEYS TERMINATED BY ':'         -- delimiter between key and value of a MAP
As an example:
CREATE TABLE score (name STRING, score MAP<STRING,INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
The text data to be loaded looks like:
biansutao	'math':80,'language':89,'english':95
jobs	'language':60,'math':80,'english':99
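Assuming the sample rows above are saved, tab-separated, in a local file /tmp/score.txt (a hypothetical path), they could be loaded and queried like this:

```sql
-- Load the local text file into the score table.
LOAD DATA LOCAL INPATH '/tmp/score.txt' OVERWRITE INTO TABLE score;

-- Each row's name and parsed map can then be selected.
SELECT name, score FROM score;
```

Note that with the sample data above, the single quotes are part of the map keys as loaded.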
- Keyword STORED AS
Specifies the file storage format of the table on HDFS. The available formats include:
TEXTFILE     -- plain text, the default
SEQUENCEFILE -- binary sequence file
RCFILE       -- columnar storage format, supported since Hive 0.6
ORC          -- columnar storage format with a higher compression ratio and better read/write efficiency than RCFile, supported since Hive 0.11
PARQUET      -- columnar storage format, supported since Hive 0.13
- Keyword LOCATION
Specifies the location where the table's data is stored on HDFS.
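As a closing sketch, the same columns as t_lxw1234 stored in one of the columnar formats above (the table name t_lxw1234_orc is hypothetical):

```sql
CREATE TABLE t_lxw1234_orc (
  id INT,
  ip STRING COMMENT 'visitor IP',
  avg_view_depth DECIMAL(5,1),
  bounce_rate DECIMAL(6,5)
)
PARTITIONED BY (day STRING)
STORED AS ORC;  -- columnar storage instead of the default TEXTFILE
```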