HQL statements are entered through Beeline, Hive's command line, which is based on SQLLine (http://sqlline.sourceforge.net/). DDL mainly covers creating, modifying, and deleting databases, tables, views, functions, and indexes. Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL. The syntax keywords are:
- CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
- DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
- ALTER DATABASE/SCHEMA, TABLE, VIEW
- SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
- DESCRIBE DATABASE/SCHEMA, table_name, view_name
Database:
Drop syntax:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
- RESTRICT: the default behavior; the database cannot be dropped while it still contains tables
- CASCADE: drops the tables in the database first, then drops the database itself
- Drop a database: DROP DATABASE IF EXISTS hello;
- Drop a database together with its tables: DROP DATABASE IF EXISTS HELLP CASCADE;
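The difference between RESTRICT and CASCADE can be sketched with a throwaway database. The names demo_db and t1 are hypothetical and do not come from the original text; this is a minimal sketch that assumes a Hive session with permission to create and drop databases.

```sql
-- Hypothetical names: demo_db and t1 are for illustration only.
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.t1 (id string);

-- RESTRICT is the default, so this statement is expected to fail
-- while demo_db still contains the table t1.
DROP DATABASE IF EXISTS demo_db;

-- CASCADE drops the contained tables first, then the database.
DROP DATABASE IF EXISTS demo_db CASCADE;
```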
Data table:
- CREATE TABLE syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)
    ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
    [STORED AS DIRECTORIES]]
  [
    [ROW FORMAT row_format]
    [STORED AS file_format]
    | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
  [AS select_statement];
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name;
- Note: table and column names are case-insensitive, but SerDe and property names are case-sensitive
- Switch database: USE hello;
- Create a table: create a simple table manually
CREATE TABLE IF NOT EXISTS students (id string, code string, name string, score decimal(20,8), address struct<street:string, city:string, state:string, zip:string>) ... '...' = "1.0" ...
- Copy a table structure: create a table based on an existing table
CREATE TABLE ... LIKE students;
- CTAS (CREATE TABLE AS SELECT): creates a table and loads the query results into it. Restriction: the target table cannot be a partitioned table, an external table, or a bucketed table.
- Try executing the following statement: CREATE EXTERNAL TABLE empdemo1 AS SELECT * FROM employee;
- Result: FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table (state=42000,code=10070)
- SHOW and DESCRIBE follow the same syntax as for databases, with a table as the target, for example: SHOW TABLES; DESCRIBE students; DESCRIBE EXTENDED students;
- Display specified field information: describe students.address;
- External table: see the previous chapter, Hive (v): Hive and HBase integration
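The table operations above can be sketched together as follows. This assumes the students table from the text already exists with at least the id, code, and name columns; students_copy and students_ctas are hypothetical names introduced only for illustration.

```sql
-- Assumes the students table from the text already exists.
-- students_copy and students_ctas are hypothetical names.

-- Copy only the table definition; no rows are copied.
CREATE TABLE IF NOT EXISTS students_copy LIKE students;

-- CTAS: create a managed (non-external) table and fill it from a query.
CREATE TABLE students_ctas AS
SELECT id, code, name
FROM students;

-- Inspect the results.
SHOW TABLES;
DESCRIBE students_copy;
DESCRIBE EXTENDED students_ctas;
```

Unlike the failing EXTERNAL example, the CTAS statement here targets an ordinary managed table, which is the case the restriction allows.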
Partition table:
- A table can have one or more partitions, and each partition is stored as a separate directory under the table's directory. A partition appears in the table structure as a field and is shown by DESCRIBE, but that field does not hold actual data content; it only identifies the partition. A Hive SELECT normally scans the entire table, which wastes time when only a subset of the data is needed, so partitioning is introduced when the table is created. Partitions assist queries by narrowing the scan range, which speeds up data retrieval, and they let data be managed according to defined rules and conditions.
- Example:
CREATE TABLE student_p (id string, name string, ... int) PARTITIONED BY (region string, sex string);
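How partitions map to directories and narrow a query can be sketched as below. The table student_part_demo and its sample row are hypothetical, and the INSERT ... VALUES form assumes Hive 0.14 or later; this is an illustration, not the original author's script.

```sql
-- Hypothetical table for illustration; not the student_p table from the text.
CREATE TABLE IF NOT EXISTS student_part_demo (id string, name string)
PARTITIONED BY (region string, sex string);

-- Static partition insert: the row is written under a directory such as
--   <table directory>/region=east/sex=f/
INSERT INTO TABLE student_part_demo PARTITION (region = 'east', sex = 'f')
VALUES ('1', 'alice');

-- List the partitions that now exist.
SHOW PARTITIONS student_part_demo;

-- Filtering on the partition columns lets Hive read only the matching
-- directory instead of scanning the whole table.
SELECT id, name
FROM student_part_demo
WHERE region = 'east' AND sex = 'f';
```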
Bucket table:
- Each table or partition can be further organized into buckets, which divide the data into finer-grained ranges. Buckets are defined on a column: Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket stores the record.
- Purposes of creating a bucket table:
- Higher query efficiency, for example in JOIN operations. When two tables are joined on a column and both are bucketed on that column, only the buckets holding the same column values need to be joined with each other, which greatly reduces the amount of data in the join.
- More efficient sampling. When working with large datasets, it is convenient to run queries on a small subset of the data while developing and revising them.
- An example table creation statement:
CREATE TABLE student_c (id string, name string, ... int) PARTITIONED BY (region string, sex string) CLUSTERED BY (...) INTO ... BUCKETS;
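Bucketing and bucket-based sampling can be sketched as follows. The table student_bucket_demo, the bucket count of 4, and the sample rows are assumptions chosen for illustration. On Hive releases before 2.0, the setting hive.enforce.bucketing may also need to be enabled before inserting.

```sql
-- Hypothetical table for illustration; 4 buckets is an arbitrary choice.
CREATE TABLE IF NOT EXISTS student_bucket_demo (id string, name string)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Rows are assigned to buckets based on the hash of the id column.
INSERT INTO TABLE student_bucket_demo
VALUES ('1', 'alice'), ('2', 'bob'), ('3', 'carol');

-- Sample one of the four buckets instead of reading the whole table.
SELECT id, name
FROM student_bucket_demo TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);
```

Sampling on the same column the table is clustered by is what lets Hive read a single bucket rather than scan all the data.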
Hive (vi): HQL DDL