Hive (vi): HQL DDL

Source: Internet
Author: User

The HQL examples in this article are run from a sqlline-based command-line client (http://sqlline.sourceforge.net/). DDL covers creating, altering, and dropping databases, tables, views, functions, and indexes (reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL). The main keywords are:

    • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
    • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
    • ALTER DATABASE/SCHEMA, TABLE, VIEW
    • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
    • DESCRIBE DATABASE/SCHEMA, table_name, view_name

Database:

  • Create syntax:
    CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
      [COMMENT database_comment]
      [LOCATION hdfs_path]
      [WITH DBPROPERTIES (property_name=property_value, ...)];
  • Create a database: CREATE DATABASE IF NOT EXISTS demo;
  • List databases: SHOW DATABASES;
  • Filter by pattern: SHOW DATABASES LIKE 'dem*'; (note that the wildcard is *, not %)
  • Set database properties at creation: CREATE DATABASE hello WITH DBPROPERTIES ('creator' = 'tgzhu', 'date' = '2016-07-12');
  • Describe a database: DESCRIBE DATABASE hello; (shows basic information only; the DBPROPERTIES are not displayed)
  • Show full database information, including DBPROPERTIES: DESCRIBE DATABASE EXTENDED hello;
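To make the pattern rule above concrete, here is a minimal Python sketch of how a SHOW DATABASES LIKE pattern can be matched, assuming the rules given in the Hive DDL manual: `*` matches any sequence of characters and `|` separates alternatives, while everything else (including `%`) is taken literally. The helper `hive_like` is hypothetical, not Hive code, and Hive's own matcher may differ in details such as case handling.

```python
import re
from typing import Iterable, List


def hive_like(pattern: str, names: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the names matching a SHOW DATABASES
    LIKE-style pattern. '*' matches any run of characters, '|' separates
    alternatives, and every other character (including '%') is literal."""
    alternatives = []
    for alt in pattern.split("|"):
        # Escape the literal text between wildcards, then join with '.*'.
        pieces = [re.escape(part) for part in alt.split("*")]
        alternatives.append(".*".join(pieces))
    regex = re.compile("(?:" + "|".join(alternatives) + ")")
    # The whole name must match, not just a prefix.
    return [name for name in names if regex.fullmatch(name)]


databases = ["default", "demo", "hello"]
print(hive_like("dem*", databases))        # ['demo']
print(hive_like("dem*|hel*", databases))   # ['demo', 'hello']
print(hive_like("dem%", databases))        # [] because '%' is not a wildcard here
```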

Drop Syntax:

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
    • RESTRICT: the default behavior; the drop fails if the database is not empty (still contains tables).
    • CASCADE: drops the tables in the database first, then the database itself.
    • Drop a database: DROP DATABASE IF EXISTS hello;
    • Drop a database together with its tables: DROP DATABASE IF EXISTS hello CASCADE;
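The difference between RESTRICT and CASCADE can be modeled with a small in-memory catalog. In this Python sketch, the dictionary `catalog` (database name to set of table names), the function `drop_database`, and the `DropError` messages are all hypothetical stand-ins for the metastore and Hive's real error output; only the RESTRICT/CASCADE/IF EXISTS behavior follows the rules above.

```python
from typing import Dict, Set


class DropError(Exception):
    """Hypothetical error type standing in for Hive's failure messages."""


def drop_database(catalog: Dict[str, Set[str]], name: str,
                  if_exists: bool = False, cascade: bool = False) -> None:
    """Apply DROP DATABASE [IF EXISTS] name [RESTRICT|CASCADE] to `catalog`,
    a hypothetical in-memory map from database name to its table names."""
    if name not in catalog:
        if if_exists:
            return  # IF EXISTS: dropping a missing database is a no-op
        raise DropError(f"database {name} does not exist")
    if catalog[name] and not cascade:
        # RESTRICT (the default): refuse while the database still has tables
        raise DropError(f"database {name} is not empty")
    # CASCADE: drop every table first, then the database itself
    catalog[name].clear()
    del catalog[name]


catalog = {"hello": {"students"}, "demo": set()}
drop_database(catalog, "demo")                     # empty, so RESTRICT succeeds
drop_database(catalog, "missing", if_exists=True)  # no error
try:
    drop_database(catalog, "hello")                # RESTRICT on a non-empty database
except DropError as err:
    print(err)                                     # database hello is not empty
drop_database(catalog, "hello", cascade=True)
print(catalog)                                     # {}
```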

Tables:

  • CREATE TABLE syntax:
    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      [(col_name data_type [COMMENT col_comment], ...)]
      [COMMENT table_comment]
      [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
      [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
      [SKEWED BY (col_name, col_name, ...) ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
      [
        [ROW FORMAT row_format] [STORED AS file_format]
        | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
      ]
      [LOCATION hdfs_path]
      [TBLPROPERTIES (property_name=property_value, ...)]
      [AS select_statement];

    CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
      LIKE existing_table_or_view_name
      [LOCATION hdfs_path];
  • Note: table and column names are case-insensitive, but SerDe and property names are case-sensitive.
  • Switch database: USE hello;
  • Create a table: a simple table defined by hand
    CREATE TABLE IF NOT EXISTS students (
      id string,
      code string,
      name string,
      score decimal(20,8),
      address struct<street:string, city:string, state:string, zip:string>
    )
    TBLPROPERTIES (... = "1.0");
  • Copy a table structure: create a new, empty table with the same definition as an existing one
    CREATE TABLE new_table_name LIKE students;
  • CTAS (CREATE TABLE AS SELECT): creates a table and loads the query result into it in one statement. Restrictions: the target table cannot be a partitioned table, an external table, or a list bucketing table.
  • Example: running CREATE EXTERNAL TABLE empdemo1 AS SELECT * FROM employee; fails with:
    FAILED: SemanticException [Error 10070]: CREATE-TABLE-AS-SELECT cannot create external table (state=42000,code=10070)
  • SHOW and DESCRIBE work the same way as for databases, with TABLE(S) as the keyword, e.g. SHOW TABLES; DESCRIBE students; DESCRIBE EXTENDED students;
  • Show a single column: DESCRIBE students.address;
  • External tables: see the previous chapter, Hive (v): Hive and HBase integration.

Partitioned tables:

    • A table can have one or more partitions, and each partition is stored as a separate subdirectory under the table's directory. Partition columns appear in the table schema and can be seen with DESCRIBE table, but they are not stored in the data files; they only identify the partition. A Hive SELECT normally scans the whole table, which wastes time when only part of the data is needed. Partitioning addresses this: each partition corresponds to one directory under the table, so a query that filters on partition columns reads only the matching directories. Partitions therefore narrow the query scope, speed up data retrieval, and let data be managed according to defined rules and conditions.
    • Example:
      CREATE TABLE student_p (id string, name string, col_name int)
      PARTITIONED BY (region string, sex string);
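The directory layout and the pruning it enables can be sketched in Python. Assumptions: the table sits in the default database under Hive's default warehouse directory `/user/hive/warehouse` (set by `hive.metastore.warehouse.dir`, and often changed), the partition values `east`, `west`, `male`, and `female` are made up, and Hive's escaping of special characters in partition values is ignored. The `col=value` directory naming is Hive's convention; `partition_path` and `prune` are hypothetical helpers.

```python
from typing import Dict, List

# Assumption: Hive's default warehouse location (hive.metastore.warehouse.dir)
# and a table in the default database; real deployments often differ.
WAREHOUSE = "/user/hive/warehouse"

Partition = Dict[str, str]


def partition_path(table: str, partition: Partition) -> str:
    """One 'col=value' directory level per partition column, in
    PARTITIONED BY order (dict insertion order). Escaping is omitted."""
    levels = [f"{col}={val}" for col, val in partition.items()]
    return "/".join([WAREHOUSE, table] + levels)


def prune(partitions: List[Partition], where: Partition) -> List[Partition]:
    """Keep the partitions matching every equality filter on partition
    columns; only their directories would need to be scanned."""
    return [p for p in partitions
            if all(p.get(col) == val for col, val in where.items())]


partitions = [
    {"region": "east", "sex": "male"},
    {"region": "east", "sex": "female"},
    {"region": "west", "sex": "male"},
]
# e.g. WHERE region = 'east' reads two of the three partition directories
for p in prune(partitions, {"region": "east"}):
    print(partition_path("student_p", p))
# /user/hive/warehouse/student_p/region=east/sex=male
# /user/hive/warehouse/student_p/region=east/sex=female
```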

Bucketed tables:

    • Each table or partition can be further organized into buckets, a finer-grained division of the data. Bucketing is done on a column: Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a row is stored in.
    • Purposes of bucketing a table:
      1. Higher query efficiency, especially for joins. When two tables are joined on a column and both are bucketed on that column into the same number of buckets, rows with the same value land in corresponding buckets, so the join only needs to pair up matching buckets. This greatly reduces the amount of data each join step handles.
      2. More efficient sampling. When developing and tuning queries against a large dataset, it is convenient to run them on a small subset of the data; with bucketing, such a sample can be read from just a few buckets.
    • Example table definition:
      CREATE TABLE student_c (id string, name string, col_name int)
      PARTITIONED BY (region string, sex string)
      CLUSTERED BY (col_name) INTO num_buckets BUCKETS;
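The hash-and-modulo rule and the two benefits listed above can be illustrated with a Python sketch. It assumes the classic scheme: a Java-style 31-multiplier hash for strings (equal to `String.hashCode()` for ASCII text), the value itself for `int` columns, and `(hash & Integer.MAX_VALUE) % num_buckets`. This is an assumption about older Hive releases; Hive 3 introduced `bucketing_version` 2, which uses a Murmur3-based hash, so real bucket numbers can differ. `bucket_of`, `bucketed_join`, `sample_bucket`, and the sample rows are hypothetical, and `sample_bucket` only approximates `TABLESAMPLE(BUCKET x OUT OF y ON col)`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Key = Union[int, str]
Row = Tuple[Key, str]


def java_string_hash(s: str) -> int:
    """31-multiplier hash over the UTF-8 bytes (treated as signed, as in
    Java), wrapped to a signed 32-bit int; equals String.hashCode() for ASCII."""
    h = 0
    for b in s.encode("utf-8"):
        signed = b - 256 if b > 127 else b
        h = (31 * h + signed) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_of(value: Key, num_buckets: int) -> int:
    """Bucket number: non-negative hash modulo the number of buckets.
    Assumes int values fit in 32 bits, like a Hive INT column."""
    h = value if isinstance(value, int) else java_string_hash(value)
    return (h & 0x7FFFFFFF) % num_buckets


def bucketize(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Group rows by the bucket of their first field (the bucketing column)."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left: List[Row], right: List[Row],
                  num_buckets: int) -> List[Tuple[Key, str, str]]:
    """Equi-join on the first field, pairing bucket i only with bucket i."""
    lb, rb = bucketize(left, num_buckets), bucketize(right, num_buckets)
    result = []
    for i in range(num_buckets):
        for lkey, lval in lb.get(i, []):
            for rkey, rval in rb.get(i, []):
                if lkey == rkey:
                    result.append((lkey, lval, rval))
    return result


def sample_bucket(rows: List[Row], x: int, y: int) -> List[Row]:
    """Rows whose key falls into bucket x (1-based) out of y."""
    return [row for row in rows if bucket_of(row[0], y) == x - 1]


students = [(1, "Ann"), (2, "Bob"), (3, "Cid"), (6, "Dee")]
scores = [(2, "B+"), (3, "A"), (6, "C"), (7, "B")]
print(bucket_of("ab", 4))                   # 1  ("ab" hashes to 3105)
print(bucketed_join(students, scores, 4))   # [(2, 'Bob', 'B+'), (6, 'Dee', 'C'), (3, 'Cid', 'A')]
print(sample_bucket(students, 1, 2))        # [(2, 'Bob'), (6, 'Dee')]
```

Because rows with equal keys always hash to the same bucket number, `bucketed_join` compares bucket i only with bucket i and still finds every match that a full cross-comparison would.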
