Hive QL Introduction


Summary

This course covers the basic syntax and operations of Hive QL.

I. Experimental Environment

1. Environment Login

The environment logs in automatically without a password; the system username is shiyanlou and the password is shiyanlou.

2. Introduction to the Environment

This experiment uses an Ubuntu Linux environment with a desktop. The experiment will use the following desktop programs:

  • Xfce Terminal: a Linux command-line terminal; opening it enters a bash environment where you can use Linux commands;
  • Firefox: a browser, used in courses that need a front-end interface; open it when you need to view HTML/JS pages;
  • GVim: a very useful editor; for its simplest usage, refer to the course "Vim Editor";
  • Eclipse: a well-known cross-platform, free integrated development environment (IDE). It is used primarily for Java development, but through plugins it also serves as a development tool for languages such as C++ and Python.

3. Use of the environment

Use the GVim editor to enter the code required for the experiment, then compile and run it in the Xfce Terminal command-line environment and view the results. Run and share your experiment results: the lab platform keeps back-end records that cannot be faked, so they prove that you have effectively completed the experiment.

The Experiment Records page can be viewed under "My Courses". It contains each experiment and its notes, as well as the effective learning time of each experiment (the time spent operating the experiment desktop; if there is no activity, the system records it as idle time). These are the proof of the authenticity of your studies.

II. Course Outline

This Hive QL tutorial covers the following topics:

1. Data Definition Operations (DDL)

2. Data Manipulation Operations (DML)

3. Hive QL Query Operations

III. Data Definition Operations (DDL)

(1) The syntax for building a table (CREATE) is as follows:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

Some of the key words above explain:

  • CREATE TABLE creates a table with the specified name. An exception is thrown if a table of the same name already exists; the user can ignore the exception with the IF NOT EXISTS option.
  • The EXTERNAL keyword lets the user create an external table and specify a path (LOCATION) to the actual data while creating the table.
  • LIKE allows the user to copy an existing table structure, but does not copy the data.
  • COMMENT adds a description to a table or a column.
  • ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
    The user can define a custom SerDe when creating the table, or use the built-in SerDe. If ROW FORMAT is not specified, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table, the user also specifies columns for the table; while specifying the columns, the user can also specify a custom SerDe. Hive determines the data of the table's specific columns through the SerDe.
  • STORED AS SEQUENCEFILE | TEXTFILE | RCFILE | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
    If the file data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
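For a concrete sense of the ROW FORMAT DELIMITED clauses, here is a minimal sketch (the user_events table and its columns are hypothetical, not part of this tutorial) that stores an array and a map in a tab-separated text file:

CREATE TABLE user_events (
  userid BIGINT,
  tags ARRAY<STRING>,          -- items separated by ','
  props MAP<STRING, STRING>)   -- keys separated by ':'
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;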

(2) Build table (CREATE)

  • Create a normal table
  • Create an external table
  • Create a partitioned table
  • Create Bucket Table
    • Create a normal table
CREATE TABLE page_view(viewTime INT, userid BIGINT,
  page_url STRING, referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User',
  country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table';
    • To create an external table:
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
  page_url STRING, referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User',
  country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
    • To create a partition table:
CREATE TABLE par_table(viewTime INT, userid BIGINT,
  page_url STRING, referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (date STRING, pos STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE;
    • To create a Bucket table:
CREATE TABLE par_table(viewTime INT, userid BIGINT,
  page_url STRING, referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY (date STRING, pos STRING)
CLUSTERED BY (userid) SORTED BY (viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE;

(3) Modify table structure

  • Renaming a table
  • adding, deleting partitions
  • Add, update columns
  • Modify the name, type, location, and comment of a column
  • Increase the metadata information for a table
  • ...
    • Copy an empty table
CREATE TABLE empty_key_value_store LIKE key_value_store;
    • Delete a table
DROP TABLE table_name
    • Renaming a table
ALTER TABLE table_name RENAME TO new_table_name
    • adding, deleting partitions
# Add partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ...
partition_spec:
  : PARTITION (partition_col = partition_col_value, partition_col = partition_col_value, ...)
# Drop partitions
ALTER TABLE table_name DROP partition_spec, partition_spec, ...
    • Add, update columns
# REPLACE replaces all existing columns in the table
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
    • Modify the name, type, location, and comment of a column
# This command allows changing a column's name, data type, comment, position, or any combination of these
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
    • Increase the metadata information for a table
# The user can use this command to add metadata to a table
ALTER TABLE table_name SET TBLPROPERTIES table_properties
table_properties:
  : (property_name = property_value, ...)
    • Change file format and organization
ALTER TABLE table_name SET FILEFORMAT file_format
ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)] INTO num_buckets BUCKETS
    • Create, delete views
# Create a view
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value, ...)] AS SELECT ...
# Drop a view
DROP VIEW view_name
    • creating, deleting functions
# Create a function
CREATE TEMPORARY FUNCTION function_name AS class_name
# Drop a function
DROP TEMPORARY FUNCTION function_name
    • Presentation/Description Statements
# Show tables
SHOW TABLES;
# Show databases
SHOW DATABASES;
# Show partitions
SHOW PARTITIONS table_name;
# Show functions
SHOW FUNCTIONS;
# Describe a table/column
DESCRIBE [EXTENDED] table_name[DOT col_name]
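To tie several of these ALTER statements together, here is a short sketch against the par_table created earlier (the new column, the partition values, and the table property are illustrative assumptions):

-- Rename the table
ALTER TABLE par_table RENAME TO par_table2;
-- Add a column with a comment
ALTER TABLE par_table2 ADD COLUMNS (device STRING COMMENT 'client device type');
-- Add a partition for a specific date/position
ALTER TABLE par_table2 ADD IF NOT EXISTS PARTITION (date='2010-03-01', pos='top');
-- Attach metadata to the table
ALTER TABLE par_table2 SET TBLPROPERTIES ('creator' = 'shiyanlou');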
IV. Data Manipulation Operations (DML)

Hive does not support single-row INSERT statements, nor does it support UPDATE operations. Data can only be loaded into tables with LOAD, and once imported it cannot be modified.

(1) Loading files into a data table

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
  • Load

    LOAD is simply a copy/move operation that moves the data file to the location corresponding to the Hive table.

  • FilePath

    • A relative path, for example: project/data1
    • An absolute path, for example: /user/hive/project/data1
    • A full URI including the scheme, for example: hdfs://namenode:9000/user/hive/project/data1
  • The LOCAL keyword

    • If LOCAL is specified, the LOAD command looks for filepath in the local file system. If a relative path is given, it is interpreted relative to the user's current working directory. The user can also specify a full URI for a local file, for example: file:///user/hive/project/data. The LOAD command then copies the files in filepath to the target file system, which is determined by the LOCATION attribute of the table, and the copied data files are moved into the table's data location.
    • If LOCAL is not specified and filepath points to a full URI, Hive uses that URI directly. Otherwise, if no scheme or authority is given, Hive uses the scheme and authority of the NameNode URI defined by fs.default.name in the Hadoop configuration file. If the path is not absolute, Hive interprets it relative to /user/. Hive moves the contents of the files specified in filepath into the path specified by the table (or partition).
  • OVERWRITE

    With the OVERWRITE keyword, any existing content in the target table (or partition) is deleted, and then the contents of the file/directory pointed to by filepath are added to the table/partition. If the target table (or partition) already contains a file whose name conflicts with a file name in filepath, the existing file is replaced by the new one.

For example:

LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
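The LOCAL and PARTITION options can be combined. A sketch in the same spirit (the invites table with a ds partition column and the kv2.txt file are assumptions borrowed from Hive's Getting Started guide, not from this tutorial):

LOAD DATA LOCAL INPATH './examples/files/kv2.txt'
OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');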

(2) inserting query results into hive table

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement
    • Multi-insert Mode
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] select_statement2] ...
    • Auto Partition mode
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement
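For instance, here is a minimal multi-insert sketch that scans page_view once and writes matching rows to two destination tables (page_view_us and page_view_cn are hypothetical tables with compatible columns):

FROM page_view pv
INSERT OVERWRITE TABLE page_view_us
  SELECT pv.viewTime, pv.userid, pv.page_url WHERE pv.country = 'US'
INSERT OVERWRITE TABLE page_view_cn
  SELECT pv.viewTime, pv.userid, pv.page_url WHERE pv.country = 'CN';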

(3) Write query results to the HDFs file system

INSERT OVERWRITE [LOCAL] DIRECTORY directory1 SELECT ... FROM ...

FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

When data is written to the file system, text is serialized with each column separated by ^A (Ctrl-A) and each row terminated by a newline (\n). If any of the columns are not of a primitive type, those columns are serialized to JSON format.
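As a minimal sketch, exporting the pokes table loaded earlier to a local directory (the output path /tmp/pokes_out is an arbitrary assumption):

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/pokes_out'
SELECT pokes.* FROM pokes;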

V. Hive QL Query Operations

SQL operations:

  • Basic Select operation
  • Partition-based queries
  • Join

(1) Basic Select operation

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY | ORDER BY col_list]]
[LIMIT number]

The ALL and DISTINCT options control how duplicate records are handled. The default is ALL, which returns all matching records; DISTINCT removes duplicate records.

LIMIT restricts the number of records the query returns.
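For example, a small query against the page_view table defined earlier (the country value is an arbitrary illustration):

-- Distinct visitors from one country, capped at 10 rows
SELECT DISTINCT pv.userid
FROM page_view pv
WHERE pv.country = 'US'
LIMIT 10;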

(2) Partition-based queries

A general SELECT query scans the entire table. When a table is created with the PARTITIONED BY clause, queries can take advantage of partition pruning (input pruning).

Hive currently performs partition pruning only if the partition predicate appears in the WHERE clause closest to the FROM clause.

For example, if the table page_view is partitioned by the value of the date column, the following query retrieves rows whose date is between 2010-03-01 and 2010-03-31:

SELECT page_view.* FROM page_view WHERE page_view.date >= '2010-03-01' AND page_view.date <= '2010-03-31'

(3) Join

The syntax for Join is as follows:

join_table:
    table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)

table_reference:
    table_factor
  | join_table

table_factor:
    tbl_name [alias]
  | table_subquery alias
  | ( table_references )

join_condition:
    ON equality_expression ( AND equality_expression )*

equality_expression:
    expression = expression

Hive supports only equality joins, outer joins, and left semi joins. Hive does not support non-equality join conditions, because such conditions are difficult to implement in map/reduce jobs. Hive can also join more than two tables.
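As an illustration, a minimal equality-join sketch (the user_info table with userid and gender columns is hypothetical):

SELECT pv.page_url, u.gender
FROM page_view pv
JOIN user_info u ON (pv.userid = u.userid)
WHERE pv.country = 'US';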

