Hive Base SQL Syntax (DML)

Source: Internet
Author: User

DML operations (Data Manipulation Language)

Refer to the official documentation: DML document

    • Because UPDATE and DELETE are rarely used in Hive, this article does not cover them; it focuses mainly on the LOAD and INSERT operations.
1. Load (loading data)

The LOAD command loads files into a table (Loading files into tables).

    • The syntax given in the official documentation:

```sql
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
  [PARTITION (partcol1=val1, partcol2=val2 ...)]
  [INPUTFORMAT 'inputformat' SERDE 'serde']  -- (Hive 3.0 or later)
```
    • 1. When loading data into a table, Hive does not perform any transformation. LOAD simply copies or moves the data file into the location that corresponds to the Hive table.

    • 2. The load target can be either a table or a partition. If the table is partitioned, you must specify a particular partition by giving values for all of the partitioning columns.

    • 3. filepath can be a single file or a directory. In either case, filepath is treated as a set of files.

LOCAL: indicates that the input file is on the local (Linux) filesystem; if LOCAL is omitted, Hive looks for the file on HDFS.
OVERWRITE: if the table already contains data, that data is deleted first and the new data is inserted; without this keyword, the new data is appended to the table.
PARTITION: if the table is partitioned, you can load into a specific partition.
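As a small sketch of loading into a partition (the table and file names here are illustrative, not from the article): every partition column must be given a value in the PARTITION clause.

```sql
-- Hypothetical partitioned variant of the emp table used below.
CREATE TABLE emp_partitioned (
  empno INT, ename STRING, job STRING, mgr INT,
  hiredate STRING, salary DOUBLE, comm DOUBLE
)
PARTITIONED BY (deptno INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Load a local file into the deptno=10 partition only.
LOAD DATA LOCAL INPATH '/home/hadoop/emp_dept10.txt'
OVERWRITE INTO TABLE emp_partitioned PARTITION (deptno=10);
```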

```shell
# Create an employee table
hive> CREATE TABLE emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
OK
Time taken: 0.651 seconds

# Load emp.txt from the local filesystem into the table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=886, rawDataSize=0]
OK
Time taken: 1.848 seconds

# View the table data
hive> select * from emp;
OK
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
7566    JONES   MANAGER     7839    1981-4-2    2975.0  NULL    20
7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
7698    BLAKE   MANAGER     7839    1981-5-1    2850.0  NULL    30
7782    CLARK   MANAGER     7839    1981-6-9    2450.0  NULL    10
7788    SCOTT   ANALYST     7566    1987-4-19   3000.0  NULL    20
7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0     30
7876    ADAMS   CLERK       7788    1987-5-23   1100.0  NULL    20
7900    JAMES   CLERK       7698    1981-12-3   950.0   NULL    30
7902    FORD    ANALYST     7566    1981-12-3   3000.0  NULL    20
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10

# Load again without the OVERWRITE keyword
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' INTO TABLE emp;

# View again: the same 14 rows now appear twice
hive> select * from emp;
OK
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
Time taken: 0.137 seconds, Fetched: 28 row(s)

# Load once more with OVERWRITE: the data is replaced, not appended
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
hive> select * from emp;
OK
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
Time taken: 0.164 seconds, Fetched: 14 row(s)
```
2. Insert into table (Inserting data into Hive Tables from queries)
    • The syntax given in the official documentation:

```sql
-- Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]]
  select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)]
  select_statement1 FROM from_statement;

-- Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

-- Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
  select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...)
  select_statement FROM from_statement;
```

The official documentation lists a lot of syntax, which looks intimidating at first, but once you sort through it carefully you will find there is nothing to it. Analysis:

    • 1. Standard syntax: INSERT OVERWRITE TABLE tablename1 select_statement1 FROM from_statement; is really just a simple insert statement.
    • 2. You can use the PARTITION keyword to insert into partitions.
    • 3. OVERWRITE chooses whether to overwrite the existing data.
    • 4. The INSERT syntax runs MapReduce jobs.

    • 5. Multiple inserts: the source table is scanned once and the results are written by several inserts.

    • 6. Dynamic partition inserts: the target partitions are determined dynamically from the query results.

Note: there are two insert syntaxes; the only difference is whether the OVERWRITE keyword is present, i.e. whether the insert overwrites or appends.
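To make points 5 and 6 concrete, here is a hedged sketch (the target tables `emp2`, `emp_names`, and `emp_partitioned` are assumed to already exist with matching schemas; they are illustrative, not from the official text):

```sql
-- Multiple inserts: scan emp once, feed two inserts.
FROM emp
INSERT OVERWRITE TABLE emp2 SELECT *
INSERT INTO TABLE emp_names SELECT empno, ename;

-- Dynamic partition insert: the deptno partition value is taken from the query result.
-- Typically requires: SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE emp_partitioned PARTITION (deptno)
SELECT empno, ename, job, mgr, hiredate, salary, comm, deptno FROM emp;
```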

```shell
# INSERT OVERWRITE replaces the existing data
hive> insert overwrite table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
Launching Job 1 out of 3
...
OK
Time taken: 19.554 seconds
hive> select * from emp2;
OK
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
Time taken: 0.143 seconds, Fetched: 14 row(s)

# INSERT INTO appends
hive> insert into table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 18.539 seconds
hive> select * from emp2;
OK
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
7369    SMITH   CLERK       7902    1980-12-17  800.0   NULL    20
...
7934    MILLER  CLERK       7782    1982-1-23   1300.0  NULL    10
Time taken: 0.132 seconds, Fetched: 28 row(s)
```
    • Inserting values into tables (manually insert one or more records; note that this, too, runs MapReduce jobs)
    • Official usage:
      INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
```shell
hive> create table stu(
    > id int,
    > name string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.253 seconds
hive> insert into table stu values (1,"zhangsan");
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 16.589 seconds
hive> select * from stu;
OK
1       zhangsan
Time taken: 0.123 seconds, Fetched: 1 row(s)
```
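As the `values_row [, values_row ...]` part of the syntax above indicates, VALUES also accepts several rows in one statement. A minimal sketch against the stu table (the extra rows are made up for illustration):

```sql
-- Insert several rows with a single statement (still one job, rather than one per row).
INSERT INTO TABLE stu VALUES (2, "lisi"), (3, "wangwu");
```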
3. Data export (Writing to the filesystem from queries)

Query results can be written into the filesystem with the following statement:

    • The syntax given in the official documentation:

```sql
-- Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format]  -- (Note: only available starting with Hive 0.11.0)
  SELECT ... FROM ...

-- Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    [NULL DEFINED AS char]  -- (Note: only available starting with Hive 0.13)
```

**LOCAL**: with the LOCAL keyword the result is written to the local filesystem; without it, the result is written to HDFS. **STORED AS**: a storage format can be specified.

```shell
hive> insert overwrite local directory '/home/hadoop/stu'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > select * from stu;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 15.09 seconds

# Results
[[email protected] stu]$ pwd
/home/hadoop/stu
[[email protected] stu]$ cat 000000_0
1       zhangsan

# Export multiple results at once
hive> from emp
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select empno, ename
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select ename;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 16.261 seconds

# Results
[[email protected] tmp]$ cd hivetmp1
[[email protected] hivetmp1]$ ll
total 4
-rw-r--r-- 1 hadoop hadoop 154 Jun 24 15:00 000000_0
[[email protected] hivetmp1]$ cat 000000_0
7369    SMITH
7499    ALLEN
7521    WARD
7566    JONES
7654    MARTIN
7698    BLAKE
7782    CLARK
7788    SCOTT
7839    KING
7844    TURNER
7876    ADAMS
7900    JAMES
7902    FORD
7934    MILLER
[[email protected] hivetmp1]$ cd ..
[[email protected] tmp]$ cd hivetmp2
[[email protected] hivetmp2]$ cat 000000_0
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
```
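Conversely, omitting LOCAL writes the result to HDFS. A minimal sketch (the output path is made up for illustration):

```sql
-- Without LOCAL, the directory is an HDFS path; its existing contents are overwritten.
INSERT OVERWRITE DIRECTORY '/user/hadoop/output/stu_export'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
SELECT * FROM stu;
```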

Ref: 79007784
