DML Operations (Data Manipulation Language)
For details, refer to the official documentation: DML document.
- Because UPDATE and DELETE are not commonly used in Hive, this article does not cover them; it mainly describes the LOAD and INSERT operations.
1. Load (loading data)
The LOAD statement loads files into a table (Loading files into tables).
The following is the syntax given on the official website:
```sql
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
```
1. When loading data into a table, Hive does not perform any transformation. The LOAD operation simply copies or moves the data file to the location that corresponds to the Hive table.
2. The load target can be a table or a partition. If the table is partitioned, you must specify the partition by giving values for all of the partition columns.
3. filepath can be a file or a directory; in either case, filepath is treated as a set of files.
- LOCAL: indicates that the input file is on the local (Linux) file system; if LOCAL is omitted, Hive looks for the file on HDFS.
- OVERWRITE: if the table already contains data, that data is deleted first and the new data is inserted; without this keyword, the data is appended to the table.
- PARTITION: if the table is partitioned, you can load into a specific partition (see the sketch below).
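As a brief illustration of the PARTITION clause, here is a minimal sketch; the emp_partition table, its deptno partition column, and the file path are made up for this example and do not appear in the session below:

```sql
-- Hypothetical partitioned table, only to illustrate loading into a partition
CREATE TABLE emp_partition (
  empno int, ename string, job string, mgr int,
  hiredate string, salary double, comm double
)
PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- All partition columns must be given values;
-- OVERWRITE only replaces the data in this one partition
LOAD DATA LOCAL INPATH '/home/hadoop/emp_dept10.txt'
OVERWRITE INTO TABLE emp_partition PARTITION (deptno=10);
```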
```shell
# Create the emp table
hive> CREATE TABLE emp
    > (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t';
OK
Time taken: 0.651 seconds

# Import emp.txt from the local file system into the table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, numRows=0, totalSize=886, rawDataSize=0]
OK
Time taken: 1.848 seconds

# View the table data
hive> select * from emp;
OK
7369    SMITH    CLERK      7902    1980-12-17    800.0     NULL      20
7499    ALLEN    SALESMAN   7698    1981-2-20     1600.0    300.0     30
7521    WARD     SALESMAN   7698    1981-2-22     1250.0    500.0     30
7566    JONES    MANAGER    7839    1981-4-2      2975.0    NULL      20
7654    MARTIN   SALESMAN   7698    1981-9-28     1250.0    1400.0    30
7698    BLAKE    MANAGER    7839    1981-5-1      2850.0    NULL      30
7782    CLARK    MANAGER    7839    1981-6-9      2450.0    NULL      10
7788    SCOTT    ANALYST    7566    1987-4-19     3000.0    NULL      20
7839    KING     PRESIDENT  NULL    1981-11-17    5000.0    NULL      10
7844    TURNER   SALESMAN   7698    1981-9-8      1500.0    0.0       30
7876    ADAMS    CLERK      7788    1987-5-23     1100.0    NULL      20
7900    JAMES    CLERK      7698    1981-12-3     950.0     NULL      30
7902    FORD     ANALYST    7566    1981-12-3     3000.0    NULL      20
7934    MILLER   CLERK      7782    1982-1-23     1300.0    NULL      10

# Load again without the OVERWRITE keyword
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' INTO TABLE emp;

# View again: the same 14 rows now appear twice, i.e. the data was appended
hive> select * from emp;
OK
...
Time taken: 0.137 seconds, Fetched: 28 row(s)

# Load once more, this time with OVERWRITE
hive> LOAD DATA LOCAL INPATH '/home/hadoop/emp.txt' OVERWRITE INTO TABLE emp;

# The previous data has been overwritten: only the 14 freshly loaded rows remain
hive> select * from emp;
OK
...
Time taken: 0.164 seconds, Fetched: 14 row(s)
```
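All of the loads above use LOCAL. For contrast, here is a minimal sketch of loading from HDFS; the /data/emp.txt path is hypothetical and the file would need to be uploaded to HDFS first:

```sql
-- Hypothetical HDFS path (e.g. uploaded beforehand with hdfs dfs -put)
-- Without LOCAL, Hive resolves the path on HDFS and moves the file
-- under the table's directory rather than copying it
LOAD DATA INPATH '/data/emp.txt' OVERWRITE INTO TABLE emp;
```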
2. INSERT (Inserting data into Hive tables from queries)
The following is the syntax given on the official website:
```sql
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
```
The official site lists quite a lot of syntax, which looks intimidating at first glance, but once you sort it out carefully there is not much to it. The analysis follows:
Note: there are two kinds of insert syntax, and the only difference is whether the OVERWRITE keyword is present: INSERT OVERWRITE replaces the existing data, while INSERT INTO appends to it.
```shell
# INSERT OVERWRITE: replaces whatever is in emp2
hive> insert overwrite table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
Launching Job 1 out of 3
...
OK
Time taken: 19.554 seconds

hive> select * from emp2;
OK
...    (the same 14 rows as emp)
Time taken: 0.143 seconds, Fetched: 14 row(s)

# INSERT INTO: appends to emp2
hive> insert into table emp2 select * from emp;
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 18.539 seconds

hive> select * from emp2;
OK
...    (the 14 rows now appear twice)
Time taken: 0.132 seconds, Fetched: 28 row(s)
```
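The syntax above also lists dynamic partition inserts, which the session does not demonstrate. Here is a minimal sketch under the assumption that a partitioned table emp_dyn exists (same columns as emp, but partitioned by deptno); the table name and settings shown are standard Hive configuration, not part of the original example:

```sql
-- Dynamic partition inserts are disabled/strict by default, so enable them first
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- No fixed value is given for deptno: Hive derives the target partition of each row
-- from the last column of the SELECT list
INSERT OVERWRITE TABLE emp_dyn PARTITION (deptno)
SELECT empno, ename, job, mgr, hiredate, salary, comm, deptno FROM emp;
```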
```shell
# INSERT ... VALUES: insert literal rows directly (this also runs a job)
hive> create table stu(
    > id int,
    > name string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.253 seconds
hive> insert into table stu values (1, "zhangsan");
Query ID = hadoop_20180624141010_3063d504-ff2f-4003-843f-7dca60f1dd7e
Total jobs = 3
...
OK
Time taken: 16.589 seconds
hive> select * from stu;
OK
1    zhangsan
Time taken: 0.123 seconds, Fetched: 1 row(s)
```
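The multi-insert form shown in the syntax above scans the source table once and feeds several targets. A minimal sketch, assuming two hypothetical tables emp_names (empno int, ename string) and emp_salaries (empno int, salary double) already exist:

```sql
-- One pass over emp populates both tables;
-- each SELECT list must match its target table's schema
FROM emp
INSERT OVERWRITE TABLE emp_names    SELECT empno, ename
INSERT OVERWRITE TABLE emp_salaries SELECT empno, salary;
```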
3. Data export (Writing to the filesystem from queries)
Query results can be written into the file system with the following statements:

```sql
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format]
  SELECT ... FROM ...

Hive extension (multiple inserts), i.e. exporting several results at once:
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    [NULL DEFINED AS char] (Note: only available starting with Hive 0.13)
```
**LOCAL**: with the LOCAL keyword the results are exported to the local file system; without it they are written to HDFS by default. **STORED AS**: lets you specify the storage format of the exported files.

```shell
hive> insert overwrite local directory '/home/hadoop/stu'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > select * from stu;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 15.09 seconds

# Results
$ pwd
/home/hadoop/stu
$ cat 000000_0
1    zhangsan

# Export multiple results at once
hive> from emp
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select empno, ename
    > INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    > select ename;
Query ID = hadoop_20180624153131_97271014-abcf-4e70-b318-7de85d27c97f
Total jobs = 1
...
OK
Time taken: 16.261 seconds

# Results
$ cd /home/hadoop/tmp/hivetmp1
$ ll
total 4
-rw-r--r-- 1 hadoop hadoop 154 Jun 24 15:31 000000_0
$ cat 000000_0
7369    SMITH
7499    ALLEN
7521    WARD
7566    JONES
7654    MARTIN
7698    BLAKE
7782    CLARK
7788    SCOTT
7839    KING
7844    TURNER
7876    ADAMS
7900    JAMES
7902    FORD
7934    MILLER
$ cd ../hivetmp2
$ cat 000000_0
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
```
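STORED AS is mentioned above but not demonstrated. Here is a minimal sketch of exporting to an HDFS directory (no LOCAL keyword) in ORC format; the /tmp/emp_orc path is just an example, and STORED AS with directory inserts requires a reasonably recent Hive version:

```sql
-- Without LOCAL, the target directory is on HDFS;
-- STORED AS sets the file format of the exported data
INSERT OVERWRITE DIRECTORY '/tmp/emp_orc'
STORED AS ORC
SELECT empno, ename, salary FROM emp;
```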