Hive Quick Start
@ (HIVE) [HIVE] (a) Simple introduction
1. Create a table
CREATE table if not exists ljh_emp (
name string,
salary float,
gender string)
comment ' Basic information of a employee '
row format delimited fields terminated by ', ';
2. Prepare data files
Create the test directory with only one file in the directory, and the contents of the file are as follows:
Ljh,25000,male
Jediael,25000,male
Llq,15000,female
3, import the data into the table
Load data local inpath '/home/ljhn1829/test ' into table ljh_emp;
4, the contents of the query table
SELECT * from Ljh_emp;
OK
ljh 25000.0 male
jediael 25000.0
male llq 15000.0 Female Time taken:0.159 seconds, Fetched:3 row (s)
(ii) about separators
1. Default Separator
The line default delimiter in hive is \ n, the field delimiter is CTRL + A, and ctrl+b,ctrl+c can be used to separate array,struct,map, as detailed in the Hive Programming Guide P44.
Therefore, if the table is in the form without specifying row format delimited fields terminated by ', the default field delimiter is considered Ctrl + A.
There are 2 different solutions:
The first is to specify the separator when creating the table, as shown in the previous example,
The second is to use CTRL + A in the data file, see the following example
2. Use CTRL + a full separator in the data file
(1) Create a table
CREATE table Ljh_test_emp (name string, salary float, gender string);
(2) Prepare data files
Create the Test2 directory, with only one file in the directory, the contents of the file are as follows:
Ljh^a25000^amale
Jediael^a25000^amale
Llq^a15000^afemale
One of the ^a characters can only be seen in VI, cat can not see.
The output ^a method is: In VI insert mode, first press CTRL + V, then press CTRL + A
(3) Import the data into the table
CREATE table Ljh_test_emp (name string, salary float, gender string);
(4) Query data
Hive> select * from Ljh_test_emp;
OK
ljh 25000.0 male
jediael 25000.0
male llq 15000.0 female time taken : 0.2 seconds, Fetched:3 row (s)
3, the separator is not specified, and CTRL + A is not used as a separator in the file, the following error occurred
(1) Create a table
CREATE table if not exists ljh_emp_test (
name string,
salary float,
gender string)
comment ' Basic Information of a employee ';
(2) Preparing data
Ljh,25000,male
Jediael,25000,male
Llq,15000,female
(3) Import the data into the table
Load data local inpath '/home/ljhn1829/test ' into table ljh_emp_test;
(4) View the data in the table
SELECT * from Ljh_emp_test;
OK
ljh,25000,male null null
jediael,25000,male null
Llq,15000,female NULL NULL time
taken:0.185 seconds, Fetched:3 row (s)
As you can see, because the delimiter is CTRL + A, each row in the file is treated as the first field when the data is imported, resulting in the subsequent 2 fields being null.
(iii) a more complex table
1. Create a table
CREATE TABLE Employees (
name string,
slalary float,
suboddinates array<string>,
deductions Map <string,float>, Address
struct<stree:string, city:string, state:string, zip:int>
)
Partitioned by (Country string, state string);
2. Prepare data
John doe^a100001.1^amary smith^btodd jones^afederal taxes^c.2^bstatetaxes^c.05^binsurance^c.1^a1 Michigan Ave.^ bchicago^bil^b60600
Mary smith^a80000.0^abill king^afederal taxes^c.2^bstate taxes^c.05^binsurance^c.1^a100 Ontario st.^bchicago^bil^b60601
Todd jones^a70000.0^a^afederal taxes^c.15^bstate taxes^c.03^binsurance^c.1^ A200 Chicago ave.^boak park^bil^b60700
Bill king^a60001.0^a^afederal taxes^c.15^bstate taxes^c.03^binsurance^ c.1^a300 obscure dr.^bobscuria^bil^b60100
Note ^a: Separating fields ^b: separating elements in Array/struct/map ^c: Separating the KV in the map
See "Hive Programming Guide" P44.
3, import the data into the table
Load data local inpath '/home/ljhn1829/phd ' into table employees partition (country= ' Us ', state= ' Ca ');
4. View Table Data
Hive> SELECT * FROM Employees;
OK
John Doe 100001.1 ["Mary Smith", "Todd Jones"] {"The taxes": 0.2, "statetaxes": 0.05, "Insurance ": 0.1} {" Stree ":" 1 Michigan Ave. "," City ":" Chicago "," state ":" IL "," Zip ": 60600} US ca
Mary Smith 80000.0 ["Bill King"] {"The taxes": 0.2, "state taxes": 0.05, "Insurance": 0.1} {"Stree": "Ontario St.", "City": "Chicago", "state": "IL", "Zip": 60601} US CA
Todd Jones 70000.0 [] {"Taxes": 0.15, "state taxes": 0.03, "Insurance": 0.1} {"Stree": "Chicago Ave.", "City": "Oak Park", "state": "IL", "Zip": 60700} US ca
Bill King 60001.0 [] {"Taxes": 0.15, "state taxes": 0.03, "Insurance": 0.1} {"Stree": "Obscure Dr.", " City ': ' Obscuria ', ' state ': ' IL ', ' Zip ': 60100} US ca time
taken:0.312 seconds, Fetched:4 row (s)
5, view the files in the HDFs
Hadoop fs-ls/data/gamein/g4_us/meta/employees/country=us/state=ca
Found 1 Items
-rwxr-x--- 3 ljhn1829 G4_us 428 2015-05-12 12:49/data/gamein/g4_us/meta/employees/country=us/state=ca/progamming_hive_data.txt
The content in this file is consistent with the original file.
(iv) Inserting data through a SELECT clause
1. Create a table
CREATE TABLE Employees2 (
name string,
slalary float,
suboddinates array<string>,
deductions Map<string,float>, Address
struct<stree:string, city:string, state:string, zip:int>
)
Partitioned by (Country string, state string);
2. Inserting data
Hive> set hive.exec.dynamic.partition.mode=nonstrict;
Otherwise, the following exceptions occur:
failed:semanticexception [Error 10096]: Dynamic partition Strict mode requires at least one static partition col Umn. To turn this off set hive.exec.dynamic.partition.mode=nonstrict inserts into table Employees2 partition (country,state) SE
Lect name,slalary,suboddinates,deductions,address, E.country, e.state from Employees e;