Hive 0.11.0 introduced a feature that lets you specify the column delimiter when writing query results out to a file; in earlier versions there was no way to choose the separator between columns.
Before Hive 0.11.0 the statement looked like this; no delimiter could be specified, and the default was \x01:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' select * from a;
Query ID =hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74
Total Jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0025, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 17:43:54,395 Stage-1 map = 0%, reduce = 0%
2015-06-27 17:44:07,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.82 sec
MapReduce Total Cumulative CPU time: 4 seconds 820 msec
Ended Job = job_1435392961740_0025
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.82 sec HDFS Read: 2416833 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU time Spent: 4 seconds 820 msec
OK
a.key a.value
Time taken: 26.475 seconds
(The directory is created automatically if it does not exist.)
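Such a default-delimited export can be read back programmatically by splitting each line on \x01. A minimal Python sketch (not part of the original post; the function name is illustrative, and the file layout assumed is the two-column key/value export above):

```python
def read_default_export(path):
    """Read a Hive export file that uses the default \x01 field delimiter.

    Returns a list of rows, each row a list of string fields.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Hive writes one record per line; fields are joined by \x01.
            rows.append(line.rstrip("\n").split("\x01"))
    return rows
```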
To view the generated file contents:
[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0 | cat -A
2610^Aaaanz$
32196^Aaaaownz$
78606^Aaaayxfz$
3804^Aaaaz$
30102^Aaabewez$
21744^Aaabukz$
39666^Aaabz$
1632^Aaabz$
82464^Aaabz$
88320^Aaaccaz$
Because of the cat -A option, the end of each line is shown as $ and the field delimiter appears as ^A (that is, \x01).
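For reference, the caret notation cat -A uses maps a control byte to '^' followed by the letter at that offset in the alphabet. A purely illustrative Python sketch of that mapping (the function name is an assumption, not a real API):

```python
def caret_notation(ch):
    """Render a control character the way `cat -A` does: \x01 -> ^A, \x02 -> ^B, ..."""
    code = ord(ch)
    if 0 < code < 27:
        # '@' is ASCII 64, so 64 + 1 = 'A', 64 + 2 = 'B', and so on.
        return "^" + chr(ord("@") + code)
    return ch  # printable characters pass through unchanged
```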
Next, we use the feature introduced in Hive 0.11.0 to specify the delimiter between the output columns:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' row format delimited fields terminated by '*' select * from a;
Query ID =hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5
Total Jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0028, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 18:00:57,354 Stage-1 map = 0%, reduce = 0%
2015-06-27 18:01:10,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.68 sec
MapReduce Total Cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1435392961740_0028
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.68 sec HDFS Read: 2417042 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU time Spent: 4 seconds 680 msec
OK
a.key a.value
Time taken: 26.607 seconds
With the delimiter specified as *, the exported data looks like this:
[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0
2610*aaanz
32196*aaaownz
78606*aaayxfz
3804*aaaz
30102*aabewez
21744*aabukz
39666*aabz
1632*aabz
82464*aabz
88320*aaccaz
You can see that the column delimiter is indeed the * character we specified.
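The same parsing idea works for any single-character delimiter, with one caveat worth noting: if the data itself can contain the delimiter character, the split will be wrong, so choose a character that cannot occur in the values. A hedged Python sketch (names are illustrative):

```python
def split_export_line(line, delimiter="*"):
    """Split one exported line on the delimiter given in ROW FORMAT DELIMITED.

    Caveat: if a field value contains the delimiter character itself,
    the split produces extra fields, so pick a delimiter that cannot
    appear in the data.
    """
    return line.rstrip("\n").split(delimiter)
```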
For composite types such as struct and map, you can likewise specify the corresponding delimiters.
Let's walk through a hands-on example:
(1) Create a table of composite types
hive (hive)> create table userinfo (id int, name string, job_list array<string>, perf map<int,string>, info struct<address:string,size:int>);
(2) Construct the data (using the default delimiters)
1^AXiao Ming^AIT engineer^Bteacher^A10086^Cnormal^B10010^Cnot normal^ABeijing^B130
2^AXiao Hua^Ababysitter^Bnurse^A10086^Cnormal^B10010^Cnormal^ANanjing^B130
Notes:
\001 is displayed as ^A, \002 as ^B, and \003 as ^C.
In the vi editor, press Ctrl+V and then Ctrl+A to enter the \001 control character; likewise, \002 is entered with Ctrl+V then Ctrl+B, and so on.
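If entering control characters in vi is awkward, the same file can be generated from a script instead. A minimal Python sketch (not from the original post; the function name is illustrative, and the rows are the two sample records above):

```python
def write_userinfo(path):
    """Write the two sample rows using Hive's default delimiters:
    \x01 between fields, \x02 between collection items, \x03 between
    a map key and its value.
    """
    rows = [
        ("1", "Xiao Ming", ["IT engineer", "teacher"],
         {10086: "normal", 10010: "not normal"}, ("Beijing", 130)),
        ("2", "Xiao Hua", ["babysitter", "nurse"],
         {10086: "normal", 10010: "normal"}, ("Nanjing", 130)),
    ]
    with open(path, "w", encoding="utf-8") as f:
        for rid, name, jobs, perf, (addr, size) in rows:
            f.write("\x01".join([
                rid,
                name,
                "\x02".join(jobs),                                  # array<string>
                "\x02".join(f"{k}\x03{v}" for k, v in perf.items()),  # map<int,string>
                f"{addr}\x02{size}",                                # struct fields
            ]) + "\n")
```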
(3) Import data
hive> load data local inpath '/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;
(4) Query data
hive (hive)> select * from userinfo;
OK
userinfo.id userinfo.name userinfo.job_list userinfo.perf userinfo.info
1 Xiao Ming ["IT engineer","teacher"] {10086:"normal",10010:"not normal"} {"address":"Beijing","size":130}
2 Xiao Hua ["babysitter","nurse"] {10086:"normal",10010:"normal"} {"address":"Nanjing","size":130}
Time taken: 0.088 seconds, Fetched: 2 row(s)
(5) Export the data
The delimiters we specify are:
The field delimiter is \t (tab)
The map keys delimiter is : (colon)
The collection items delimiter is , (comma)
Execute the export command:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive'
           > row format delimited fields terminated by '\t'
           > collection items terminated by ','
           > map keys terminated by ':'
           > select * from userinfo;
The exported data looks like this:
[email protected] export_hive]$ cat 000000_0
1	Xiao Ming	IT engineer,teacher	10086:normal,10010:not normal	Beijing,130
2	Xiao Hua	babysitter,nurse	10086:normal,10010:normal	Nanjing,130
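Reading this export back, each row can be rebuilt into list/map/struct values by splitting on the three delimiters in order: fields by tab, items by comma, map keys by colon. A sketch under the assumption that the column layout matches the userinfo table above (the function name is illustrative):

```python
def parse_userinfo_line(line):
    """Parse one exported userinfo row: fields by \t, items by ',', map keys by ':'."""
    rid, name, jobs, perf, info = line.rstrip("\n").split("\t")
    job_list = jobs.split(",")                       # array<string>
    perf_map = {int(k): v for k, v in               # map<int,string>
                (item.split(":", 1) for item in perf.split(","))}
    address, size = info.split(",")                  # struct<address,size>
    return {"id": int(rid), "name": name, "job_list": job_list,
            "perf": perf_map, "info": {"address": address, "size": int(size)}}
```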
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
Exporting Hive query results to a file with a specified delimiter