Exporting Hive Query Results to a File with a Specified Delimiter
Hive 0.11.0 introduced a new feature: when writing query results out to a file, you can now specify the delimiter between columns. In earlier versions the column delimiter could not be specified.
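The general shape of the statement, as documented in the Hive language manual (the ROW FORMAT clause on a directory insert is only accepted from 0.11.0 onward), is:

INSERT OVERWRITE [LOCAL] DIRECTORY '/path/to/dir'
  [ROW FORMAT DELIMITED
    [FIELDS TERMINATED BY char]
    [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char]]
SELECT ... FROM ...;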
Before Hive 0.11.0, the statement was used as follows; no delimiter could be given, and the default was \x01:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' select * from a;
Query ID = hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0025, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 17:43:54,395 Stage-1 map = 0%, reduce = 0%
2015-06-27 17:44:07,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.82 sec
MapReduce Total cumulative CPU time: 4 seconds 820 msec
Ended Job = job_1435392961740_0025
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.82 sec HDFS Read: 2416833 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 820 msec
OK
a.key a.value
Time taken: 26.475 seconds
(If the target directory does not exist, it is created automatically.)
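The exact DDL of the demo table a is not shown above; judging from the key/value output, a minimal sketch of a matching table (hypothetical, just for reproducing the walkthrough) would be:

-- hypothetical DDL: column names and types inferred from the query output
create table a (key int, value string);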
View the contents of the generated file:
[hadoop@gpmaster export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0 | cat -A
2610^AaAAnz$
32196^AaAAoWnz$
78606^AaAAyXFz$
3804^AaAAz$
30102^AaABEWez$
21744^AaABukz$
39666^AaABz$
1632^AaABz$
82464^AaABz$
88320^AaACCaz$
The cat -A option makes each line's trailing $ (end-of-line marker) and the delimiter ^A (i.e. \x01) visible.
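Before 0.11.0, a common workaround was to build the delimited line yourself in the SELECT, since ROW FORMAT could not be attached to the directory insert. A minimal sketch, assuming the same table a:

-- pre-0.11 workaround (sketch): concatenate the columns with the desired
-- separator so the single output column already contains the '*' delimiter
insert overwrite local directory '/home/hadoop/export_hive'
select concat(cast(key as string), '*', value) from a;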
Next, we use the feature introduced in Hive 0.11.0 to specify the delimiter between output columns:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' row format delimited fields terminated by '*' select * from a;
Query ID = hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0028, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 18:00:57,354 Stage-1 map = 0%, reduce = 0%
2015-06-27 18:01:10,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.68 sec
MapReduce Total cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1435392961740_0028
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.68 sec HDFS Read: 2417042 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 680 msec
OK
a.key a.value
Time taken: 26.607 seconds
After specifying * as the delimiter, the exported data looks like this:
[hadoop@gpmaster export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0
2610*aAAnz
32196*aAAoWnz
78606*aAAyXFz
3804*aAAz
30102*aABEWez
21744*aABukz
39666*aABz
1632*aABz
82464*aABz
88320*aACCaz
As expected, the column delimiter is now the * character we specified.
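The same ROW FORMAT clause also works without LOCAL, in which case the delimited files are written to an HDFS directory instead; the target path below is an assumption for illustration:

-- writes '*'-delimited files into HDFS rather than the local filesystem
insert overwrite directory '/user/hadoop/export_hive'
row format delimited fields terminated by '*'
select * from a;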
For complex types such as struct and map, the corresponding delimiters can also be specified.
Let's walk through a concrete example:
(1) Create a table with complex types
hive (hive)> create table userinfo(id int, name string, job_list array<string>, perf map<int,string>, info struct<address:STRING,size:INT>);
(2) Construct the data (using Hive's default delimiters)
1^A小明^AIT工程師^B教師^A10086^C正常^B10010^C不正常^A北京市^B130
2^A小花^A保姆^B護士^A10086^C正常^B10010^C正常^A南京市^B130
Note:
\001 is shown as ^A, \002 as ^B, and \003 as ^C.
To type these control characters in the vi editor, press Ctrl+V followed by Ctrl+A to insert \001; likewise, \002 is entered as Ctrl+V, Ctrl+B, and so on.
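For reference, the userinfo table created in step (1) is equivalent to spelling out Hive's default delimiters explicitly in the DDL, which is why the data file above uses ^A, ^B and ^C:

-- equivalent DDL with the default delimiters written out:
-- \001 between fields, \002 between collection items,
-- \003 between each map key and its value
create table userinfo (
  id int,
  name string,
  job_list array<string>,
  perf map<int,string>,
  info struct<address:STRING,size:INT>
)
row format delimited
  fields terminated by '\001'
  collection items terminated by '\002'
  map keys terminated by '\003';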
(3) Load the data
hive (hive)> load data local inpath '/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;
(4) Query the data
hive (hive)> select * from userinfo;
OK
userinfo.id userinfo.name userinfo.job_list userinfo.perf userinfo.info
1 小明 ["IT工程師","教師"] {10086:"正常",10010:"不正常"} {"address":"北京市","size":130}
2 小花 ["保姆","護士"] {10086:"正常",10010:"正常"} {"address":"南京市","size":130}
Time taken: 0.088 seconds, Fetched: 2 row(s)
(5) Export the data
We specify the following delimiters:
field delimiter: \t
map keys delimiter: : (colon)
collection items delimiter: , (comma)
Run the export command:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive'
          > row format delimited fields terminated by '\t'
          > collection items terminated by ','
          > map keys terminated by ':'
          > select * from userinfo;
View the exported data:
[hadoop@gpmaster export_hive]$ cat 000000_0
1 小明 IT工程師,教師 10086:正常,10010:不正常 北京市,130
2 小花 保姆,護士 10086:正常,10010:正常 南京市,130
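As a round-trip check, the exported file can be loaded back through a table declared with the same custom delimiters; a sketch, where the table name and file path are illustrative:

-- hypothetical verification table matching the export's delimiters
create table userinfo_copy (
  id int,
  name string,
  job_list array<string>,
  perf map<int,string>,
  info struct<address:STRING,size:INT>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';

load data local inpath '/home/hadoop/export_hive/000000_0'
overwrite into table userinfo_copy;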