Hive中將查詢結果匯出到指定分隔字元的檔案中，

最後更新：2015-06-28 來源：互聯網

上載者：User

創建阿里雲帳戶，並獲得超過 40 款產品的免費試用版；而企業帳戶則可以享有總值 $1200 的免費試用版。立即註冊！

在Hive0.11.0版本中新引進了一個新的特性，當使用者將Hive查詢結果輸出到檔案中時，使用者可以指定列的分割符，而在之前的版本是不能指定列之間的分隔字元。

在Hive0.11.0之前版本如下使用，無法指定分隔字元，預設為\x01：

hive (hive)> insertoverwrite local directory '/home/hadoop/export_hive' select * from a;

Query ID =hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0since there's no reduce operator

Starting Job = job_1435392961740_0025,Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/

Kill Command =/home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025

Hadoop job information for Stage-1:number of mappers: 1; number of reducers: 0

2015-06-27 17:43:54,395 Stage-1 map =0%, reduce = 0%

2015-06-27 17:44:07,615 Stage-1 map =100%, reduce = 0%, Cumulative CPU 4.82sec

MapReduce Total cumulative CPU time: 4seconds 820 msec

Ended Job = job_1435392961740_0025

Copying data to local directory /home/hadoop/export_hive

Copying data to local directory/home/hadoop/export_hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 4.82 sec HDFS Read: 2416833 HDFS Write: 1188743SUCCESS

Total MapReduce CPU Time Spent: 4seconds 820 msec

a.key a.value

Time taken: 26.475 seconds

(目錄不存在時，會自動建立)

查看產生的檔案內容：

[hadoop@gpmaster export_hive]$ head-n 10 /home/hadoop/export_hive/000000_0 | cat -A

2610^AaAAnz$

32196^AaAAoWnz$

78606^AaAAyXFz$

3804^AaAAz$

30102^AaABEWez$

21744^AaABukz$

39666^AaABz$

1632^AaABz$

82464^AaABz$

88320^AaACCaz$

我使用cat -A參數，將檔案中每行的結尾$符號和分隔字元^A(即是\x01)列印了出來。

接下來，我們使用Hive0.11.0版本新引進的新特性，指定輸出結果列之間的分隔字元：

hive (hive)> insertoverwrite local directory '/home/hadoop/export_hive' row format delimitedfields terminated by '*' select * from a;

Query ID =hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0since there's no reduce operator

Starting Job = job_1435392961740_0028,Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/

Kill Command =/home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028

Hadoop job information for Stage-1:number of mappers: 1; number of reducers: 0

2015-06-27 18:00:57,354 Stage-1 map =0%, reduce = 0%

2015-06-27 18:01:10,567 Stage-1 map =100%, reduce = 0%, Cumulative CPU 4.68sec

MapReduce Total cumulative CPU time: 4seconds 680 msec

Ended Job = job_1435392961740_0028

Copying data to local directory/home/hadoop/export_hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 4.68 sec HDFS Read: 2417042 HDFS Write: 1188743SUCCESS

Total MapReduce CPU Time Spent: 4seconds 680 msec

a.key a.value

Time taken: 26.607 seconds

查看指定分隔字元為*後，匯出的資料如下：

[hadoop@gpmaster export_hive]$ head -n10 /home/hadoop/export_hive/000000_0

2610*aAAnz

32196*aAAoWnz

78606*aAAyXFz

3804*aAAz

30102*aABEWez

21744*aABukz

39666*aABz

1632*aABz

82464*aABz

88320*aACCaz

可以看到列的分隔字元的確是我們指定的*號分隔字元。

如果是複合類型，比如struct，map類型等也可以指定對應的分隔字元：

以下我們做個執行個體操作實際操作以下：

(1) 建立複合類型的表

hive (hive)> create table userinfo(id int,name string,job_listarray<string>,perf map<int,string>,info struct<address:STRING,size:INT>);

(2) 構造資料（使用預設分隔符號構造）

1^A小明^AIT工程師^B教師^A10086^C正常^B10010^C不正常^A北京市^B130

2^A小花^A保姆^B護士^A10086^C正常^B10010^C正常^A南京市^B130

注釋：

\001使用^A代替，\002使用^B，\003使用^C代替

造資料在使用vi編輯器裡面，用ctrl+v然後再ctrl+a可以輸入這個控制符\001。按順序，\002的輸入方式為ctrl+v,ctrl+b，依次類推。

(3) 匯入資料

hive (hive)> load data local inpath'/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;

(4) 查詢資料

hive (hive)> select * from userinfo;

userinfo.id userinfo.name userinfo.job_list userinfo.perf userinfo.info

1 小明 ["IT工程師","教師"] {10086:"正常",10010:"不正常"} {"address":"北京市","size":130}

2 小花 ["保姆","護士"] {10086:"正常",10010:"正常"} {"address":"南京市","size":130}

Time taken: 0.088 seconds, Fetched: 2 row(s)

(5) 匯出資料

我們指定的分隔字元為：

資料行分隔符號為 \t

map keys分隔字元為：（冒號）

collection items分隔字元為：，（逗號）

執行匯出命令：

hive (hive)> insert overwrite localdirectory '/home/hadoop/export_hive'

> row formatdelimited fields terminated by '\t'

> collectionitems terminated by ','

> map keysterminated by ':'

>select * from userinfo;

查看匯出的資料為：

[hadoop@gpmaster export_hive]$ cat 000000_0

1 小明 IT工程師,教師 10086:正常,10010:不正常北京市,130

2 小花保姆,護士 10086:正常,10010:正常南京市,130

本文章原先以中文撰寫並發佈於 aliyun.com，亦設英文版本，僅作資訊用途。本網站不對文章的準確性，完整性或可靠性或其任何翻譯作出任何明示或暗示的陳述或保證。如對該文章有任何疑慮或投訴，請傳送電郵至 info-contact@alibabacloud.com 並提供相關疑慮或投訴的詳細說明。職員會於 5 個工作天內與您聯絡，一經驗證之後，即會刪除該侵權內容。

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More