Hive 0.11.0 introduced a feature that lets you specify the column delimiter when writing query results out to a file; in earlier versions there was no way to choose the separator between columns.
Before Hive 0.11.0 the statement looked like this; no delimiter could be specified, and the default was \x01:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' select * from a;
Query ID =hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74
Total Jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0025, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 17:43:54,395 Stage-1 map = 0%, reduce = 0%
2015-06-27 17:44:07,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.82 sec
MapReduce Total Cumulative CPU time: 4 seconds 820 msec
Ended Job = job_1435392961740_0025
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.82 sec HDFS Read: 2416833 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU time Spent: 4 seconds 820 msec
OK
a.key a.value
Time taken: 26.475 seconds
(The directory is created automatically if it does not exist.)
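Such a default-delimited export can be read back programmatically by splitting each line on \x01. A minimal Python sketch (not part of the original post; the function name is illustrative, and the file layout assumed is the two-column key/value export above):

```python
def read_default_export(path):
    """Read a Hive export file that uses the default \x01 field delimiter.

    Returns a list of rows, each row a list of string fields.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Hive writes one record per line; fields are joined by \x01.
            rows.append(line.rstrip("\n").split("\x01"))
    return rows
```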
To view the generated file contents:
[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0 | cat -A
2610^Aaaanz$
32196^Aaaaownz$
78606^Aaaayxfz$
3804^Aaaaz$
30102^Aaabewez$
21744^Aaabukz$
39666^Aaabz$
1632^Aaabz$
82464^Aaabz$
88320^Aaaccaz$
Because of the cat -A option, the end of each line is shown as $ and the field delimiter appears as ^A (that is, \x01).
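For reference, the caret notation cat -A uses maps a control byte to '^' followed by the letter at that offset in the alphabet. A purely illustrative Python sketch of that mapping (the function name is an assumption, not a real API):

```python
def caret_notation(ch):
    """Render a control character the way `cat -A` does: \x01 -> ^A, \x02 -> ^B, ..."""
    code = ord(ch)
    if 0 < code < 27:
        # '@' is ASCII 64, so 64 + 1 = 'A', 64 + 2 = 'B', and so on.
        return "^" + chr(ord("@") + code)
    return ch  # printable characters pass through unchanged
```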
Next, we use the feature introduced in Hive 0.11.0 to specify the delimiter between the output columns:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' row format delimited fields terminated by '*' select * from a;
Query ID =hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5
Total Jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0028, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 18:00:57,354 Stage-1 map = 0%, reduce = 0%
2015-06-27 18:01:10,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.68 sec
MapReduce Total Cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1435392961740_0028
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.68 sec HDFS Read: 2417042 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU time Spent: 4 seconds 680 msec
OK
a.key a.value
Time taken: 26.607 seconds
With the delimiter specified as *, the exported data looks like this:
[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0
2610*aaanz
32196*aaaownz
78606*aaayxfz
3804*aaaz
30102*aabewez
21744*aabukz
39666*aabz
1632*aabz
82464*aabz
88320*aaccaz
You can see that the column delimiter is indeed the * character we specified.
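The same parsing idea works for any single-character delimiter, with one caveat worth noting: if the data itself can contain the delimiter character, the split will be wrong, so choose a character that cannot occur in the values. A hedged Python sketch (names are illustrative):

```python
def split_export_line(line, delimiter="*"):
    """Split one exported line on the delimiter given in ROW FORMAT DELIMITED.

    Caveat: if a field value contains the delimiter character itself,
    the split produces extra fields, so pick a delimiter that cannot
    appear in the data.
    """
    return line.rstrip("\n").split(delimiter)
```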
For composite types such as struct and map, you can likewise specify the corresponding delimiters.
Let's walk through a hands-on example:
(1) Create a table of composite types
hive (hive)> create table userinfo (id int, name string, job_list array<string>, perf map<int,string>, info struct<address:string,size:int>);
(2) Construct the data (using the default delimiters)
1^AXiao Ming^AIT engineer^Bteacher^A10086^Cnormal^B10010^Cnot normal^ABeijing^B130
2^AXiao Hua^Ababysitter^Bnurse^A10086^Cnormal^B10010^Cnormal^ANanjing^B130
Notes:
\001 is displayed as ^A, \002 as ^B, and \003 as ^C.
In the vi editor, press Ctrl+V and then Ctrl+A to enter the \001 control character; likewise, \002 is entered with Ctrl+V then Ctrl+B, and so on.
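If entering control characters in vi is awkward, the same file can be generated from a script instead. A minimal Python sketch (not from the original post; the function name is illustrative, and the rows are the two sample records above):

```python
def write_userinfo(path):
    """Write the two sample rows using Hive's default delimiters:
    \x01 between fields, \x02 between collection items, \x03 between
    a map key and its value.
    """
    rows = [
        ("1", "Xiao Ming", ["IT engineer", "teacher"],
         {10086: "normal", 10010: "not normal"}, ("Beijing", 130)),
        ("2", "Xiao Hua", ["babysitter", "nurse"],
         {10086: "normal", 10010: "normal"}, ("Nanjing", 130)),
    ]
    with open(path, "w", encoding="utf-8") as f:
        for rid, name, jobs, perf, (addr, size) in rows:
            f.write("\x01".join([
                rid,
                name,
                "\x02".join(jobs),                                  # array<string>
                "\x02".join(f"{k}\x03{v}" for k, v in perf.items()),  # map<int,string>
                f"{addr}\x02{size}",                                # struct fields
            ]) + "\n")
```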
(3) Import data
hive> load data local inpath '/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;
(4) Query data
hive (hive)> select * from userinfo;
OK
userinfo.id userinfo.name userinfo.job_list userinfo.perf userinfo.info
1 Xiao Ming ["IT engineer","teacher"] {10086:"normal",10010:"not normal"} {"address":"Beijing","size":130}
2 Xiao Hua ["babysitter","nurse"] {10086:"normal",10010:"normal"} {"address":"Nanjing","size":130}
Time taken: 0.088 seconds, Fetched: 2 row(s)
(5) Export the data
The delimiters we specify are:
The field delimiter is \t (tab)
The map keys delimiter is : (colon)
The collection items delimiter is , (comma)
Execute the export command:
hive (hive)> insert overwrite local directory '/home/hadoop/export_hive'
           > row format delimited fields terminated by '\t'
           > collection items terminated by ','
           > map keys terminated by ':'
           > select * from userinfo;
The exported data looks like this:
[email protected] export_hive]$ cat 000000_0
1	Xiao Ming	IT engineer,teacher	10086:normal,10010:not normal	Beijing,130
2	Xiao Hua	babysitter,nurse	10086:normal,10010:normal	Nanjing,130
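Reading this export back, each row can be rebuilt into list/map/struct values by splitting on the three delimiters in order: fields by tab, items by comma, map keys by colon. A sketch under the assumption that the column layout matches the userinfo table above (the function name is illustrative):

```python
def parse_userinfo_line(line):
    """Parse one exported userinfo row: fields by \t, items by ',', map keys by ':'."""
    rid, name, jobs, perf, info = line.rstrip("\n").split("\t")
    job_list = jobs.split(",")                       # array<string>
    perf_map = {int(k): v for k, v in               # map<int,string>
                (item.split(":", 1) for item in perf.split(","))}
    address, size = info.split(",")                  # struct<address,size>
    return {"id": int(rid), "name": name, "job_list": job_list,
            "perf": perf_map, "info": {"address": address, "size": int(size)}}
```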
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
Exporting Hive query results to a file with a specified delimiter