Exporting Hive query results to a file with a specified delimiter

Source: Internet
Author: User

Hive 0.11.0 introduced a new feature that lets users specify the column delimiter when writing Hive query results to a file. In earlier versions, the delimiter between columns could not be specified.

Before Hive 0.11.0, the usage was as follows; the delimiter cannot be specified and defaults to \x01:

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' select * from a;

Query ID = hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1435392961740_0025, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/

Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2015-06-27 17:43:54,395 Stage-1 map = 0%, reduce = 0%

2015-06-27 17:44:07,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.82 sec

MapReduce Total cumulative CPU time: 4 seconds 820 msec

Ended Job = job_1435392961740_0025

Copying data to local directory /home/hadoop/export_hive

Copying data to local directory /home/hadoop/export_hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1  Cumulative CPU: 4.82 sec  HDFS Read: 2416833  HDFS Write: 1188743  SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 820 msec

OK

a.key  a.value

Time taken: 26.475 seconds

(The directory is created automatically if it does not exist.)

To view the generated file contents:

[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0 | cat -A

2610^Aaaanz$

32196^Aaaaownz$

78606^Aaaayxfz$

3804^Aaaaz$

30102^Aaabewez$

21744^Aaabukz$

39666^Aaabz$

1632^Aaabz$

82464^Aaabz$

88320^Aaaccaz$

Because of cat's -A option, the end of each line is shown with a $ marker, and the delimiter appears as ^A (that is, \x01).
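To make the default format concrete, here is a small Python sketch (not part of the original article) showing how a downstream reader would split one of these exported lines on the \x01 delimiter:

```python
# Hive's default field delimiter is the control character \x01 (shown as ^A by cat -A).
# Split one exported line into its key and value columns.
line = "2610\x01aaanz\n"
key, value = line.rstrip("\n").split("\x01")
print(key, value)  # -> 2610 aaanz
```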

Next, we use the new feature introduced in Hive 0.11.0 to specify the delimiter between the output columns:

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' row format delimited fields terminated by '*' select * from a;

Query ID = hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1435392961740_0028, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/

Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2015-06-27 18:00:57,354 Stage-1 map = 0%, reduce = 0%

2015-06-27 18:01:10,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.68 sec

MapReduce Total cumulative CPU time: 4 seconds 680 msec

Ended Job = job_1435392961740_0028

Copying data to local directory /home/hadoop/export_hive

Copying data to local directory /home/hadoop/export_hive

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1  Cumulative CPU: 4.68 sec  HDFS Read: 2417042  HDFS Write: 1188743  SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 680 msec

OK

a.key  a.value

Time taken: 26.607 seconds

With the delimiter specified as *, the exported data looks like this:

[email protected] export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0

2610*aaanz

32196*aaaownz

78606*aaayxfz

3804*aaaz

30102*aabewez

21744*aabukz

39666*aabz

1632*aabz

82464*aabz

88320*aaccaz

You can see that the column delimiter is indeed the * character we specified.
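Since the exported file is plain text with a one-character delimiter, it can be read directly with standard tools. As a sketch (not from the original article), Python's csv module handles it with `delimiter="*"`:

```python
import csv
import io

# The file exported with FIELDS TERMINATED BY '*' is plain text with a
# single-character delimiter, so csv.reader can parse it directly.
data = "2610*aaanz\n32196*aaaownz\n"
rows = list(csv.reader(io.StringIO(data), delimiter="*"))
print(rows)  # [['2610', 'aaanz'], ['32196', 'aaaownz']]
```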

If a column is a composite type, such as struct or map, you can also specify the corresponding delimiters. Below we walk through a concrete example:

(1) Create a table of composite types

hive (hive)> create table userinfo (id int, name string, job_list array<string>, perf map<int,string>, info struct<address:string,size:int>);

(2) Construct data (using the default delimiters)

1^AXiao Ming^AIT engineer^Bteacher^A10086^Cnormal^B10010^Cnot normal^ABeijing^B130

2^AXiao Hua^Ababysitter^Bnurse^A10086^Cnormal^B10010^Cnormal^ANanjing^B130

Notes:

\001 is displayed as ^A, \002 as ^B, and \003 as ^C.

In the vi editor, press Ctrl+V and then Ctrl+A to enter the control character \001. Similarly, \002 is entered as Ctrl+V, Ctrl+B, and so on.
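As an alternative to typing control characters in vi, the same file can be generated with a short script. This Python sketch is our own addition (the file name and row values mirror the example above); note that \001, \002, and \003 are octal escapes in Python string literals, matching Hive's default delimiters:

```python
# Build userinfo.txt with Hive's default delimiters:
#   \001 between columns, \002 between collection items (and struct fields),
#   \003 between a map key and its value.
rows = [
    ["1", "Xiao Ming", "IT engineer\002teacher",
     "10086\003normal\00210010\003not normal", "Beijing\002130"],
    ["2", "Xiao Hua", "babysitter\002nurse",
     "10086\003normal\00210010\003normal", "Nanjing\002130"],
]
with open("userinfo.txt", "w") as f:
    for row in rows:
        f.write("\001".join(row) + "\n")
```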

(3) Import data

hive> load data local inpath '/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;

(4) Query data

hive (hive)> select * from userinfo;

OK

userinfo.id  userinfo.name  userinfo.job_list  userinfo.perf  userinfo.info

1  Xiao Ming  ["IT engineer","teacher"]  {10086:"normal",10010:"not normal"}  {"address":"Beijing","size":130}

2  Xiao Hua  ["babysitter","nurse"]  {10086:"normal",10010:"normal"}  {"address":"Nanjing","size":130}

Time taken: 0.088 seconds, Fetched: 2 row(s)

(5) Export data

The delimiters we specify are:

Field (column) delimiter: \t (tab)

Map keys delimiter: : (colon)

Collection items delimiter: , (comma)

To execute an export command:

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive'
           > row format delimited fields terminated by '\t'
           > collection items terminated by ','
           > map keys terminated by ':'
           > select * from userinfo;

The exported data looks like this:

[email protected] export_hive]$ cat 000000_0

1	Xiao Ming	IT engineer,teacher	10086:normal,10010:not normal	Beijing,130

2	Xiao Hua	babysitter,nurse	10086:normal,10010:normal	Nanjing,130
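To close the loop, here is a hedged Python sketch (our own addition, with a hypothetical `parse_line` helper) that reads one exported line back into nested structures using the same three delimiters, tab for fields, comma for collection items, and colon for map keys:

```python
# Parse one exported line back into Python structures, mirroring the
# delimiters used in the export: \t for fields, ',' for collection
# items (and struct fields), ':' between a map key and its value.
def parse_line(line):
    id_, name, job_list, perf, info = line.rstrip("\n").split("\t")
    jobs = job_list.split(",")
    perf_map = dict(item.split(":", 1) for item in perf.split(","))
    address, size = info.split(",")
    return int(id_), name, jobs, perf_map, {"address": address, "size": int(size)}

row = parse_line("1\tXiao Ming\tIT engineer,teacher\t10086:normal,10010:not normal\tBeijing,130")
print(row)
```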

Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.
