One, several ways to import data into Hive
Start by creating the Hive tables and data files used to demonstrate the following import methods.
Hive tables:
Create testA:
CREATE TABLE testA (
    id INT,
    name STRING,
    area STRING
) PARTITIONED BY (create_time STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Create testB:
CREATE TABLE testB (
    id INT,
    name STRING,
    area STRING,
    code STRING
) PARTITIONED BY (create_time STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
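Once created, the table definitions can be double-checked with Hive's DESC command, for example:
hive> DESC testA;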
Data file (sourceA.txt):
1,fish1,sz
2,fish2,sh
3,fish3,hz
4,fish4,qd
5,fish5,sr
Data file (sourceB.txt):
1,zy1,sz,1001
2,zy2,sh,1002
3,zy3,hz,1003
4,zy4,qd,1004
5,zy5,sr,1005
(1) Import a local file into a Hive table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION (create_time='2015-07-08');
Copying data from file:/home/hadoop/sourceA.txt
Copying file: file:/home/hadoop/sourceA.txt
Loading data to table default.testa partition (create_time=2015-07-08)
Partition default.testa{create_time=2015-07-08} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 0.237 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION (create_time='2015-07-09');
Copying data from file:/home/hadoop/sourceB.txt
Copying file: file:/home/hadoop/sourceB.txt
Loading data to table default.testb partition (create_time=2015-07-09)
Partition default.testb{create_time=2015-07-09} stats: [numFiles=1, numRows=0, totalSize=73, rawDataSize=0]
OK
Time taken: 0.212 seconds
hive> SELECT * FROM testA;
OK
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> SELECT * FROM testB;
OK
1    zy1    sz    1001    2015-07-09
2    zy2    sh    1002    2015-07-09
3    zy3    hz    1003    2015-07-09
4    zy4    qd    1004    2015-07-09
5    zy5    sr    1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
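Note that INTO TABLE appends to the target partition. If the partition's existing contents should be replaced instead, LOAD DATA also accepts the OVERWRITE keyword; a minimal sketch reusing the same file and partition:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' OVERWRITE INTO TABLE testA PARTITION (create_time='2015-07-08');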
(2) Import from one Hive table into another
Import the testB data into the testA table:
hive> INSERT INTO TABLE testA PARTITION (create_time='2015-07-11') SELECT id, name, area FROM testB WHERE id = 1;
... (omitted)
OK
Time taken: 14.744 seconds
hive> INSERT INTO TABLE testA PARTITION (create_time) SELECT id, name, area, code FROM testB WHERE id = 2;
... (omitted)
OK
Time taken: 19.852 seconds
hive> SELECT * FROM testA;
OK
2    zy2    sh    1002
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
1    zy1    sz    2015-07-11
Time taken: 0.032 seconds, Fetched: 7 row(s)
Description:
1. The row with id=1 in testB is imported into testA under the static partition create_time='2015-07-11'.
2. The row with id=2 in testB is imported into testA using dynamic partitioning: create_time takes that row's code value (1002).
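For a dynamic-partition insert like the second statement to run, dynamic partitioning usually has to be enabled first; these are standard Hive session settings:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;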
(3) Import an HDFS file into a Hive table
First upload sourceA.txt and sourceB.txt to HDFS, at /home/hadoop/sourceA.txt and /home/hadoop/sourceB.txt respectively.
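One way to do the upload, assuming the files sit in the current local directory:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -put sourceA.txt sourceB.txt /home/hadoop/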
hive> LOAD DATA INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION (create_time='2015-07-08');
... (omitted)
OK
Time taken: 0.237 seconds
hive> LOAD DATA INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION (create_time='2015-07-09');
... (omitted)
OK
Time taken: 0.212 seconds
hive> SELECT * FROM testA;
OK
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> SELECT * FROM testB;
OK
1    zy1    sz    1001    2015-07-09
2    zy2    sh    1002    2015-07-09
3    zy3    hz    1003    2015-07-09
4    zy4    qd    1004    2015-07-09
5    zy5    sr    1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
This imports /home/hadoop/sourceA.txt into the testA table and /home/hadoop/sourceB.txt into the testB table. Unlike a local load, which copies the file, a LOAD from HDFS moves the source file into the table's warehouse directory.
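Because the load moves the files, listing the source path afterwards should show that they are gone; a quick check, assuming the same paths as above:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -ls /home/hadoop/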
(4) Import from another table while creating it (CREATE TABLE AS SELECT)
hive> CREATE TABLE testC AS SELECT name, code FROM testB;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1449746265797_0106, Tracking URL = http://hadoopcluster79:8088/proxy/application_1449746265797_0106/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1449746265797_0106
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-24 16:40:17,981 Stage-1 map = 0%, reduce = 0%
2015-12-24 16:40:23,115 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.11 sec
MapReduce Total cumulative CPU time: 1 seconds 110 msec
Ended Job = job_1449746265797_0106
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/tmp/hive-root/hive_2015-12-24_16-40-09_983_6048680148773453194-1/-ext-10001
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/warehouse/testc
Table default.testc stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.11 sec   HDFS Read: 297 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 110 msec
OK
Time taken: 14.292 seconds
hive> DESC testC;
OK
name    string
code    string
Time taken: 0.032 seconds, Fetched: 2 row(s)
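If only the table structure is needed, without copying any data, Hive also supports CREATE TABLE ... LIKE; the table name testD here is just an example:
hive> CREATE TABLE testD LIKE testB;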
Two, several ways to export data from Hive
(1) Export to the local file system
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM testA;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0001, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0001/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:04:30,447 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:04:35,616 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_1451024007879_0001
Copying data to local directory /home/hadoop/output
Copying data to local directory /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.16 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
Time taken: 16.701 seconds
To view the data results:
[hadoop@hadoopcluster78 output]$ cat /home/hadoop/output/000000_0
1,fish1,sz,2015-07-08
2,fish2,sh,2015-07-08
3,fish3,hz,2015-07-08
4,fish4,qd,2015-07-08
5,fish5,sr,2015-07-08
INSERT OVERWRITE LOCAL DIRECTORY exports the data of Hive table testA into the /home/hadoop/output directory. As is well known, HQL statements are executed as MapReduce jobs; /home/hadoop/output is in fact the MapReduce output path, which is why the resulting file is named 000000_0.
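The same statement can also export a subset of the data, for example a single partition; a sketch, with an example output directory name:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output_0708' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM testA WHERE create_time='2015-07-08';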
(2) Export to HDFS
Exporting to HDFS is similar to exporting to the local file system; simply remove the LOCAL keyword from the HQL statement.
hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' SELECT * FROM testA;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0002, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0002/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:08:51,034 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:08:59,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1451024007879_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/hive-hadoop/hive_2015-12-25_17-08-43_733_1768532778392261937-1/-ext-10000
Moving data to: /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.4 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
Time taken: 16.667 seconds
To view the HDFS output file:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -cat /home/hadoop/output/000000_0
1fish1sz2015-07-08
2fish2sh2015-07-08
3fish3hz2015-07-08
4fish4qd2015-07-08
5fish5sr2015-07-08
The fields appear to run together because no ROW FORMAT was specified for this export, so Hive used its default field delimiter \001 (Ctrl-A), which is invisible in terminal output.
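To pull the exported file from HDFS down to the local file system, hadoop fs -get works; a sketch, using the current directory as the destination:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -get /home/hadoop/output/000000_0 .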
Other
Data can also be exported with Hive's -e and -f parameters.
Usage of -e: the SQL statement follows the parameter, and >> is followed by the output file path.
[hadoop@hadoopcluster78 bin]$ ./hive -e "SELECT * FROM testA" >> /home/hadoop/output/testA.txt
15/12/25 17:15:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.128 seconds, Fetched: 5 row(s)
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testA.txt
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
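To keep the console quieter when scripting this, Hive's -S (silent) flag can be combined with -e; a minimal sketch:
[hadoop@hadoopcluster78 bin]$ ./hive -S -e "SELECT * FROM testA" >> /home/hadoop/output/testA.txt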
Usage of -f: the parameter is followed by a file holding the SQL statement, and >> is followed by the output file path.
The SQL statement file:
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/sql.sql
select * from testA
Execute using the -f parameter:
[hadoop@hadoopcluster78 bin]$ ./hive -f /home/hadoop/output/sql.sql >> /home/hadoop/output/testB.txt
15/12/25 17:20:52 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.1 seconds, Fetched: 5 row(s)
View the results:
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testB.txt
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08