One, several ways to import data into Hive
Start by creating the Hive tables and data files used to demonstrate the following import methods.
Hive tables:
Create testA:
CREATE TABLE testA (
    id INT,
    name STRING,
    area STRING
) PARTITIONED BY (create_time STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Create testB:
CREATE TABLE testB (
    id INT,
    name STRING,
    area STRING,
    code STRING
) PARTITIONED BY (create_time STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
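Once created, the table definitions can be double-checked with Hive's DESC command, for example:
hive> DESC testA;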
Data file (sourceA.txt):
1,fish1,sz
2,fish2,sh
3,fish3,hz
4,fish4,qd
5,fish5,sr
Data file (sourceB.txt):
1,zy1,sz,1001
2,zy2,sh,1002
3,zy3,hz,1003
4,zy4,qd,1004
5,zy5,sr,1005
(1) Import a local file into a Hive table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION (create_time='2015-07-08');
Copying data from file:/home/hadoop/sourceA.txt
Copying file: file:/home/hadoop/sourceA.txt
Loading data to table default.testa partition (create_time=2015-07-08)
Partition default.testa{create_time=2015-07-08} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 0.237 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION (create_time='2015-07-09');
Copying data from file:/home/hadoop/sourceB.txt
Copying file: file:/home/hadoop/sourceB.txt
Loading data to table default.testb partition (create_time=2015-07-09)
Partition default.testb{create_time=2015-07-09} stats: [numFiles=1, numRows=0, totalSize=73, rawDataSize=0]
OK
Time taken: 0.212 seconds
hive> SELECT * FROM testA;
OK
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> SELECT * FROM testB;
OK
1    zy1    sz    1001    2015-07-09
2    zy2    sh    1002    2015-07-09
3    zy3    hz    1003    2015-07-09
4    zy4    qd    1004    2015-07-09
5    zy5    sr    1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
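Note that INTO TABLE appends to the target partition. If the partition's existing contents should be replaced instead, LOAD DATA also accepts the OVERWRITE keyword; a minimal sketch reusing the same file and partition:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' OVERWRITE INTO TABLE testA PARTITION (create_time='2015-07-08');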
(2) Import from one Hive table into another
Import the testB data into the testA table:
hive> INSERT INTO TABLE testA PARTITION (create_time='2015-07-11') SELECT id, name, area FROM testB WHERE id = 1;
... (omitted)
OK
Time taken: 14.744 seconds
hive> INSERT INTO TABLE testA PARTITION (create_time) SELECT id, name, area, code FROM testB WHERE id = 2;
... (omitted)
OK
Time taken: 19.852 seconds
hive> SELECT * FROM testA;
OK
2    zy2    sh    1002
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
1    zy1    sz    2015-07-11
Time taken: 0.032 seconds, Fetched: 7 row(s)
Description:
1. The row with id=1 in testB is imported into testA under the static partition create_time='2015-07-11'.
2. The row with id=2 in testB is imported into testA using dynamic partitioning: create_time takes that row's code value (1002).
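For a dynamic-partition insert like the second statement to run, dynamic partitioning usually has to be enabled first; these are standard Hive session settings:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;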
(3) Import an HDFS file into a Hive table
First upload sourceA.txt and sourceB.txt to HDFS, at /home/hadoop/sourceA.txt and /home/hadoop/sourceB.txt respectively.
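One way to do the upload, assuming the files sit in the current local directory:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -put sourceA.txt sourceB.txt /home/hadoop/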
hive> LOAD DATA INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION (create_time='2015-07-08');
... (omitted)
OK
Time taken: 0.237 seconds
hive> LOAD DATA INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION (create_time='2015-07-09');
... (omitted)
OK
Time taken: 0.212 seconds
hive> SELECT * FROM testA;
OK
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> SELECT * FROM testB;
OK
1    zy1    sz    1001    2015-07-09
2    zy2    sh    1002    2015-07-09
3    zy3    hz    1003    2015-07-09
4    zy4    qd    1004    2015-07-09
5    zy5    sr    1005    2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
This imports /home/hadoop/sourceA.txt into the testA table and /home/hadoop/sourceB.txt into the testB table. Unlike a local load, which copies the file, a LOAD from HDFS moves the source file into the table's warehouse directory.
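Because the load moves the files, listing the source path afterwards should show that they are gone; a quick check, assuming the same paths as above:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -ls /home/hadoop/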
(4) Import from another table while creating it (CREATE TABLE AS SELECT)
hive> CREATE TABLE testC AS SELECT name, code FROM testB;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1449746265797_0106, Tracking URL = http://hadoopcluster79:8088/proxy/application_1449746265797_0106/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1449746265797_0106
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-24 16:40:17,981 Stage-1 map = 0%, reduce = 0%
2015-12-24 16:40:23,115 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.11 sec
MapReduce Total cumulative CPU time: 1 seconds 110 msec
Ended Job = job_1449746265797_0106
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/tmp/hive-root/hive_2015-12-24_16-40-09_983_6048680148773453194-1/-ext-10001
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/warehouse/testc
Table default.testc stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.11 sec   HDFS Read: 297 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 110 msec
OK
Time taken: 14.292 seconds
hive> DESC testC;
OK
name    string
code    string
Time taken: 0.032 seconds, Fetched: 2 row(s)
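If only the table structure is needed, without copying any data, Hive also supports CREATE TABLE ... LIKE; the table name testD here is just an example:
hive> CREATE TABLE testD LIKE testB;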
Two, several ways to export data from Hive
(1) Export to the local file system
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM testA;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0001, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0001/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:04:30,447 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:04:35,616 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_1451024007879_0001
Copying data to local directory /home/hadoop/output
Copying data to local directory /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.16 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
Time taken: 16.701 seconds
To view the data results:
[hadoop@hadoopcluster78 output]$ cat /home/hadoop/output/000000_0
1,fish1,sz,2015-07-08
2,fish2,sh,2015-07-08
3,fish3,hz,2015-07-08
4,fish4,qd,2015-07-08
5,fish5,sr,2015-07-08
INSERT OVERWRITE LOCAL DIRECTORY exports the data of Hive table testA into the /home/hadoop/output directory. As is well known, HQL statements are executed as MapReduce jobs; /home/hadoop/output is in fact the MapReduce output path, which is why the resulting file is named 000000_0.
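The same statement can also export a subset of the data, for example a single partition; a sketch, with an example output directory name:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output_0708' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM testA WHERE create_time='2015-07-08';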
(2) Export to HDFS
Exporting to HDFS is similar to exporting to the local file system; simply remove the LOCAL keyword from the HQL statement.
hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' SELECT * FROM testA;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0002, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0002/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:08:51,034 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:08:59,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1451024007879_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/hive-hadoop/hive_2015-12-25_17-08-43_733_1768532778392261937-1/-ext-10000
Moving data to: /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.4 sec   HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
Time taken: 16.667 seconds
To view the HDFS output file:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -cat /home/hadoop/output/000000_0
1fish1sz2015-07-08
2fish2sh2015-07-08
3fish3hz2015-07-08
4fish4qd2015-07-08
5fish5sr2015-07-08
The fields appear to run together because no ROW FORMAT was specified for this export, so Hive used its default field delimiter \001 (Ctrl-A), which is invisible in terminal output.
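To pull the exported file from HDFS down to the local file system, hadoop fs -get works; a sketch, using the current directory as the destination:
[hadoop@hadoopcluster78 bin]$ ./hadoop fs -get /home/hadoop/output/000000_0 .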
Other
Data can also be exported with Hive's -e and -f parameters.
Usage of -e: the SQL statement follows the parameter, and >> is followed by the output file path.
[hadoop@hadoopcluster78 bin]$ ./hive -e "SELECT * FROM testA" >> /home/hadoop/output/testA.txt
15/12/25 17:15:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.128 seconds, Fetched: 5 row(s)
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testA.txt
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08
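To keep the console quieter when scripting this, Hive's -S (silent) flag can be combined with -e; a minimal sketch:
[hadoop@hadoopcluster78 bin]$ ./hive -S -e "SELECT * FROM testA" >> /home/hadoop/output/testA.txt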
Usage of -f: the parameter is followed by a file holding the SQL statement, and >> is followed by the output file path.
The SQL statement file:
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/sql.sql
select * from testA
Execute using the -f parameter:
[hadoop@hadoopcluster78 bin]$ ./hive -f /home/hadoop/output/sql.sql >> /home/hadoop/output/testB.txt
15/12/25 17:20:52 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.1 seconds, Fetched: 5 row(s)
View the results:
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testB.txt
1    fish1    sz    2015-07-08
2    fish2    sh    2015-07-08
3    fish3    hz    2015-07-08
4    fish4    qd    2015-07-08
5    fish5    sr    2015-07-08