1. Import via an external table
The user creates an external table in Hive, specifying an HDFS path in the table definition. Copying the data files to that HDFS path is all it takes to "insert" the data into the external table.
For example:
Edit a file test.txt:
$ cat test.txt
1 Hello
2 World
3 Test
4 case
Fields are separated by '\t' (tab).
Start Hive:
$ hive
Create the external table:
hive> CREATE EXTERNAL TABLE mytest (num INT, name STRING)
    > COMMENT 'This is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds
hive> SHOW TABLES;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds
hive> DESC mytest;
OK
num     int
name    string
Time taken: 0.121 seconds
Copy the data to HDFS:
$ hadoop fs -put test.txt /data/test
View the Hive table data:
hive> SELECT * FROM mytest;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.375 seconds
hive> SELECT num FROM mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU Time Spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds
This approach is often used when historical data already exists on HDFS and we need to run Hive operations on it; it avoids the cost of copying the data.
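A useful property of this approach is that external tables only attach metadata to existing files. As a sketch (using the table and path from the example above), dropping the table leaves the data untouched:

```sql
-- Dropping an EXTERNAL table removes only the Hive metadata;
-- the file under LOCATION '/data/test' stays on HDFS.
DROP TABLE mytest;
-- The data can be re-attached later by re-creating the external
-- table with the same LOCATION.
```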
2. Import from a local file
The data is not on HDFS; it is loaded into the Hive table directly from the local file system.
The file /home/work/test.txt has the same content as above.
Create the table:
hive> CREATE TABLE mytest2 (num INT, name STRING)
    > COMMENT 'This is a test2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.077 seconds
Load the data into the table:
hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE mytest2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds
View the data:
hive> SELECT * FROM mytest2;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.11 seconds
The local path loaded this way can be a single file, a directory, or a wildcard pattern. Note that if it is a directory, it must not contain subdirectories, and wildcards can only be used to match files.
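The three forms can be sketched as follows (the directory and wildcard paths here are hypothetical, chosen to match the example environment):

```sql
-- Load a single file
LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE mytest2;
-- Load every file in a directory (it must not contain subdirectories)
LOAD DATA LOCAL INPATH '/home/work/data' INTO TABLE mytest2;
-- Load the files matched by a wildcard (wildcards match files only)
LOAD DATA LOCAL INPATH '/home/work/part-*.txt' INTO TABLE mytest2;
```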
3. Import from HDFS
The test.txt file above has already been uploaded to /data/test.
You can load the data directly into a Hive table with the following commands:
hive> CREATE TABLE mytest3 (num INT, name STRING)
    > COMMENT 'This is a test3'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds
hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE mytest3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds
hive> SELECT * FROM mytest3;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.227 seconds
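Note that, unlike the LOCAL variant in method 2, LOAD DATA from an HDFS path *moves* the file into the table's storage directory rather than copying it:

```sql
-- LOAD DATA (without LOCAL) moves the HDFS file instead of copying it:
LOAD DATA INPATH '/data/test/test.txt' INTO TABLE mytest3;
-- Afterwards the file is no longer at /data/test; under a default
-- configuration it now lives in the warehouse directory, e.g.
-- /user/hive/warehouse/mytest3/ (exact path depends on the setup).
```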
4. Import data from another table
hive> CREATE EXTERNAL TABLE mytest4 (num INT);
OK
Time taken: 0.091 seconds
hive> FROM mytest3 test3
    > INSERT OVERWRITE TABLE mytest4
    > SELECT test3.num WHERE test3.name = 'world';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.62 sec   HDFS Read: 242   HDFS Write: 2   SUCCESS
Total MapReduce CPU Time Spent: 620 msec
OK
Time taken: 30.663 seconds
hive> SELECT * FROM mytest4;
OK
2
Time taken: 0.103 seconds
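INSERT OVERWRITE replaces the table's existing contents each time it runs; the transcript above shows the old warehouse directory for mytest4 being deleted. To append rather than replace, Hive also supports INSERT INTO (available since Hive 0.8). The equivalent standalone forms of the query above are:

```sql
-- Replace the current contents of mytest4:
INSERT OVERWRITE TABLE mytest4 SELECT num FROM mytest3 WHERE name = 'world';
-- Append to the existing contents instead (Hive 0.8+):
INSERT INTO TABLE mytest4 SELECT num FROM mytest3 WHERE name = 'world';
```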