1. Import via an external table
The user creates an external table in Hive, specifying an HDFS path in the table definition. Copying the data files to that HDFS path is all it takes to "insert" the data into the external table.
For example:
Edit a file test.txt:
$ cat test.txt
1 Hello
2 World
3 Test
4 case
Fields are separated by '\t' (tab).
Start Hive:
$ hive
Create the external table:
hive> CREATE EXTERNAL TABLE mytest (num INT, name STRING)
    > COMMENT 'This is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds
hive> SHOW TABLES;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds
hive> DESC mytest;
OK
num     int
name    string
Time taken: 0.121 seconds
Copy the data to HDFS:
$ hadoop fs -put test.txt /data/test
View the Hive table data:
hive> SELECT * FROM mytest;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.375 seconds
hive> SELECT num FROM mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU Time Spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds
This approach is often used when historical data already exists on HDFS and we need to run Hive operations on it; it avoids the cost of copying the data.
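A useful property of this approach is that external tables only attach metadata to existing files. As a sketch (using the table and path from the example above), dropping the table leaves the data untouched:

```sql
-- Dropping an EXTERNAL table removes only the Hive metadata;
-- the file under LOCATION '/data/test' stays on HDFS.
DROP TABLE mytest;
-- The data can be re-attached later by re-creating the external
-- table with the same LOCATION.
```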
2. Import from a local file
The data is not on HDFS; it is loaded into the Hive table directly from the local file system.
The file /home/work/test.txt has the same content as above.
Create the table:
hive> CREATE TABLE mytest2 (num INT, name STRING)
    > COMMENT 'This is a test2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.077 seconds
Load the data into the table:
hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE mytest2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds
View the data:
hive> SELECT * FROM mytest2;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.11 seconds
The local path loaded this way can be a single file, a directory, or a wildcard pattern. Note that if it is a directory, it must not contain subdirectories, and wildcards can only be used to match files.
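The three forms can be sketched as follows (the directory and wildcard paths here are hypothetical, chosen to match the example environment):

```sql
-- Load a single file
LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE mytest2;
-- Load every file in a directory (it must not contain subdirectories)
LOAD DATA LOCAL INPATH '/home/work/data' INTO TABLE mytest2;
-- Load the files matched by a wildcard (wildcards match files only)
LOAD DATA LOCAL INPATH '/home/work/part-*.txt' INTO TABLE mytest2;
```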
3. Import from HDFS
The test.txt file above has already been uploaded to /data/test.
You can load the data directly into a Hive table with the following commands:
hive> CREATE TABLE mytest3 (num INT, name STRING)
    > COMMENT 'This is a test3'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds
hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE mytest3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds
hive> SELECT * FROM mytest3;
OK
1 Hello
2 World
3 Test
4 case
Time taken: 0.227 seconds
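Note that, unlike the LOCAL variant in method 2, LOAD DATA from an HDFS path *moves* the file into the table's storage directory rather than copying it:

```sql
-- LOAD DATA (without LOCAL) moves the HDFS file instead of copying it:
LOAD DATA INPATH '/data/test/test.txt' INTO TABLE mytest3;
-- Afterwards the file is no longer at /data/test; under a default
-- configuration it now lives in the warehouse directory, e.g.
-- /user/hive/warehouse/mytest3/ (exact path depends on the setup).
```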
4. Import data from another table
hive> CREATE EXTERNAL TABLE mytest4 (num INT);
OK
Time taken: 0.091 seconds
hive> FROM mytest3 test3
    > INSERT OVERWRITE TABLE mytest4
    > SELECT test3.num WHERE test3.name = 'world';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.62 sec   HDFS Read: 242   HDFS Write: 2   SUCCESS
Total MapReduce CPU Time Spent: 620 msec
OK
Time taken: 30.663 seconds
hive> SELECT * FROM mytest4;
OK
2
Time taken: 0.103 seconds
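INSERT OVERWRITE replaces the table's existing contents each time it runs; the transcript above shows the old warehouse directory for mytest4 being deleted. To append rather than replace, Hive also supports INSERT INTO (available since Hive 0.8). The equivalent standalone forms of the query above are:

```sql
-- Replace the current contents of mytest4:
INSERT OVERWRITE TABLE mytest4 SELECT num FROM mytest3 WHERE name = 'world';
-- Append to the existing contents instead (Hive 0.8+):
INSERT INTO TABLE mytest4 SELECT num FROM mytest3 WHERE name = 'world';
```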