How to import Hive data

Source: Internet
Author: User
Tags: hadoop, fs

1. Import by external table

Create an external table in Hive and point it at an HDFS path in the CREATE statement. Copying data files to that HDFS path is all that is needed: the external table picks the data up without a separate insert step.

For example:

Edit the file test.txt:

$ cat test.txt
1       hello
2       world
3       test
4       case

Fields are separated by '\t' (a tab).

Start Hive:

$ hive

Create the external table:

hive> CREATE EXTERNAL TABLE mytest (num INT, name STRING)
    > COMMENT 'This is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds

hive> SHOW TABLES;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds

hive> DESC mytest;
OK
num     int
name    string
Time taken: 0.121 seconds

Copy the data to HDFS:

$ hadoop fs -put test.txt /data/test
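
Before querying, you can confirm the file landed in the table's location (output omitted; it should list test.txt):

$ hadoop fs -ls /data/test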

View the Hive table data:

hive> SELECT * FROM mytest;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.375 seconds

hive> SELECT num FROM mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU time spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds

This approach is commonly used when historical data already sits on HDFS and you need to run Hive operations over it. Because the external table simply points at the existing files, it avoids the cost of copying the data.
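
A related property of external tables is that dropping one removes only the table metadata; the files stay on HDFS. A minimal sketch using the table and path from above:

hive> DROP TABLE mytest;
$ hadoop fs -ls /data/test      # test.txt is still there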

2. Import from local

The data is not on HDFS; it is loaded into the Hive table directly from the local file system.

The file /home/work/test.txt has the same content as above.

Create the table:

hive> CREATE TABLE mytest2 (num INT, name STRING)
    > COMMENT 'This is a test2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.077 seconds

Load the data into the table:

hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE mytest2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds

View the data:

hive> SELECT * FROM mytest2;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.11 seconds

The local path loaded this way can be a single file, a folder, or a wildcard pattern. Note that if it is a folder, the folder must not contain subdirectories, and a wildcard can only match files, as in the sketch below.
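
For example (the directory /home/work/data and the part-* pattern are illustrative, not from the example above):

hive> LOAD DATA LOCAL INPATH '/home/work/data/part-*' INTO TABLE mytest2;
hive> LOAD DATA LOCAL INPATH '/home/work/data' INTO TABLE mytest2;

The first statement loads every local file matching part-*; the second loads every file directly under /home/work/data.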

3. Import from HDFS

The test.txt file above has already been uploaded to /data/test on HDFS.

You can load that data directly into a Hive table with the following commands:

hive> CREATE TABLE mytest3 (num INT, name STRING)
    > COMMENT 'This is a test3'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds

hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE mytest3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds

hive> SELECT * FROM mytest3;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.227 seconds
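
One point worth noting: unlike LOAD DATA LOCAL INPATH, which copies the local file, LOAD DATA INPATH moves the HDFS file into the table's directory, so the source path no longer contains it afterwards. A quick check (assuming the default warehouse location /user/hive/warehouse seen in the output below):

$ hadoop fs -ls /data/test                        # test.txt is no longer here
$ hadoop fs -ls /user/hive/warehouse/mytest3      # test.txt now lives under the table directory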

4. Import data from another table

hive> CREATE EXTERNAL TABLE mytest4 (num INT);
OK
Time taken: 0.091 seconds

hive> FROM mytest3 test3
    > INSERT OVERWRITE TABLE mytest4
    > SELECT test3.num WHERE test3.name = 'world';

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%, reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:09,435 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:10,445 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:11,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:12,470 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:13,489 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:14,508 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.62 sec   HDFS Read: 242   HDFS Write: 2   SUCCESS
Total MapReduce CPU time spent: 620 msec
OK
Time taken: 30.663 seconds

hive> SELECT * FROM mytest4;
OK
2
Time taken: 0.103 seconds
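
The same FROM clause can also feed several tables in one pass over the source. A sketch, assuming a second target table mytest5 already exists (mytest5 is illustrative and is not created above):

hive> FROM mytest3 test3
    > INSERT OVERWRITE TABLE mytest4 SELECT test3.num WHERE test3.name = 'world'
    > INSERT OVERWRITE TABLE mytest5 SELECT test3.num WHERE test3.name = 'hello';

Hive scans mytest3 once and writes the matching rows into each target table.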
