Examples of using Hive operation statements

# Create a person information table: person (name STRING, age INT)
hive> CREATE TABLE person (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' STORED AS TEXTFILE;
OK
Time taken: 0.541 seconds
# Create a ticket price information table: ticket (age INT, price FLOAT)
hive> CREATE TABLE ticket (age INT, price FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' STORED AS TEXTFILE;
OK
Time taken: 0.154 seconds
# Create the local data files
-rw-r--r-- 1 hadoop hadoop 40 Feb  6 13:28 person.txt
-rw-r--r-- 1 hadoop hadoop 45 Feb  6 13:28 ticket.txt
# Load local data files to the hive Data Warehouse
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/person.txt' OVERWRITE INTO TABLE person;
Copying data from file:/home/hadoop/hfxdoc/person.txt
Copying file:/home/hadoop/hfxdoc/person.txt
Loading data to table default.person
Deleted hdfs://10.15.107.155:8000/user/hive/warehouse/person
OK
Time taken: 0.419 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/ticket.txt' OVERWRITE INTO TABLE ticket;
Copying data from file:/home/hadoop/hfxdoc/ticket.txt
Copying file:/home/hadoop/hfxdoc/ticket.txt
Loading data to table default.ticket
Deleted hdfs://10.15.107.155:8000/user/hive/warehouse/ticket
OK
Time taken: 0.25 seconds
# The load command moves the data file to the configured warehouse path: /user/hive/warehouse
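# This can be verified from the Hive shell with the dfs command (using the warehouse path shown above):
hive> dfs -ls /user/hive/warehouse/person;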
hive> SHOW TABLES;
hive> DESCRIBE person;
hive> SELECT * FROM person;
OK
Huang 26
Lili 25
Dongdong 13
Wangxiao 5
Time taken: 0.092 seconds
hive>
# Note that a simple SELECT * statement is not compiled into a MapReduce job, so it is very fast.
# A slightly more complex join query
hive> SELECT * FROM person JOIN ticket ON person.age = ticket.age;
MapReduce Total cumulative CPU time: 5 seconds 510 msec
Ended Job = job_201301211420_0011
MapReduce Jobs Launched:
Job 0: Map: 2  Reduce: 1   Cumulative CPU: 5.51 sec   HDFS Read: 519 HDFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 510 msec
OK
Wangxiao 5 5 10.0
Dongdong 13 13 20.0
Lili 25 25 30.0
Huang 26 26 30.0
Time taken: 32.465 seconds
# This query is compiled into a MapReduce job and executed on Hadoop.
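# You can inspect how a statement will be executed before running it with EXPLAIN (the plan output is omitted here):
hive> EXPLAIN SELECT * FROM person JOIN ticket ON person.age = ticket.age;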
# Using external tables
# First, put the local files onto an HDFS path
[hadoop@localhost hfxdoc]$ hadoop fs -mkdir /tmp/ticket
[hadoop@localhost hfxdoc]$ hadoop fs -put person.txt /tmp/ticket
[hadoop@localhost hfxdoc]$ hadoop fs -put ticket.txt /tmp/ticket
[hadoop@localhost hfxdoc]$ hadoop fs -ls /tmp/ticket
Found 2 items
-rw-r--r--   1 hadoop supergroup 40 2013-02-06 13:45 /tmp/ticket/person.txt
-rw-r--r--   1 hadoop supergroup 45 2013-02-06 13:45 /tmp/ticket/ticket.txt
hive> CREATE EXTERNAL TABLE person_ext (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' STORED AS TEXTFILE LOCATION '/tmp/ticket';
# LOCATION can only point at a directory, not a single file. But the files of two different tables sit under our path; is it OK to create a table this way?
# No! All files under a table's directory should be data files belonging to that table.
# If files from other tables are present, no error is reported at creation time, because with Hive's default text format every field is read as a string and implicitly converted to the declared type where possible. For example, in a file whose rows have three columns, the third column
# cannot be used by the person_ext schema, and if a row has only one column, the second column is filled with NULL. So we should adjust the HDFS file layout.
hive> SELECT * FROM person_ext;
OK
Huang 26
Lili 25
Dongdong 13
Wangxiao 5
1 10
2 10
5 10
13 20
14 20
25 30
26 30
31 40
Time taken: 0.088 seconds
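# One way to fix this (a sketch; the directory name and table name person_ext2 are just examples) is to give each table its own directory and move the files accordingly:
[hadoop@localhost hfxdoc]$ hadoop fs -mkdir /tmp/person
[hadoop@localhost hfxdoc]$ hadoop fs -mv /tmp/ticket/person.txt /tmp/person/person.txt
hive> CREATE EXTERNAL TABLE person_ext2 (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ESCAPED BY '\\' STORED AS TEXTFILE LOCATION '/tmp/person';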
hive> DROP TABLE person_ext;
# Dropping an external table removes only the metadata, not the data it points to, so the data files are still on HDFS.
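# A quick check from the shell confirms the files listed earlier are still in place:
[hadoop@localhost hfxdoc]$ hadoop fs -ls /tmp/ticket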

# Tables with complex data types. In this example, columns are separated by '\t' and array elements are separated by ','.
# The data file content is as follows:
1 huangfengxiao Beijing,Shanghai,Tianjin,Hangzhou
2 Linan changchu,Chengdu,Wuhan
 
hive> CREATE TABLE complex (name STRING, work_locations ARRAY<STRING>)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '\t'
    > COLLECTION ITEMS TERMINATED BY ',';

hive> DESCRIBE complex;
OK
name    string
work_locations  array<string>

hive> LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/complex.txt' OVERWRITE INTO TABLE complex;
hive> SELECT * FROM complex;
OK
Huangfengxiao ["Beijing","Shanghai","Tianjin","Hangzhou"]
Linan ["changchu","Chengdu","Wuhan"]
Time taken: 0.125 seconds

hive> SELECT name, work_locations[0] FROM complex;
MapReduce Total cumulative CPU time: 790 msec
Ended Job = job_201301211420_0012
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.79 sec   HDFS Read: 296 HDFS Write: 37 SUCCESS
Total MapReduce CPU Time Spent: 790 msec
OK
Huangfengxiao Beijing
Linan changchu
Time taken: 20.703 seconds
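# The array column can also be flattened to one row per element with explode() and LATERAL VIEW; a minimal sketch against the same table (loc_table and loc are arbitrary names):
hive> SELECT name, loc FROM complex LATERAL VIEW explode(work_locations) loc_table AS loc;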
# How to partition?
Table class (teacher string, student string, age int)
MIS Li huangfengxiao 20
MIS Li Lijie 21
MIS Li Dongdong 21
MIS Li liqiang 21
MIS Li hemeng 21
Mr Xu Dingding 19
Mr Xu wangqiang 19
Mr Xu Lidong 19
Mr Xu hexing 19
If we partition this class data by teacher:
CREATE TABLE classmem (student STRING, age INT) PARTITIONED BY (teacher STRING);
Partition data files:
Classmem_misli.txt
Huangfengxiao 20
Lijie 21
Dongdong 21
Liqiang 21
Hemeng 21
Classmem_mrxu.txt
Dingding 19
Wangqiang 19
Lidong 19
Hexing 19
LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/classmem_misli.txt' INTO TABLE classmem PARTITION (teacher = 'Mis.Li');
LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/classmem_mrxu.txt' INTO TABLE classmem PARTITION (teacher = 'Mr.Xu');
 
# The partition column behaves like the last column of the table by default
hive> SELECT * FROM classmem WHERE teacher = 'Mr.Xu';
OK
Dingding 19 NULL Mr.Xu
Wangqiang 19 NULL Mr.Xu
Lidong 19 NULL Mr.Xu
Hexing 19 NULL Mr.Xu
Time taken: 0.196 seconds
# Retrieving directly from a partition is fast. If the WHERE clause does not filter on a partition column, the statement is compiled into a MapReduce job with much higher latency.
# Therefore, create partitions on fields that are frequently used in filtering queries.
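# The partitions of a table can be listed from the shell:
hive> SHOW PARTITIONS classmem;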

# What are buckets for? More efficiency! Sampling allowed! They are mainly used for sampling large datasets.
The principle of bucketing is to slice a table (or partition): choose the field to bucket on, set the number of buckets, and place each row into the bucket given by the hash value of that field modulo the number of buckets.
For example, the content of the bucket.txt data file is as follows:
ID name age
1 Huang 11
2 Li 11
3 Xu 12
4 Zhong 14
5 Hu 15
6 liqiang 17
7 Zhonghua 19
If we want to cut the data table into three buckets with id as the bucketing field,
then after hashing on the id field, the contents of the three buckets are as follows:
Bucket 0 (hash(id) mod 3 == 0):
3 Xu 12
6 liqiang 17
Bucket 1 (hash(id) mod 3 == 1):
1 Huang 11
4 Zhong 14
7 Zhonghua 19
Bucket 2 (hash(id) mod 3 == 2):
2 Li 11
5 Hu 15
The statement to create such a table is as follows:
CREATE TABLE bucketmem (id INT, name STRING, age INT) CLUSTERED BY (id) SORTED BY (id ASC) INTO 3 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/hfxdoc/bucketmem.txt' INTO TABLE bucketmem;
SELECT * FROM bucketmem TABLESAMPLE (BUCKET 1 OUT OF 4);
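# Note: LOAD DATA only moves the file into place; it does not re-hash rows into the buckets. To actually populate the buckets, insert from a staging table with bucketing enforced. A sketch, assuming the raw rows were first loaded into a plain table bucketmem_raw (a made-up name):
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE bucketmem SELECT id, name, age FROM bucketmem_raw;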

# Other operations, for reference; for the complete list see the official wiki: https://cwiki.apache.org/confluence/display/Hive/Home
1) Create a table with the same structure as an existing table, using LIKE:
Only the table structure is copied, not the table contents.
CREATE TABLE test_like_table LIKE test_bucket;

2) Rename a table:
ALTER TABLE table_name RENAME TO new_table_name;
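# For example, renaming the table created in 1) (the new name is arbitrary):
ALTER TABLE test_like_table RENAME TO test_like_table2;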

3) Add partitions:
ALTER TABLE table_name ADD PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
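# For example, adding a partition to the classmem table used above (the teacher value is made up):
ALTER TABLE classmem ADD PARTITION (teacher = 'Mr.Wang');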

4) Change a column's name / data type / position / comment:
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST | AFTER column_name];
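# For example, renaming classmem's age column (a sketch; this changes only the table metadata, not the data files):
ALTER TABLE classmem CHANGE age student_age INT;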

5) Add/replace columns:
ALTER TABLE table_name ADD | REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...);
ADD COLUMNS lets you append new columns after the existing columns but before the partition columns.
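# For example, appending a column to classmem (the column name and comment are made up):
ALTER TABLE classmem ADD COLUMNS (score INT COMMENT 'term score');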

6) The complete CREATE TABLE statement:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC | DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
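# A made-up example combining most of these clauses (all names and paths here are illustrative only):
CREATE EXTERNAL TABLE IF NOT EXISTS access_log
(ip STRING COMMENT 'client address', bytes INT COMMENT 'response size')
COMMENT 'web access log'
PARTITIONED BY (dt STRING)
CLUSTERED BY (ip) SORTED BY (ip ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/tmp/access_log';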
 
7) View HDFS files from within Hive:
hive> dfs -ls /user;
