Introduction to hive partition

Last Update:2018-05-27 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I have explained how to use hive partition on the Internet. I have reprinted it: 1. Background 1. the whole table content is usually scanned in HiveSelect queries, it will consume a lot of time to do unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation. 2. A partition table is specified during table creation.

I have explained how to use hive partition on the Internet. I have reprinted the following: 1. Background 1. The entire table is typically scanned in Hive Select queries, it will consume a lot of time to do unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation. 2. A partition table is specified during table creation.

I have explained how to use hive partition on the Internet:

I. background

1. In Hive Select queries, the entire table content is usually scanned, which consumes a lot of time for unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation.

2. A partition table refers to the partition space specified during table creation.

3. To create a partitioned table, call the optional partitioned by parameter when creating the table. For details, see the syntax structure created in the table.

Ii. Technical Details

1. A table can have one or more partitions. Each partition exists in a folder directory.

2. the table and column names are case-insensitive.

3. partitions exist in the table structure in the form of fields. You can view the existence of fields through the describe table command. However, this field does not store actual data, but only indicates partitions.

4. Table creation syntax (see the partitioned by parameter for partition creation ):

CREATE [EXTERNAL] TABLE [if not exists] table_name [(col_name data_type [COMMENT col_comment],...)] [COMMENT table_comment] [partitioned by (col_name data_type [COMMENT col_comment],...)] [clustered by (col_name, col_name ,...) [sorted by (col_name [ASC | DESC],...)] INTO num_buckets BUCKETS] [row format row_format] [stored as file_format] [LOCATION hdfs_path]

5. There are two types of Partition Table creation: one is single partition, that is, there is only one level folder directory under the table folder directory. The other is multi-partition, and the multi-Folder nesting mode appears under the table folder.

A. Single-Partition table creation statement: create table day_table (id int, content string) partitioned by (dt string); single-partition table, partitioned by day, with id, content, dt columns.

B. Statements for table creation in two partitions: create table day_hour_table (id int, content string) partitioned by (dt string, hour string); dual-partition table, partitioned by day and hour, the dt and hour columns are added to the table structure.

Table folder directory (Multi-partition table ):

6. Add Partition Table syntax (the table has been created and partitions are added on this basis ):

Alter table table_name ADD partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2']... partition_spec: PARTITION (partition_col = PARTITION, partition_col = PARTITION ,...)

You can use alter table add partition to ADD partitions to a TABLE. When the partition name is a string, enclose it with quotation marks. Example:

Alter table day_table add partition (dt = '2017-08-08 ', hour = '08') location'/path/pv1.txt 'partition (dt = '2017-08-08 ', hour = '09') location '/path/pv2.txt ';

7. Partition deletion Syntax:

Alter table table_name DROP partition_spec, partition_spec ,...

You can use alter table drop partition to delete partitions. The metadata and data of the partition will be deleted. Example:

Alter table day_hour_table drop partition (dt = '2017-08-08 ', hour = '09 ');

8. syntax for loading data into a partition table:

Load data [LOCAL] INPATH 'filepath' [OVERWRITE] into table tablename [PARTITION (partcol1 = val1, partcol2 = val2...)]

Example:

Load data inpath '/user/pv.txt 'into TABLE day_hour_table PARTITION (dt = '2017-08-08', hour = '08 '); load data local INPATH '/user/hua/*' into table day_hour partition (dt = '2017-07-07 ');

When data is loaded into a table, no conversion is performed on the data. The Load operation only copies data to the location corresponding to the Hive table. When data is loaded, a directory is automatically created under the table, and the files are stored in the partition.

9. Partition-based query statement:

SELECT day_table. * FROM day_table WHERE day_table.dt> = '2017-08-08 ';

10. view the partition statement:

Hive> show partitions day_hour_table; OK dt = 2008-08-08/hour = 08 dt = 2008-08-08/hour = 09 dt = 2008-08-09/hour = 09

Iii. Summary

1. In Hive, a Partition in the table corresponds to a directory in the table, and all Partition data is stored in the directory of the most word set.

2. In general, partition is used to assist in queries, narrow the query scope, speed up data retrieval, and manage data according to certain specifications and conditions.

--------------------------------------

Operations on partition in hive:
Hive> create table mp (a string) partitioned by (B string, c string );
OK
Time taken: 0.044 seconds
Hive> alter table mp add partition (B = '1', c = '1 ');
OK
Time taken: 0.079 seconds
Hive> alter table mp add partition (B = '1', c = '2 ');
OK
Time taken: 0.052 seconds
Hive> alter table mp add partition (B = '2', c = '2 ');
OK
Time taken: 0.056 seconds
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 2/c = 2
Time taken: 0.046 seconds
Hive> explain extended alter table mp drop partition (B = '1 ');
OK
Abstract syntax tree:
(TOK_ALTERTABLE_DROPPARTS mp (TOK_PARTSPEC (TOK_PARTVAL B '1 ')))

Stage dependencies:
Stage-0 is a root stage

Stage plans:
Stage: Stage-0
Drop Table Operator:
Drop Table
Table: mp

Time taken: 0.048 seconds
Hive> alter table mp drop partition (B = '1 ');
FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {B = 1}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exe c. DDLTask
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 2/c = 2
Time taken: 0.044 seconds
Hive> alter table mp add partition (B = '1', c = '3') partition (B = '1', c = '4 ');
OK
Time taken: 0.168 seconds
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 1/c = 3
B = 1/c = 4
B = 2/c = 2
B = 2/c = 3
Time taken: 0.066 seconds
Hive> insert overwrite table mp partition (B = '1', c = '1') select cnt from tmp_et3;

Hive> alter table mp add columns (newcol string );

Location specifies the directory structure
Hive> alter table alter2 add partition (insertdate = '2017-01-01 ') location '2017/01 ';

Hive> alter table alter2 add partition (insertdate = '2017-01-02 ') location '2017/02 ';

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Introduction to hive partition

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Introduction to hive partition

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support