Deep understanding of hive partitioning, partitioning is just a logical structure

Last Update:2015-04-27 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

On the internet there is a partition about the use of hive to explain the better, reproduced:

I. BACKGROUND

1, in the Hive Select query will generally scan the entire table content, will consume a lot of time to do unnecessary work. Sometimes you only need to scan a subset of the data in the table, so the partition concept is introduced when the table is built.

2. The partition table refers to the partition space of the partition specified when the table was created.

3, if you need to create a partitioned table, you need to call the CREATE table when the optional parameters partitioned by, see table creation syntax structure.

Second, technical details

1, a table can have one or more partitions, each partition as a folder in the form of a separate table folder directory.

2, table and column names are case-insensitive.

3, the partition is in the form of a field in the table structure exists, through the describe Table command can be viewed to the existence of a field, but the field does not hold the actual data content, is only the representation of the partition.

4. Build the syntax of the table (see partitioned by parameter for partitioning):

CREATE [EXTERNAL] TABLE [IF not EXISTS] table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT Table_comment] [Partitioned by (Col_name data_type [COMMENT col_comment], ...)] [CLUSTERED by (Col_name, col_name, ...) [SORTED by (Col_name [asc| DESC], ...)] into num_buckets buckets] [ROW FORMAT Row_format] [STORED as File_format] [location Hdfs_path]

5, the partition table is divided into 2 kinds, one is a single partition, that is, in the Table folder directory is only a folder directory. Another is multi-partition, the table folder under the Multi-folder nested mode.

A, single partition build Table statement: CREATE TABLE day_table (id int, content String) partitioned by (DT string), single partition table, partitioned by day, Id,content,dt three columns exist in the table structure.

b, two-partition statement: CREATE TABLE day_hour_table (id int, content String) partitioned by (DT string, hour string), double-partitioned table, partitioned by day and hour, new in table structure Added DT and hour two columns.

Table folder directory (multi-partition Table):

6. Add Partition table syntax (table created, add partition on this basis):

ALTER TABLE table_name ADD partition_spec [location ' Location1 '] partition_spec [location ' Location2 '] ... partition_s PEC:: PARTITION (Partition_col = partition_col_value, Partition_col = Partiton_col_value, ...)

Users can use ALTER table add PARTITION to add partitions to a table. Quoted when the partition name is a string. Cases:

ALTER TABLE day_table ADD PARTITION (dt= ' 2008-08-08 ', hour= ') location '/path/pv1.txt ' PARTITION (dt= ' 2008-08-08 ', hou R= ' ") location '/path/pv2.txt ';

7. Delete Partition syntax:

ALTER TABLE table_name DROP partition_spec, Partition_spec,...

The user can delete the partition with ALTER TABLE drop PARTITION. The metadata and data for the partition will be deleted together. Cases:

ALTER TABLE day_hour_table DROP PARTITION (dt= ' 2008-08-08 ', hour= ' 09 ');

8. The data is loaded into the partition table syntax:

LOAD DATA [LOCAL] inpath ' filepath ' [OVERWRITE] into TABLE tablename [PARTITION (Partcol1=val1, Partcol2=val2 ...)]

Cases:

LOAD DATA inpath '/user/pv.txt ' into TABLE day_hour_table PARTITION (dt= ' 2008-08-08 ', hour= ' 08 '); LOAD DATA local inpath '/user/hua/* ' into TABLE day_hour partition (dt= ' 2010-07-07 ');

When the data is loaded into the table, no transformations are made to the data. The load operation simply copies the data to the location corresponding to the Hive table. When the data is loaded, a directory is automatically created under the table, and the file is stored under that partition.

9. Partition-based query statements:

SELECT day_table.* from day_table WHERE day_table.dt>= ' 2008-08-08 ';

10. View the partition statement:

Hive> Show Partitions day_hour_table; OK dt=2008-08-08/hour=08 dt=2008-08-08/hour=09 dt=2008-08-09/hour=09

Iii. Summary

1. In Hive, one of the Partition in the table corresponds to a directory under the table, and all Partition data is stored in the directory of the most-word set.

2, generally partition is a supplementary query, narrow the scope of the query, speed up the retrieval of data and data in accordance with certain specifications and conditions for management.

——————————————————————————————————————

Operations on partition in hive:
Hive> CREATE TABLE MP (a string) partitioned by (b string, c string);
Ok
Time taken:0.044 seconds
Hive> ALTER TABLE MP add partition (b= ' 1 ', c= ' 1 ');
Ok
Time taken:0.079 seconds
Hive> ALTER TABLE MP add partition (b= ' 1 ', c= ' 2 ');
Ok
Time taken:0.052 seconds
Hive> ALTER TABLE MP add partition (b= ' 2 ', c= ' 2 ');
Ok
Time taken:0.056 seconds
Hive> Show Partitions MP;
Ok
B=1/c=1
b=1/c=2
b=2/c=2
Time taken:0.046 seconds
Hive> explain extended ALTER TABLE MP drop partition (b= ' 1 ');
Ok
ABSTRACT SYNTAX TREE:
(Tok_altertable_dropparts MP (Tok_partspec (tok_partval B ' 1 ')))

STAGE DEPENDENCIES:
Stage-0 is a root stage

STAGE plans:
stage:stage-0
Drop Table Operator:
Drop Table
Table:mp

Time taken:0.048 seconds
Hive> ALTER TABLE MP drop partition (b= ' 1 ');
Failed:error in metadata:table was partitioned but partition spec was not specified or tab: {b=1}
Failed:execution Error, return code 1 from Org.apache.hadoop.hive.ql.exec.DDLTask
Hive> Show Partitions MP;
Ok
B=1/c=1
b=1/c=2
b=2/c=2
Time taken:0.044 seconds
Hive> ALTER TABLE MP add partition (b= ' 1 ', c = ' 3 ') partition (b= ' 1 ', c= ' 4 ');
Ok
Time taken:0.168 seconds
Hive> Show Partitions MP;
Ok
B=1/c=1
b=1/c=2
B=1/c=3
B=1/c=4
b=2/c=2
B=2/c=3
Time taken:0.066 seconds
Hive>insert Overwrite table MP partition (b= ' 1 ', c= ' 1 ') select CNT from Tmp_et3;

Hive>alter Table MP Add columns (Newcol string);

Location-specific directory structure
Hive> ALTER TABLE ALTER2 add partition (insertdate= ' 2008-01-01 ') location ' 2008/01/01 ';

Hive> ALTER TABLE ALTER2 add partition (insertdate= ' 2008-01-02 ') location ' 2008/01/02 ';

Deep understanding of hive partitioning, partitioning is just a logical structure

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Deep understanding of hive partitioning, partitioning is just a logical structure

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Deep understanding of hive partitioning, partitioning is just a logical structure

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support