Introduction to hive partition

Source: Internet
Author: User
I have explained how to use hive partition on the Internet. I have reprinted it: 1. Background 1. the whole table content is usually scanned in HiveSelect queries, it will consume a lot of time to do unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation. 2. A partition table is specified during table creation.

I have explained how to use hive partition on the Internet. I have reprinted the following: 1. Background 1. The entire table is typically scanned in Hive Select queries, it will consume a lot of time to do unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation. 2. A partition table is specified during table creation.

I have explained how to use hive partition on the Internet:

I. background

1. In Hive Select queries, the entire table content is usually scanned, which consumes a lot of time for unnecessary work. Sometimes you only need to scan a part of the data in the table, so the partition concept is introduced during table creation.

2. A partition table refers to the partition space specified during table creation.

3. To create a partitioned table, call the optional partitioned by parameter when creating the table. For details, see the syntax structure created in the table.

Ii. Technical Details

1. A table can have one or more partitions. Each partition exists in a folder directory.

2. the table and column names are case-insensitive.

3. partitions exist in the table structure in the form of fields. You can view the existence of fields through the describe table command. However, this field does not store actual data, but only indicates partitions.

4. Table creation syntax (see the partitioned by parameter for partition creation ):

CREATE [EXTERNAL] TABLE [if not exists] table_name [(col_name data_type [COMMENT col_comment],...)] [COMMENT table_comment] [partitioned by (col_name data_type [COMMENT col_comment],...)] [clustered by (col_name, col_name ,...) [sorted by (col_name [ASC | DESC],...)] INTO num_buckets BUCKETS] [row format row_format] [stored as file_format] [LOCATION hdfs_path]

5. There are two types of Partition Table creation: one is single partition, that is, there is only one level folder directory under the table folder directory. The other is multi-partition, and the multi-Folder nesting mode appears under the table folder.

A. Single-Partition table creation statement: create table day_table (id int, content string) partitioned by (dt string); single-partition table, partitioned by day, with id, content, dt columns.

B. Statements for table creation in two partitions: create table day_hour_table (id int, content string) partitioned by (dt string, hour string); dual-partition table, partitioned by day and hour, the dt and hour columns are added to the table structure.

Table folder directory (Multi-partition table ):


6. Add Partition Table syntax (the table has been created and partitions are added on this basis ):

Alter table table_name ADD partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2']... partition_spec: PARTITION (partition_col = PARTITION, partition_col = PARTITION ,...)

You can use alter table add partition to ADD partitions to a TABLE. When the partition name is a string, enclose it with quotation marks. Example:

Alter table day_table add partition (dt = '2017-08-08 ', hour = '08') location'/path/pv1.txt 'partition (dt = '2017-08-08 ', hour = '09') location '/path/pv2.txt ';

7. Partition deletion Syntax:

Alter table table_name DROP partition_spec, partition_spec ,...

You can use alter table drop partition to delete partitions. The metadata and data of the partition will be deleted. Example:

Alter table day_hour_table drop partition (dt = '2017-08-08 ', hour = '09 ');

8. syntax for loading data into a partition table:

Load data [LOCAL] INPATH 'filepath' [OVERWRITE] into table tablename [PARTITION (partcol1 = val1, partcol2 = val2...)]

Example:

Load data inpath '/user/pv.txt 'into TABLE day_hour_table PARTITION (dt = '2017-08-08', hour = '08 '); load data local INPATH '/user/hua/*' into table day_hour partition (dt = '2017-07-07 ');

When data is loaded into a table, no conversion is performed on the data. The Load operation only copies data to the location corresponding to the Hive table. When data is loaded, a directory is automatically created under the table, and the files are stored in the partition.

9. Partition-based query statement:

SELECT day_table. * FROM day_table WHERE day_table.dt> = '2017-08-08 ';

10. view the partition statement:

Hive> show partitions day_hour_table; OK dt = 2008-08-08/hour = 08 dt = 2008-08-08/hour = 09 dt = 2008-08-09/hour = 09

Iii. Summary

1. In Hive, a Partition in the table corresponds to a directory in the table, and all Partition data is stored in the directory of the most word set.

2. In general, partition is used to assist in queries, narrow the query scope, speed up data retrieval, and manage data according to certain specifications and conditions.

--------------------------------------

Operations on partition in hive:
Hive> create table mp (a string) partitioned by (B string, c string );
OK
Time taken: 0.044 seconds
Hive> alter table mp add partition (B = '1', c = '1 ');
OK
Time taken: 0.079 seconds
Hive> alter table mp add partition (B = '1', c = '2 ');
OK
Time taken: 0.052 seconds
Hive> alter table mp add partition (B = '2', c = '2 ');
OK
Time taken: 0.056 seconds
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 2/c = 2
Time taken: 0.046 seconds
Hive> explain extended alter table mp drop partition (B = '1 ');
OK
Abstract syntax tree:
(TOK_ALTERTABLE_DROPPARTS mp (TOK_PARTSPEC (TOK_PARTVAL B '1 ')))

Stage dependencies:
Stage-0 is a root stage

Stage plans:
Stage: Stage-0
Drop Table Operator:
Drop Table
Table: mp


Time taken: 0.048 seconds
Hive> alter table mp drop partition (B = '1 ');
FAILED: Error in metadata: table is partitioned but partition spec is not specified or tab: {B = 1}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exe c. DDLTask
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 2/c = 2
Time taken: 0.044 seconds
Hive> alter table mp add partition (B = '1', c = '3') partition (B = '1', c = '4 ');
OK
Time taken: 0.168 seconds
Hive> show partitions mp;
OK
B = 1/c = 1
B = 1/c = 2
B = 1/c = 3
B = 1/c = 4
B = 2/c = 2
B = 2/c = 3
Time taken: 0.066 seconds
Hive> insert overwrite table mp partition (B = '1', c = '1') select cnt from tmp_et3;

Hive> alter table mp add columns (newcol string );

Location specifies the directory structure
Hive> alter table alter2 add partition (insertdate = '2017-01-01 ') location '2017/01 ';

Hive> alter table alter2 add partition (insertdate = '2017-01-02 ') location '2017/02 ';

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.