Hive Dynamic Partitioning in Practice

Source: Internet
Author: User

I) Hive supports two types of partitions:
    • Static partition (SP)
    • Dynamic partition (DP)
The main difference is that a static partition's value is specified manually by the user, while a dynamic partition's value is derived from the data. In detail, the values of static partition columns are fixed at compile time, because the user passes them in the statement; the values of dynamic partition columns can only be determined when the SQL is executed.

II) Hands-on demo: how to use dynamic partitioning in Hive

1. Create a partitioned table with two partition columns, dt and ht, representing the date and the hour

CREATE TABLE partition_table001 (
    name STRING,
    ip STRING
)
PARTITIONED BY (dt STRING, ht STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
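
To try the later steps, the source table needs rows in more than one hour partition. A minimal sketch, assuming Hive 0.14 or later (where INSERT ... VALUES is available); the names and IP addresses are made-up sample data, not from the original article:

```sql
-- Hypothetical sample rows; each statement writes one static partition.
INSERT INTO TABLE partition_table001 PARTITION (dt='20150617', ht='00')
VALUES ('alice', '10.0.0.1');

INSERT INTO TABLE partition_table001 PARTITION (dt='20150617', ht='01')
VALUES ('bob', '10.0.0.2');
```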

2. Enable Hive dynamic partitioning; only two parameters need to be set in the Hive session:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
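
Dynamic-partition inserts are also bounded by a few limits. The two settings below are standard Hive properties, but the values shown are illustrative assumptions rather than recommendations, and the defaults depend on your Hive version:

```sql
-- Upper bound on dynamic partitions created by one statement (illustrative value).
SET hive.exec.max.dynamic.partitions=2000;
-- Upper bound on dynamic partitions created per mapper/reducer node (illustrative value).
SET hive.exec.max.dynamic.partitions.pernode=200;
```

In strict mode Hive requires at least one static partition column, which is why nonstrict mode is needed when every partition column is dynamic.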

3. Load the data from one date partition of partition_table001 into the target table partition_table002
When using a static partition, you must specify the partition values explicitly, for example:
CREATE TABLE IF NOT EXISTS partition_table002 LIKE partition_table001;
INSERT OVERWRITE TABLE partition_table002 PARTITION (dt='20150617', ht='00')
SELECT name, ip FROM partition_table001 WHERE dt='20150617' AND ht='00';

At this point we see that inserting all 24 hours of one day's data requires running the statement above 24 times. Dynamic partitioning instead determines which partition each row is loaded into, based on the results of the SELECT.

4. Using dynamic partitioning

INSERT OVERWRITE TABLE partition_table002 PARTITION (dt, ht)
SELECT * FROM partition_table001 WHERE dt='20150617';
Hive takes the values of the last two columns of the SELECT (here dt and ht) and fills them into the dt and ht partition columns named in the PARTITION clause of the INSERT statement. Dynamic partition columns are matched to select columns by position. The relationship between the selected values and the output partition values is determined only by position, not by name; the names themselves (for example dt versus st) play no part in the match.
A single SQL statement can therefore insert all 24 ht partitions under 20150617 into the new table.
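
Because the match is positional, it can be safer to list the columns explicitly rather than rely on SELECT *. A sketch of the same load with the partition columns spelled out last, followed by a check of the resulting partitions; it assumes the two tables created in the steps above:

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The last two select columns feed PARTITION (dt, ht), in that order.
INSERT OVERWRITE TABLE partition_table002 PARTITION (dt, ht)
SELECT name, ip, dt, ht
FROM partition_table001
WHERE dt = '20150617';

-- List the partitions that the insert created.
SHOW PARTITIONS partition_table002;
```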

III) Static and dynamic partitions can be mixed
1. All DP
INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds IS NOT NULL AND hr>10;
2. DP/SP combination
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, /*ds,*/ hr FROM srcpart WHERE ds IS NOT NULL AND hr>10;
3. When the SP is a sub-partition of a DP, the following DML fails, because the partition order determines the directory hierarchy in HDFS, which cannot be changed
-- throws an exception
INSERT OVERWRITE TABLE T PARTITION (ds, hr = 11)
SELECT key, value, ds/*, hr*/ FROM srcpart WHERE ds IS NOT NULL AND hr=11;
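
One way to avoid this error, sketched here under the assumption that T is partitioned by (ds, hr) as in the other examples, is to make both columns dynamic and keep the hr restriction in the WHERE clause:

```sql
-- Requires hive.exec.dynamic.partition.mode=nonstrict, since no partition column is static.
-- Both partition columns are dynamic; the filter keeps only hr = 11.
INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr
FROM srcpart
WHERE ds IS NOT NULL AND hr = 11;
```
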
4. Multi-table inserts
FROM srcpart
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, hr WHERE ds IS NOT NULL AND hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03', hr=12)
SELECT key, value WHERE ds IS NOT NULL AND hr=12;
5. CTAS (CREATE TABLE AS SELECT): DP and SP usage differs slightly under CTAS syntax, because the target table's schema cannot be fully derived from the SELECT statement. You need to specify the partition columns in the CREATE statement
CREATE TABLE T (key int, value string) PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1 FROM srcpart WHERE ds IS NOT NULL AND hr>10;
6. The above shows CTAS usage with DP. To supply a constant of your own for a partition column, you can do this
CREATE TABLE T (key int, value string) PARTITIONED BY (ds string, hr int) AS
SELECT key, value, "2010-03-03", hr+1 hr1 FROM srcpart WHERE ds IS NOT NULL AND hr>10;
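
Some Hive releases reject a column list or a typed PARTITIONED BY clause in CTAS. Assuming that is the case in your environment, the same result can be sketched in two steps: create the partitioned table first, then fill it with a dynamic-partition insert. The names T and srcpart are taken from the examples above.

```sql
CREATE TABLE IF NOT EXISTS T (key INT, value STRING)
PARTITIONED BY (ds STRING, hr INT);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- ds and hr + 1 are the last two select columns, so they become the partition values.
INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr + 1 AS hr1
FROM srcpart
WHERE ds IS NOT NULL AND hr > 10;
```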

IV) Summary:
From the cases above, we can summarize the best practice for dynamic partitioning in Hive: for tables with a large number of second-level partitions, dynamic partitioning makes loading far more convenient than issuing one statement per partition. When static and dynamic partitions are combined, the static partition values must come before the dynamic partition values.
