Partitions and Buckets of Hive Tables

Last Update:2016-01-31 Source: Internet

Author: User

1. Hive Partitioned Table

When Hive runs a SELECT query, it generally scans the entire table, spending a lot of time reading data the query does not need. Hive lets you declare partition columns when creating a table, so a query that filters on those columns only has to read the matching partitions, which improves query efficiency.

Syntax for creating partitioned tables:

CREATE TABLE table_name (
  name STRING
) PARTITIONED BY (key type, ...);

Example

DROP TABLE IF EXISTS employees;
CREATE TABLE IF NOT EXISTS employees (
  name STRING,
  salary FLOAT,
  subordinate ARRAY<STRING>,
  deductions MAP<STRING,FLOAT>,
  address STRUCT<street:STRING,city:STRING,num:INT>
)
-- Partition columns are declared separately from the regular columns.
PARTITIONED BY (date_time STRING, type STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/hive/inner';

Note: The statement above declares two partition columns, date_time and type, when the table is created. A table with two partition columns is called double-partitioned; a table with one is called single-partitioned. After the statement runs, describing the table shows the two partition columns in addition to the regular fields.

DESC employees;

The output lists the two partition columns along with the regular fields.

Note: In the file system, each date_time value appears as a directory, and each type value as a subdirectory inside it.

Loading data into a partitioned table (the partition must be specified):

hive> LOAD DATA LOCAL INPATH '/usr/local/src/employee_data' INTO TABLE employees PARTITION (date_time='2015-01_24', type='userInfo');
Copying data from file:/usr/local/src/employee_data
Copying file: file:/usr/local/src/employee_data
Loading data to table default.employees partition (date_time=2015-01_24, type=userInfo)
OK
Time taken: 0.22 seconds
hive>

After the data is loaded, the partition appears in the file system as nested directories.

Note: The type partition exists as a subdirectory of the date_time partition directory.
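To connect this layout back to query efficiency: a query that filters on the partition columns lets Hive read only the matching directory instead of the whole table. A minimal sketch, assuming the partition loaded above exists:

```sql
-- Both partition columns appear in the WHERE clause,
-- so only the files under that one partition need to be read.
SELECT name, salary
FROM employees
WHERE date_time = '2015-01_24'
  AND type = 'userInfo';
```

A query without a filter on date_time or type would still have to scan every partition.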

To add a partition:

ALTER TABLE employees ADD IF NOT EXISTS PARTITION (date_time='2088-08-18', type='liaozhongmin');

Note: We can add the partition first and then load data into it.

To view partitions:

SHOW PARTITIONS employees;

Note: employees here is the table name.
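When a table has many partitions, the listing can be narrowed by giving a partial partition specification. A small sketch, assuming the date_time value used in the load example:

```sql
-- Lists only the partitions whose date_time matches;
-- every type value under that date is shown.
SHOW PARTITIONS employees PARTITION (date_time='2015-01_24');
```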

To drop an unwanted partition:

ALTER TABLE employees DROP IF EXISTS PARTITION (date_time='2015-01_24', type='userInfo');

Listing the partitions again with SHOW PARTITIONS employees confirms that the dropped partition is gone.

2. Hive Bucketed Table

Each table or partition can be further organized into buckets, which divide the data at a finer granularity. Bucketing is defined on a column: Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a record is stored in. Buckets can make some queries more efficient, and they make sampling more efficient.

Example:

CREATE TABLE bucketed_user (
  id INT,
  name STRING
)
CLUSTERED BY (id) SORTED BY (name) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Here the user ID determines the bucket: Hive hashes the value, divides the result by the number of buckets, and uses the remainder as the bucket index.
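The hash-and-remainder rule can be approximated in a query using Hive's built-in hash and pmod functions. This is only a sketch of the idea, not Hive's exact internal calculation, which may differ by version; the column alias bucket_index is made up for illustration:

```sql
-- Approximate bucket index (0 to 3) for each row of the 4-bucket table.
-- pmod returns a non-negative remainder even when the hash is negative.
SELECT id,
       pmod(hash(id), 4) AS bucket_index
FROM bucketed_user;
```

Rows with the same bucket_index would be expected to land in the same bucket file.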

Another point to note: before writing to a bucketed table, bucketing enforcement must be turned on:

SET hive.enforce.bucketing = true;

Now we select salary and name from the employees table and insert them into this table (the salary value fills the id column here):

INSERT OVERWRITE TABLE bucketed_user SELECT salary, name FROM employees;

We can check the inserted data with a query.

In the file system, the table's data is divided into four buckets.

When a query can make use of the bucketing column, for example when sampling, Hive computes which bucket holds the data from that column and reads only the corresponding bucket, which improves efficiency.
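Sampling is the clearest case where buckets help. A minimal sketch, assuming the bucketed_user table above was populated with bucketing enforced:

```sql
-- Read roughly one quarter of the table by taking bucket 1 of 4.
-- Because the table is clustered by id into 4 buckets,
-- Hive can serve this sample from a single bucket.
SELECT *
FROM bucketed_user TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
```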

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
