Partitions and Buckets in Hive

Source: Internet
Author: User



Hive organizes a table into partitions. Partitioning is a coarse-grained way of dividing a table based on the value of a partition column, such as a date. Queries that only touch a slice of the data run faster, because Hive reads just the matching partitions.


Tables and partitions can be further subdivided into buckets, which give the data extra structure that allows more efficient query processing. For example, by bucketing on user ID, we can quickly evaluate a query on a random sample of the whole set of users.
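To make the sampling idea concrete, here is a minimal Python sketch (it is not Hive code). It assumes that a row's bucket is simply the user ID modulo the number of buckets; the names bucket_of and sample_bucket, the bucket count, and the user IDs are all hypothetical.

```python
# Illustrative only: simulate bucketing rows by user ID and sampling one bucket.
NUM_BUCKETS = 4  # assumed bucket count


def bucket_of(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Assign a row to a bucket by taking the ID modulo the bucket count (assumption)."""
    return user_id % num_buckets


def sample_bucket(user_ids, bucket: int, num_buckets: int = NUM_BUCKETS):
    """Return only the IDs that fall into one bucket, roughly 1/num_buckets of the data."""
    return [uid for uid in user_ids if bucket_of(uid, num_buckets) == bucket]


user_ids = list(range(1, 21))        # hypothetical user IDs 1..20
sample = sample_bucket(user_ids, 0)  # read a single bucket instead of every row
print(sample)
```

Because each bucket holds a roughly even share of the IDs, reading one bucket out of four touches about a quarter of the rows.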



Consider log files, where each record contains a timestamp. We would normally partition such data by date, so that records from the same day are stored in the same partition.

Partitions are defined when the table is created, using the PARTITIONED BY clause, which takes a list of column definitions.
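The effect of partition columns on a query can be sketched in Python (again, not Hive code). Hive stores each partition in a directory named column=value; the column names dt and country, the directory list, and the helper names below are assumptions made for illustration.

```python
def parse_partition_dir(name: str) -> dict:
    """Turn a path such as 'dt=2014-04-18/country=GB' into a column -> value mapping."""
    return dict(part.split("=", 1) for part in name.split("/"))


def prune(partition_dirs, **wanted):
    """Keep only the partition directories whose values match every filter."""
    selected = []
    for name in partition_dirs:
        values = parse_partition_dir(name)
        if all(values.get(col) == val for col, val in wanted.items()):
            selected.append(name)
    return selected


# Hypothetical partition directories under one table
partition_dirs = [
    "dt=2014-04-17/country=GB",
    "dt=2014-04-18/country=GB",
    "dt=2014-04-18/country=US",
]

print(prune(partition_dirs, dt="2014-04-18"))
print(prune(partition_dirs, dt="2014-04-18", country="GB"))
```

A filter on the partition columns narrows the set of directories to scan before any file is opened, which is where the speed-up comes from.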

With buckets, declared using the CLUSTERED BY clause, you divide the table into a fixed number of parts. The bucket for each row is determined by hashing one or more columns and taking the result modulo the number of buckets. In our case, the bucket is decided by ts modulo 4.
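The modulo rule can be illustrated with a small Python sketch, which is not Hive code. It assumes that the hash of a non-negative integer is the integer itself, so that the bucket is just ts modulo 4; the rows and function names are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # the text buckets on ts with 4 buckets


def bucket_index(ts: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumption: a non-negative integer hashes to itself,
    # so the bucket is simply ts modulo the bucket count.
    return ts % num_buckets


def distribute(rows):
    """Group (ts, line) rows by the bucket that their ts value maps to."""
    buckets = defaultdict(list)
    for ts, line in rows:
        buckets[bucket_index(ts)].append((ts, line))
    return dict(buckets)


# Hypothetical log rows: (ts, line)
rows = [(1, "Log line 1"), (2, "Log line 2"), (4, "Log line 4"),
        (5, "Log line 5"), (8, "Log line 8")]

buckets = distribute(rows)
for index in sorted(buckets):
    print(index, buckets[index])
```

Rows whose ts values share a remainder end up together, so ts 4 and ts 8 share bucket 0 while ts 1 and ts 5 share bucket 1.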



When we load data into a partitioned table, the partition values must be specified explicitly. For example, suppose we have a file named 20140418GB.txt in a directory.


We loaded the data into the table logs
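As a rough picture of where such a load places the file, the following Python sketch builds the destination path from the partition values given at load time. The warehouse location, the partition column names dt and country, their values, and the function name partition_path are assumptions for illustration; only the file name and the table name logs come from the text.

```python
import posixpath


def partition_path(table_dir: str, partition_spec: dict, file_name: str) -> str:
    """Build the path a file would occupy under a table's partition directories."""
    parts = [f"{col}={val}" for col, val in partition_spec.items()]
    return posixpath.join(table_dir, *parts, file_name)


path = partition_path(
    "/user/hive/warehouse/logs",            # assumed warehouse location for table logs
    {"dt": "2014-04-18", "country": "GB"},  # hypothetical partition columns and values
    "20140418GB.txt",
)
print(path)
```

The partition values come from the load statement, not from the contents of the file, which is why they have to be stated explicitly.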



Now let's look at the HDFS structure and the data inside.

We view it in Eclipse.


Strangely, though, the only thing we can see inside is the partition directory.

The buckets do not appear here. Next, we query the data by bucket.



The results are as follows


The result contains the matching records from all three files.

I had previously used buckets on their own. That experiment showed that with buckets alone you can see the individual bucket files, but when partitions and buckets are used together you see only the partition directories.


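We cannot fully explain this. It appears that when partitions and buckets are combined, partitioning takes precedence: we can see the partition directories, but the buckets are no longer visible, as if the structure exists but is not shown. One likely cause is that loading a file into a table only moves the file into the partition directory and does not redistribute its rows into bucket files, so separate bucket files would appear only if the table were populated through an insert that Hive itself writes.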

And when we do the search, we can use the

