Partitions and buckets in Hive
Hive organizes tables into partitions. Partitioning is a coarse-grained way of dividing a table based on the value of a partition column, such as a date. Queries that touch only one slice of the data run faster when they can use partitions.
Tables and partitions can be further divided into buckets. Bucketing imposes additional structure on the data, which allows more efficient query processing. For example, by bucketing on user ID we can quickly evaluate a query on a random sample of the full set of users.
Consider log files in which each record contains a timestamp. We normally partition by date, so that records from the same day are stored in the same partition.
Partitions are defined with the PARTITIONED BY clause when the table is created. The clause takes a list of column definitions.
With buckets, a table is divided into a fixed number of parts, and the bucket for each row is determined by hashing a set of columns and taking the result modulo the number of buckets. In the table used here, the bucket is determined by ts modulo 4.
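The table definition described above can be sketched in HiveQL as below. This is a minimal sketch, not the original statement: the table name logs and the bucketing column ts come from the text, while the column types, the line column, the partition column name dt and the tab delimiter are assumptions made for illustration.

```sql
-- Sketch only: column types, the `line` column, the partition column `dt`
-- and the field delimiter are assumed, not taken from the original post.
-- PARTITIONED BY creates one directory per date value.
-- CLUSTERED BY ... INTO 4 BUCKETS assigns rows by the hash of ts modulo 4.
CREATE TABLE logs (
  ts   BIGINT,
  line STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (ts) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```

Note that the partition column is declared only in PARTITIONED BY and not in the regular column list. Its value is stored in the directory name rather than in the data files.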
When we load data into a partitioned table, the partition value must be specified explicitly. For example, suppose we have a file named 20140418GB.txt in a directory.
We load the data into the table logs.
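A load statement of this kind might look like the sketch below. The file name and the table name come from the text. The directory placeholder, the LOCAL keyword, the partition column dt and the partition value are assumptions, since the original statement is not shown.

```sql
-- Sketch only: '/path/to' is a placeholder, and LOCAL, `dt` and the
-- partition value '2014-04-18' are assumed for illustration.
LOAD DATA LOCAL INPATH '/path/to/20140418GB.txt'
INTO TABLE logs
PARTITION (dt = '2014-04-18');
```

The PARTITION clause is what tells Hive which partition directory the file belongs in. Hive does not read the date out of the file name or the data.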
Now let's look at the resulting HDFS structure and the data inside it.
We view it in Eclipse.
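The same layout can also be inspected without Eclipse. The commands below are a sketch that assumes the Hive CLI and the default warehouse location /user/hive/warehouse. Your installation may store the table elsewhere, and older Hadoop versions may need -lsr instead of -ls -R.

```sql
-- List the partitions Hive has registered for the table.
SHOW PARTITIONS logs;

-- From the Hive CLI, list the files under the table directory.
-- The path assumes the default warehouse location.
dfs -ls -R /user/hive/warehouse/logs;
```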
The strange thing is what we see inside the directory: the partition directory is there, but the buckets do not appear. Next we query the data by bucket.
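A query by bucket can be written with TABLESAMPLE. The sketch below assumes the logs table is bucketed on ts into 4 buckets, as described earlier. The choice of bucket 1 is arbitrary.

```sql
-- Sample the first of four buckets, hashing on ts.
SELECT * FROM logs TABLESAMPLE (BUCKET 1 OUT OF 4 ON ts);
```

If the data on disk is not actually bucketed on the sampled column, Hive can still answer this query, but it scans the whole table and filters rows by their hash instead of reading a single bucket file.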
The result of this query contains the qualifying records from all three files.
Previously I had used buckets on their own. That experiment showed that with bucketing alone the individual bucket files are visible, but when partitions and buckets are used together only the partition directory is visible.
We do not fully understand this. When partitions and buckets are combined, partitioning appears to take precedence: we can see the partition directories, but the buckets are no longer visible. The structure seems to exist; it is just not displayed.
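One possible explanation, offered as an assumption rather than something the experiment above confirms: in Hive versions of that period, LOAD DATA moves the file into the partition directory as it is and does not redistribute rows into buckets. Buckets are files inside each partition directory, not subdirectories, and they are typically produced only when data is written with an INSERT while bucketing is enforced. The sketch below shows that approach. The staging table logs_staging and its columns are hypothetical.

```sql
-- Needed before Hive 2.0; later versions always enforce bucketing.
SET hive.enforce.bucketing = true;

-- logs_staging is a hypothetical unbucketed table with columns ts, line, dt.
INSERT OVERWRITE TABLE logs PARTITION (dt = '2014-04-18')
SELECT ts, line
FROM logs_staging
WHERE dt = '2014-04-18';
```

After an insert like this, the partition directory would be expected to contain one file per bucket, which is the structure that was visible when buckets were used alone.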
And when we do the search, we can use the