Create an External table with partitions
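The advantage of an external table is that data already stored in HDFS can be attached to the table at any time; Hive manages only the metadata and leaves the files where they are.
The advantage of partitions is that a query can be restricted to the partitions it needs, so less data is scanned.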
The following example shows how to create an external table.
CREATE EXTERNAL TABLE my_daily_report (
  last_update string,
  col_a string,
  col_b string,
  col_c string,
  col_d string,
  col_e string,
  col_f string,
  col_g string,
  col_h string,
  col_i string,
  col_j string
)
PARTITIONED BY (par_dt string)
LOCATION '/user/chenshu/data/daily';
Mount the partition directory
alter table my_daily_report add partition (par_dt='20140530') location '/user/chenshu/data/daily/my_daily_report/20140530';
The preceding example uses a single partition column, but a table can have several. For example, if daily reports are partitioned by day, each day's partition corresponds to a directory, and hour partitions can sit under that directory so that reports for different hours are stored in different directories. The partition columns then form a hierarchy that mirrors the directory tree.
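A two-level layout of this kind might look like the sketch below. The table name my_hourly_report, the par_hour column, and the /user/chenshu/data/hourly paths are hypothetical and are used only for illustration; they do not appear in the original setup.

```sql
-- Hypothetical table: my_hourly_report, par_hour, and the paths below are illustrative only.
CREATE EXTERNAL TABLE my_hourly_report (
  last_update string,
  col_a string
)
PARTITIONED BY (par_dt string, par_hour string)
LOCATION '/user/chenshu/data/hourly';

-- Each (day, hour) pair is mounted from its own directory, one level below the day directory.
ALTER TABLE my_hourly_report ADD PARTITION (par_dt='20140530', par_hour='00')
LOCATION '/user/chenshu/data/hourly/20140530/00';
```

A query that filters on both par_dt and par_hour would then read only the matching hour directory.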
Delete Partition
You will also need a way to remove a partition. The following statement drops the par_dt='20140530' partition:
alter table my_daily_report drop partition (par_dt='20140530');
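On a managed (internal) table, drop partition deletes both the partition metadata and its data. On an external table such as my_daily_report, it deletes only the partition metadata; the files stay in HDFS.
Note: Hive versions before 0.14 have no delete from statement, and later versions support it only on transactional tables. If you only want to delete all rows in a partition, you can use drop partition instead.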
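As a small sketch of how the drop can be made repeatable and then checked, the statements below use the IF EXISTS clause and SHOW PARTITIONS, both standard HiveQL. The partition value is the one used in this article.

```sql
-- IF EXISTS avoids an error when the partition has already been dropped.
ALTER TABLE my_daily_report DROP IF EXISTS PARTITION (par_dt='20140530');

-- List the partitions that are still registered for the table.
SHOW PARTITIONS my_daily_report;
```

Because the table is external, the files under the partition directory are still in HDFS after this runs, and the partition can be mounted again later with add partition.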
Query by partition
Now that the table is partitioned, a query that needs the data in one partition runs much faster if the where clause filters on the partition column, because Hive then reads only that partition's directory.
select count(*) from my_daily_report where par_dt='20140530';
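The same idea extends to a range of partitions. The sketch below assumes that a partition has been added for each day of May 2014, which this article does not show (only one partition is mounted above). It relies on par_dt being a yyyyMMdd string, so string comparison follows date order.

```sql
-- Assumes partitions for the days of May 2014 have been added; this is not shown in the article.
-- par_dt is a yyyyMMdd string, so string comparison follows date order.
SELECT par_dt, count(*) AS row_count
FROM my_daily_report
WHERE par_dt >= '20140501' AND par_dt <= '20140531'
GROUP BY par_dt;
```

Only the partitions whose par_dt falls inside the range should need to be read, and the result gives one row count per day.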
Recommended articles:
http://my.oschina.net/leejun2005/blog/82065