Partitioning is a way in which hive stores data. Storing a column value as a directory for data is a partition. In this way, the query uses the partition column to filter, simply scan the corresponding directory data according to the column values, do not scan other not concerned about the partition, fast location, improve query efficiency. Two types of dynamic and static partitioning:
1. Static partitioning: If the value of the partition is deterministic, it is called a static partition. A partition name has been specified when a new partition is added or when the partition data is loaded.
create table if not exists day_part1( uid int, uname string ) partitioned by(year int,month int) row format delimited fields terminated by '\t'; ##加载数据指定分区 load data local inpath '/root/Desktop/student.txt' into table day_part1 partition(year=2017,month=04); ##新增分区指定分区名 alter table day_part1 add partition(year=2017,month=1) partition(year=2016,month=12);
2. Dynamic partitioning: The value of the partition is non-deterministic and is determined by the input data
2.1 Related properties of dynamic partitioning:
hive.exec.dynamic.partition=true :是否允许动态分区 hive.exec.dynamic.partition.mode=strict :分区模式设置 strict:最少需要有一个是静态分区 nostrict:可以全部是动态分区 hive.exec.max.dynamic.partitions=1000 :允许动态分区的最大数量 hive.exec.max.dynamic.partitions.pernode =100 :单个节点上的mapper/reducer允许创建的最大分区
2.2 Actions for dynamic partitioning
# #创建临时表 CREATE table if not EXISTS TMP (UID int., commentid bigint, recommentid bigint, year int, MO nth int, day int) row format delimited. Terminated by ' \ t '; # #加载数据 Load Data local inpath '/root/desktop/comm ' into table tmp; # #创建动态分区表 CREATE table if not EXISTS DYP1 (UID int., commentid bigint, recommentid bigint) partitioned by T,month int,day int) row format delimited fields terminated by ' \ t '; # #严格模式 INSERT INTO table DYP1 partition (year=2016,month,day) select Uid,commentid,recommentid,month,day from TMP; # #非严格模式 # #设置非严格模式动态分区 set hive.exec.dynamic.partition.mode=nostrict; # #创建动态分区表 CREATE table if not EXISTS DYP2 (UID int., commentid bigint, recommentid bigint) partitioned by T,month int,day int) row format delimited fields terminated by ' \ t '; # #为非严格模式动态分区加载数据 INSERT INTO table DYP2 partition (year,month,day) Select Uid,commentid,recommentid,year,month,day from tmp;
3. Zoning attention to detail
(1), try not to use dynamic partition, because the dynamic partition, will be allocated for each partition reducer number, when the number of partitions, the number of reducer will increase, the server is a disaster.
(2), dynamic partition and static partition difference, static partition regardless of whether there is no data will be created that partition, dynamic partition is a result set will be created, otherwise not created.
(3), the strict mode of hive dynamic partitioning and the strict mode of Hive.mapred.mode provided by Hive.
Hive provides us with a strict pattern: to prevent users from accidentally committing malicious hql
Hive.mapred.mode=nostrict:strict
If the mode value is strict, the following three types of queries will be blocked:
(1), the partition table query, where the filter field is not a partition field.
(2), Cartesian product join query, join query statement, without an on condition or a where condition.
(3), for the order by query, the query with order by does not have a limit statement.
184 Reads