Problem:
Junk data appears online. Because the data is imported into hive using dynamic partition, some of these junk data are in the partiition field.
Like:
> Show partitions dpdw_traffic_base;
Hp_stat_time = 234123417234719237491/hp_log_type = 0/hp_host =__ hive_default_partition _/hp_from_type =__ hive_default_partition __
Hp_stat_time = r_ready % 3d407; r_load % 3d641/hp_log_type = 0/hp_host =__ hive_default_partition _/hp_from_type =__ hive_default_partition __
Hp_stat_time = r_ready % 3d438; r_load % 3d797/hp_log_type = 0/hp_host =__ hive_default_partition _/hp_from_type =__ hive_default_partition __
.......
Write a script to clear junk data:
#!/bin/shpname=`hadoop fs -ls /user/hive/warehouse/bi.db/dpdw_traffic_base | awk '{print $NF}' | awk '{gsub(/\/user\/hive\/warehouse\/bi.db\/dpdw_traffic_base\/hp_stat_time=/,"",$0); print $0}'`for i in $pnamedo len=`expr length $i` if [ $len -ne 10 -a $i != '__HIVE_DEFAULT_PARTITION__' -a $i != 'items' ] then name="hdfs://10.2.6.102/user/hive/warehouse/bi.db/dpdw_traffic_base/hp_stat_time="$i hadoop fs -rmr $name hive -e "use bi; alter table dpdw_traffic_base drop partition(hp_stat_time='$i')" fidone
The problem occurs, similar:
Hp_stat_time = r_ready % 3d407; r_load % 3d641
The partition cannot be deleted.
Followed up and found that hive will automatically encode some characters, Here % 3d decoding is '='
Also ';' needs to be escaped
So:
Partition of dpdw_traffic_base:
Hp_stat_time = r_ready % 3d91; r_load % 3d351/hp_log_type = 0/hp_host = http % 3A % 2f % 2fwww.dianping.com % 2 fsearch % 2 fkeyword % 2f10% 2f0 _ % 25e9% 259b % 2581% 25e8% 258d % 25a1 25e6% 2583% 2585/hp_from_type =__ hive_default_partition __
The correct deletion statement is:
> Alter table dpdw_traffic_base drop partition (hp_stat_time = 'r _ ready = 91 \; r_load = 351 ');
Dropping the partition hp_stat_time = r_ready % 3d91; r_load % 3d351/hp_log_type = 0/hp_host = http % 3A % 2f % 2fwww.dianping.com % 2 fsearch % 2 fkeyword % 2f10% 2f0 _ % 25e9% 259b % 2581% 25e8% 258d % 25a1 25e6% 2583% 2585/hp_from_type =__ hive_default_partition __
OK