Implementing Incremental Updates in Hive

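An insurance company has a table that records customer information, including each customer's id, name, and age (only these fields are listed, for demonstration).
Create the Hive table: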
create table customer
(
id int,
age tinyint,
name string
)
partitioned by(dt string)
row format delimited
fields terminated by '|'
stored as textfile;


Load the initial data:
load data local inpath '/home/hadoop/hivetestdata/customer.txt' into table customer partition(dt = '201506');
hive> select * from customer order by id;
customer.id  customer.age  customer.name    customer.dt
1            25            jiangshouzhuang  201506
2            23            zhangyun         201506
3            24            yiyi             201506
4            32            mengmeng         201506


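For an insurance company, customer information changes every day. We use a staging table, customer_temp, to record each day's customer changes; its columns and properties are identical to those of the customer table: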

create table customer_temp like customer;

load data local inpath '/home/hadoop/hivetestdata/customer_temp.txt' into table customer_temp partition(dt = '201506');

It contains the following sample data:

hive> select * from customer_temp;
customer_temp.id  customer_temp.age  customer_temp.name  customer_temp.dt
1                 26                 jiangshouzhuang     201506
5                 45                 xiaosan             201506


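To update the customer table incrementally, we take every row from customer_temp (new and modified customers) and append the rows of customer whose id does not appear in customer_temp. Those unchanged rows are found with a left outer join that keeps only the rows where b.id is null: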
hive (hive)> select * from customer_temp
           > union all
           > select a.* from customer a
           > left outer join customer_temp b
           > on a.id = b.id where b.id is null;
_u1.id  _u1.age  _u1.name         _u1.dt
2       23       zhangyun         201506
3       24       yiyi             201506
4       32       mengmeng         201506
1       26       jiangshouzhuang  201506
5       45       xiaosan          201506
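
The query above only displays the merged rows. Below is a minimal sketch of writing them back to the customer table. It assumes that only the dt = '201506' partition is being refreshed and that your Hive version allows overwriting a partition that the same statement reads from; try it on a copy of the table first. The alias t is introduced here for illustration.

```sql
-- Sketch only: rewrite the 201506 partition of customer with the merged rows.
-- The partition column dt is given statically, so it is left out of the select list.
insert overwrite table customer partition (dt = '201506')
select t.id, t.age, t.name
from (
  -- new and modified customers from the staging table
  select id, age, name
  from customer_temp
  where dt = '201506'
  union all
  -- existing customers that have no row in the staging table
  select a.id, a.age, a.name
  from customer a
  left outer join customer_temp b
    on a.id = b.id and b.dt = '201506'
  where a.dt = '201506'
    and b.id is null
) t;
```

The union all is wrapped in a subquery because older Hive releases accept union all only inside a subquery.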


I have seen approaches like the following online, and I believe they are flawed:
hive> select customer.id,
coalesce(customer_temp.age,customer.age),
customer.name,
coalesce(customer_temp.dt,customer.dt) 
      from customer_temp 
      full outer join customer on customer_temp.id = customer.id;
The result is:
customer.id  _c1  customer.name    _c3
1            26   jiangshouzhuang  201506
2            23   zhangyun         201506
3            24   yiyi             201506
4            32   mengmeng         201506
NULL         45   NULL             201506


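This result is clearly wrong: the new customer with id 5 comes back with a NULL id and a NULL name, because the query selects customer.id and customer.name directly, and both are NULL for a row that exists only in customer_temp.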
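
One way the full outer join variant could be repaired, offered as a sketch and not as a tested recommendation: take every column from customer_temp whenever a row for that id exists there, and fall back to customer otherwise. The aliases a and b and the output column names are chosen here for illustration.

```sql
-- Sketch only: full outer join that prefers the staging row for every column.
select case when b.id is not null then b.id   else a.id   end as id,
       case when b.id is not null then b.age  else a.age  end as age,
       case when b.id is not null then b.name else a.name end as name,
       case when b.id is not null then b.dt   else a.dt   end as dt
from customer a
full outer join customer_temp b
  on a.id = b.id;
```

On the sample data above this should return the same five rows as the union all query, in no guaranteed order. Testing b.id once per row, instead of applying coalesce to each column, means a NULL value in a customer_temp row replaces the old value and is not silently filled in from customer.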

If you know a better or more efficient approach, please share it. Thank you.
