The insurance company has a table that records the customer's information, which includes the customer's id,name and age (to demonstrate that only these fields are listed).
To create a hive table:
CREATE TABLE Customer
(
ID int,
Age tinyint,
Name string
)
Partitioned by (DT string)
Row format delimited
Fields terminated by ' | '
stored as textfile;
Import Initialization data:
Load data local inpath '/home/hadoop/hivetestdata/customer.txt ' into table customer partition (dt = ' 201506 ');
Hive> SELECT * from the customer order by ID;
Customer.id Customer.ageCustomer.name Customer.dt
1 25 Jiangshouzhuang201506
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506
For insurers, customers change every day, we use temporary data sheet customer_temp to record daily customer information, and the fields and attributes are consistent with the Customer table.
CREATE table customer_temp like customer;
Load data local inpath '/home/hadoop/hivetestdata/customer_temp.txt ' into table customer_temp partition (dt = ' 201506 ');
The included data examples are as follows:
Hive> select * from Customer_temp;
Customer_temp.id Customer_temp.ageCustomer_temp.name Customer_temp.dt
1 26 Jiangshouzhuang201506
5 45 Xiaosan201506
If you need to implement an incremental update of the Customer table, we need to perform a full outer join of the two tables and update the modified data in the Customer_temp table to the Customer table.
Hive (Hive) > select * from Customer_temp
> UNION ALL
> select a.* from Customer a
> LEFT OUTER join customer_temp b
> On a.id = b.id where b.id is null;
_u1.id _u1.age_u1.name _u1.dt
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506
1 26 Jiangshouzhuang201506
5 45 Xiaosan201506
Before you see online there is a problem with the use of a method similar to the following:
Hive> Select Customer.id,
COALESCE (Customer_temp.age,customer.age),
Customer.name,
COALESCE (CUSTOMER_TEMP.DT,CUSTOMER.DT)
From Customer_temp
Full outer JOIN customer on customer_temp.id = Customer.id;
The result after execution is:
Customer.id _c1Customer.name _c3
1 26 Jiangshouzhuang201506
2 23 Zhangyun201506
3 24 Yiyi201506
4 32 Mengmeng201506
Null 45 Null 201506
It can be seen that there is a problem.
If friends have better methods of optimization please enlighten, thank you.
Implementing incremental updates in hive